
DRDO Sponsored International Conference On

Intelligence Computing (ICONIC'12)


3 & 4 May, 2012

PROCEEDINGS

Editors: Mr. K. Karnavel, Mrs. S. Roselin Mary, Prof. J.R. Balakrishnan

Organized by
Department of Computer Science and Engineering, Anand Institute of Higher Technology
IT Corridor, Kalasalingam Nagar, OMR Road, Kazhipattur, Chennai-603103

Email: iconic.aiht@gmail.com  Web Site: http://www.iconic12.in  http://www.aiht.ac.in

Edited By: Mr. K. Karnavel, Mrs. S. Roselin Mary, Prof. J.R. Balakrishnan
Edition: May 2012
Published by: Sri Maruthi Publishers
ISBN: 978-93-80757-90-2

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.


ANAND INSTITUTE OF HIGHER TECHNOLOGY


(Approved by AICTE, NBA Accredited and Affiliated to Anna University)

Kalasalingam Nagar, old Mahabalipuram Road, Kazhipattur, Chennai-603103 Phone: 27471310/27471320


Website: http://www.aiht.ac.in

Message

I am very happy to note that the Department of Computer Science and Engineering is organizing an International Conference on Intelligence Computing on 03rd & 04th May 2012. I hope that this conference will be of great use to the participants, enabling them to interact and discuss the recent advances in intelligence computing and related areas. I wish the Department of Computer Science and Engineering continued success in making further significant achievements in the days to come. I congratulate the Convener and Organizing Secretary and sincerely appreciate each and every one of the participants of this conference.

Kalvivallal Shri. T. Kalasalingam Founder and Chairman


Message
I am happy to know that the Department of Computer Science and Engineering is organizing an International Conference on Intelligence Computing, ICONIC12, on 03rd & 04th May 2012. The objective of this conference is to bring together postgraduate students, research scholars, faculty members, software developers and industrialists to discuss the recent trends and advances in intelligence computing. This convention is of great importance and creates an excellent opportunity for a wide variety of participants to discuss topics such as Wireless Networks, Databases and Intelligent Systems, Cloud Computing, Nano Computing, Quantum Computing and related areas. Such periodic conventions and academic events bring laurels to individuals and institutions alike. I wish the conference a great success. I congratulate the Organizers of the Conference for their sincere efforts in conducting this programme successfully.

Ilayavallal Shri. K. Sridharan Secretary


Message
I am very much delighted to know that the Department of Computer Science and Engineering is organizing an International Conference on Intelligence Computing, ICONIC12, on 03rd & 04th May 2012. It is a golden opportunity for researchers to meet and discuss the recent advances and concepts that are emerging in the field of automation. The conference will open a new path towards the best methods for developing software systems economically that can run on real machines. I wish all the participants a very useful association. I also congratulate all the Organizers of the conference on its success.

Dr. Arivalagi Sridharan Director (A&A)


Message
I am extremely happy to know that the Department of Computer Science and Engineering is organizing an International Conference on Intelligence Computing, ICONIC12, on 03rd & 04th May 2012. I am sure that this convention will provide many opportunities for researchers in Computer Science to interact with each other and benefit mutually. The motivation provided through the various events of the conference will certainly enrich the skills of the researchers. I congratulate the Convener and the Organizing Secretary and sincerely appreciate all their efforts towards making this conference a great success.

Dr. T.A. Raghavendiran PRINCIPAL


Message

I am very pleased to invite all the participants to the International Conference on Intelligence Computing on 03rd & 04th May 2012. The aim of this conference is to bring researchers together for interaction with fellow research groups in various emerging fields of intelligence computing. Society has become increasingly dependent on computing systems and their use in day-to-day life. I am glad to note that this conference will include deliberations on state-of-the-art technologies and other related topics. I am sure that this conference will be of great importance to research scholars, faculty members, software developers and industrialists. I wish all the delegates a very useful association and a memorable stay in our college. I congratulate the Convener and Organizing Secretary and wish them every success with this conference.

Prof. J.R.BALAKRISHNAN Director (CSE/IT/MCA)


Message
With the blessings of the holy Lord, the Department of Computer Science and Engineering has reached the milestone of its First International Conference on Intelligence Computing (ICONIC12). As Convener of the conference I would like to thank the Defence Research & Development Organisation (DRDO), Ministry of Defence, Government of India, for the encouragement and financial support to conduct this conference. I thank our higher authorities for their valuable suggestions, guidance and support. I thank all the track chairs, authors and reviewers who have made this conference a reality and a success. I congratulate the staff of the various committees for their valuable cooperation and commitment; they have worked hard over the months and deserve to be complimented for their efforts. Most of all, Mr. Karnavel, the Organizing Secretary, and his team deserve our utmost appreciation. They worked tirelessly over countless hours and gave it their all. They assisted in preparing the programme, the proceedings and the many details of organizing a conference. I hope that this conference will be a valuable resource for students and research scholars. I greatly appreciate the interest shown by authors in sending more than 80 papers from various parts of the world. I hope the conference will remain vibrant, participative and contributing well into the future. Enjoy this Conference!

MRS. S. ROSELINE MARY, HOD/CSE, CONVENOR ICONIC 2012


Message
Keeping all these developments in mind, and to celebrate the international year of Computer Science and Engineering, we are organizing a two-day International Conference on Intelligence Computing, ICONIC-12. There has been an overwhelming response from researchers within and outside India. The focus is on recent research trends in Intelligence Computing. I am greatly indebted to universities and colleges in India and abroad for their keen interest in this conference. The proceedings and souvenir planned to be released on the occasion of the conference will, no doubt, form the basis for future innovations. My sincere thanks to the Patrons and members of the International and National Advisory Committees and the Technical Committee for their valuable guidance, constant support and encouragement. I thank from the bottom of my heart the college management for their support and encouragement: our Honourable Chairman Kalvivallal T. Kalasalingam, our beloved Secretary Ilayavallal Shri. K. Sridharan, Director Dr. Mrs. Arivazhagi, Principal Dr. T. A. Raghavendiran, Director (CSE/IT/MCA) Prof. J. R. Balakrishnan, and HOD (CSE) Mrs. S. Roseline Mary, for giving us this great opportunity to conduct this International Conference, ICONIC-12. My sincere thanks to all the speakers and delegates for their participation, and to the members of the local organizing committee, my department family members, technical staff members, student volunteers and all others who have made ICONIC-12 a viable event.

MR. K. KARNAVEL, ORGANIZING SECRETARY, ICONIC 2012


COLLEGE PROFILE

Anand Institute of Higher Technology was established in the year 2000 through the indefatigable and conscientious efforts of our founder Chairman Kalvi Vallal Thiru. T. Kalasalingam, B.Com. He is a true Gandhian, an ideal educationist and a philanthropist. He devotes his full time to the upliftment of backward classes, in particular by promoting higher educational institutions. Anand Institute of Higher Technology was started in a rural area near Chennai city. It is one of the many professional institutions run by Kalasalingam and Anandam Ammal Charities, with a background of successfully running professional institutions for more than 25 years.

The Courses Offered in our college are:

Under Graduate Level
B.E. - Electronics and Communication Engineering
B.E. - Computer Science and Engineering
B.E. - Electrical and Electronics Engineering
B.E. - Electronics and Instrumentation Engineering
B.E. - Mechanical Engineering
B.Tech. - Information Technology

Post Graduate Level
M.C.A. - Master of Computer Applications
M.B.A. - Master of Business Administration
M.E. - Communication Systems
M.E. - Computer Science and Engineering
M.E. - Power Electronics and Drives
M.E. - Mechanical Engineering

Memorandum of Understanding: The Institution has signed MoUs with many companies. A LabVIEW Academy has been established with support from National Instruments, USA. National Instruments transforms the way budding engineers design, prototype and deploy systems for test, control and embedded design applications. AIHT has signed an MoU with Soongsil University, Seoul, South Korea. The areas of cooperation are as follows: exchange of faculty and students; R&D activities; conducting collaborative research projects; conducting lectures and organizing symposia; exchange of academic information and materials; and promoting other academic cooperation as mutually agreed.

PROFILE OF THE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (Accredited by NBA-AICTE)

MISSION OF THE DEPARTMENT
The primary goal of the department is to prepare emerging computer engineers with concept-oriented subject knowledge supplemented with practical orientation to face the challenges of the modern computing industry.

The Department of CSE has been functioning since the establishment of the college in the year 2000. A separate block has been built to cater to the needs of the students and to provide them with adequate infrastructure. The department offers an undergraduate programme, which is accredited by the NBA, and a postgraduate programme in Computer Science and Engineering. The department has well-qualified, experienced and dedicated faculty members motivating the students to develop themselves in the field of engineering and research. The department is well equipped with various laboratories to meet the requirements of the university curriculum. The curriculum includes detailed, in-depth knowledge of both hardware and software concepts from the fundamentals to the latest developments. The department, to its credit, has four state-of-the-art laboratories with over 200 computers occupying an area of 5400 sq. ft. The computing centre located in the academic block is supported with special workstations running in a Unix/Linux environment and dedicated workstations for multimedia applications, all connected by a network that brings over 150 computers together. Our department provides high-end software training for the students within our campus, in collaboration with leading software-training institutions. We have internet facility with Wi-Fi connectivity at a speed of 2 Mbps. The Department conducts a national-level conference every year to provide an ample opportunity for PG students and research scholars to share their knowledge in their respective domains of interest.


The Department has a very good placement record, and the students have been placed in leading companies like TCS, WIPRO, CTS and INFOSYS. The Industry-Institution Interaction Cell organizes industrial visits every semester.

COURSES OFFERED: B.E. - Computer Science and Engineering; M.E. - Computer Science and Engineering

SALIENT FEATURES
Qualified and experienced team of faculty members with varied experience in areas of Distributed Computing, Grid Computing, Software Testing, Computer Networks, Data Structures, System Software, Computer Architecture, Database Management Systems, Multimedia, Object Oriented Programming, Computer Graphics and Image Processing, and Artificial Intelligence. Faculty members are carrying out research work in the areas of Software Testing, Service Oriented Architecture, Networks and Data Mining. More than 80 technical papers have been presented by the faculty in various national and international conferences. More than 80% of the students have been recruited by leading IT industries through campus placement. Faculty members have acquired certificates from leading industries in areas like Linux, Open Source Software, Advanced Java Technology and Software Testing. Regular value-added courses on advanced topics are run to supplement the curriculum; through these, students become familiar with several language paradigms and how they relate to different application domains, understand the design space of programming languages, including concepts and constructs from past languages as well as those that may be used in the future, and develop a critical understanding of the programming languages they use by being able to identify and compare the same concept as it appears in different languages. An intranet facility is available for providing e-books, course schedules, lab manuals, assignments, placement tips, aptitude test papers and internal exam results to the students.


LIST OF COMMITTEE MEMBERS


Chief Patron : Kalvivallal Thiru. T. Kalasalingam, Chairman
Patrons : Thiru K. Sridharan, Secretary
          Dr. S. Arivalagi, Director (A&A)

ORGANIZING COMMITTEE

Organizing Chairman : Dr. T. A. Raghavendiran, Principal
Co-Organizer : Prof. J. R. Balakrishnan, Director (CSE/IT/MCA)
Convenor : Mrs. S. Roseline Mary, HOD/CSE
Organizing Secretary : Mr. K. Karnavel, Lecturer/CSE

ADVISORY COMMITTEE

Prof. Dr. S. RadhaKrishnan - Vice-Chancellor, Kalasalingam University
Prof. Dr. K. S. Easwara Kumar - Anna University, Chennai
Prof. Dr. C. Chellappan - Anna University, Chennai
Prof. Dr. G. V. Uma - Anna University, Chennai
Prof. Dr. R. Raju - Anna University, Chennai
Prof. Dr. R. Dillibabu - Anna University, Chennai
Prof. Dr. D. Manjula - Anna University, Chennai
Prof. Dr. P. Anandhakumar - Anna University, Chennai
Prof. Dr. Bala Swaminathan - Renaissance Technologies, USA
Prof. Sekar Swaminathan - Vice President, Megapath, CA, USA
Prof. Dr. Bharadwaj Veeravalli - National University, Singapore


TECHNICAL COMMITTEE
Mr. K. Karnavel, CSE; Mr. S. Praveen Kumar, CSE; Ms. N. Selvameena, CSE; Mr. N. Vasudevan, CSE; Mrs. P. Hemalatha, CSE; Mrs. K. Rejini, CSE; Ms. P. Sarala, CSE

INVITATION & CERTIFICATE COMMITTEE
Mr. T. Karthick, CSE; Mrs. K. Rejini, CSE; Mr. P. Karthik, CSE; Mrs. Mary Joseph, CSE

PROCEEDINGS AND SOUVENIR COMMITTEE
Mr. N. Vasudevan, CSE; Mrs. P. Hemalatha, CSE; Ms. P. Sarala, CSE; Mrs. P. Suthathira Devi, CSE; Ms. R. Femila Goldy, CSE; Mr. D. Anand Joseph Daniel, CSE

RECEPTION AND REGISTRATION COMMITTEE
Mr. M. B. Prashanth Yokesh, CSE; Ms. S. Stella Maduram, CSE; Mrs. D. Arul Devarajam, CSE; Mrs. K. Rejini, CSE; Mrs. S. Vanishree, CSE; Ms. M. Shree Ranjani, CSE

DECORATION COMMITTEE
Ms. A. Malathi, CSE; Mr. A. S. Balaji, CSE; Mr. B. Jagadeesh, CSE

ICONIC OFFICE & MEMENTO COMMITTEE
Ms. N. Selvameena, CSE; Ms. M. Shree Ranjani, CSE; Mrs. S. Angelin Beulah, CSE; Ms. Shymala, CSE

FINANCE COMMITTEE
Mrs. D. Arul Devarajam, CSE; Mrs. M. Maheswari, CSE; Mrs. K. Amsavalli, CSE

CATERING, HOSPITALITY & ACCOMMODATION COMMITTEE
Mrs. M. Maheswari, CSE; Mr. T. Karthick, CSE; Mr. P. Karthick, CSE; Ms. A. Malathi, CSE (Girls Hostel); Mrs. S. Vanishree, CSE; Mr. A. Marimuthu, CSE

BANNER AND KIT COMMITTEE
Mr. D. Anand Joseph Daniel, CSE; Mr. T. Karthick, CSE; Mr. P. Karthick, CSE

TRANSPORT COMMITTEE
Mr. A. S. Balaji; Mr. S. Senthil Guru, CSE; Mr. M. Ayasamy, CSE


TECHNICAL SUPPORT COMMITTEE
Mr. S. Praveen Kumar, CSE; Mrs. P. Suthathira Devi, CSE; Mr. G. Sangili Raj, CSE; Mr. G. Ramesh, CSE; Mr. S. Senthil Guru, CSE; Mr. N. Gomathi Shankar, CSE; Mr. M. Ayasamy, CSE; Mr. A. Marimuthu, CSE

LIGHTING, AC, GENERATOR SUPPLIERS, AUDIO VISUALS COMMITTEE
Mr. B. Jagadeesh, CSE; Mr. G. Ramesh, CSE; Mr. N. Gomathi Shankar, CSE

STUDENT VOLUNTEERS

I yr M.E (CSE): 1. Anton Bose J 2. Audhavan S 3. Benny Jover Gift J 4. Christy Daniel D 5. Deeraj T 6. Elakiya R 7. Hena Petricia D 8. Jenitha Mary L 9. Jeyakumar Samraj S 10. Karthika S 11. Mithila Bell S 12. Pugazendhi J 13. Richard Nickson CP 14. Ruth Thabitha Tamizharasi A 15. Santhoshkumar J 16. Subalin S 17. Thamaraiselvan M 18. Umamaheswari C

II yr M.E (CSE): 1. Anuranjani K 2. Aravind Kumar S 3. Delhi Rajagopal S 4. Dhanyajayan K.S 5. Divya K.L 6. Ellakiya K 7. Felsi K 8. Kalaivani K 9. Kanipriya M 10. Krishna Narayanan S 11. Nathezhtha T 12. Nishitha T 13. Prabakaran C 14. Senbagam B 15. Surya V 16. Tamizharasi T 17. Vaishnavi Y 18. Vidya Lakshmi C

Shyam Sundar Raju (IV/CSE-A)


DRDO Sponsored International Conference on Intelligence Computing ICONIC12, Organized by Department of Computer science and Engineering, AIHT, Chennai.

SL.No. Paper ID

Paper Title

Author
Jun Li Bushi Peng Moon Ho Lee Chonbuk National University, Korea T.Jayavel R.Dilli Babu Anna University, Chennai S. Jeevidha M.P. Reena Sri Venkateswara College Of Engineering, Chennai. Radha.R Tina Esther Trueman Anna University Of Technology Chennai. T.T. Mirnalinee SSN College Of Engineering, Chennai. S. Newlin Rajkumar V.Sheela Devi Anna University Of Technology,Coimbatore K.Veerakumar Anna University Of Technology, Madurai Dr.C G.Ravichanran Rvs College Of Engg And Technology, Dindigul Teena Anicattu Mathew Dr. T. Ravi Anna University, Chennai

1.

CSEIC01

An Alternative Codebook Design Of Multi-User MIMO System For 3GPP-LTE Release 12

2.

CSEIC02

Application Of Lean Principles In Health Care Sector

3.

CSEIC04

Temporal Link Signature Measurements For Location Distinction Using MAC Layer Optimization

4.

CSEIC05

An Effective Text Clustering And Retrieval Using SAP

5.

CSEIC06

Link Based Routing In Suspicious MANETs

6.

CSEIC09

Automatic Segmentation And Classification Of Lung CT Images

7.

CSEIC10

SVM Based Intelligent Prediction with Heart Disease Datasets


SL.No. Paper ID

Paper Title

Author
Sathiya Priya C.Sindhu Vellamal Engineering College,Chennai K. Lingadevi M.P. Reena Sri Venkateswara College Of Engineering, Chennai.

8.

CSEIC11

Detecting Malicious Packet Losses

9.

CSEIC12

Performance Analysis Of Mobility Management Schemes Using Random Walk Model In WIMAX Wireless Mesh Networks

10.

CSEIC13

Motion Detection Using Recognition Algorithms

P.Sivaprakash Dr.C G.Ravichanran RVS College of Engineering, Anna University Technology, Dindigul.

11.

CSEIC14

Improving Security And Efficiency In Attribute Based Data Sharing

V. Nirmalrani Sathyabama University, Chennai S. Muthu Lakshmi SRM University, Chennai V. Senthil Kumar PRIST University, Chennai

12.

CSEIC17

An Integrated Method Of Kano Model And QFD For Designing Impressive Qualities Of Hotel Management Service

M.SenthilKumar Dr.R.Baskaran Anna University, Chennai

13.

CSEIC18

A RBF Based Learning Approach To Predict The Status Of SCM Repository

S. Divya Bharathi S. Subha M. Ashok Rajalakshmi Institute Of Technology, Chennai. A. Umadevi Veltech university, Chennai.


SL.No. Paper ID

Paper Title

Author

14.

CSEIC19

Joint Flow Routing And Relay Node Assignment In Cooperative Multi-Hop Networks

M.Kaleeswari S.Selvakumari Francis Xavier Engineering College, Tirunelveli

15.

CSEIC20

Performance Evaluation Of Two Clustering Schemes In The Discovery Of Services In MANET Environment

Dr. Ilavarasan Pondicherry Engineering College, Puducherry S. Prasath Sivasubramanian Avvaiyar Govt. College For Women, Karaikal.

16.

CSEIC22

Survey of TCP adoptions In Mobile Ad-Hoc networks

Christina .J Kanchana .A Revathy.V Imthyaz Sheriff Easwari Engineering College

17.

CSEIC23

A Video Aware FEC based Unequal Loss Protection System for Video Streaming over RTP

R. Sandeep K.C. Vignesh N. Viswanathan S.P. Balakannan Anand Institute of Higher Technology, Chennai

18.

CSEIC24

Computer network database attacks security from threats and hackers

R. Shankar A. Sankaran Indira Institute of Engineering and Technology, Pandur.


SL.No. Paper ID

Paper Title

Author
D.Manimegalai S. Cloudin KCG College of Technology, Chennai

19.

CSEIC25

Seamless Network Connectivity In Vehicular Ad Hoc Network

20.

CSEIC26

A Survey Of Qos Issues And On Demand Routing Protocols Performance Analysis In Mobile AD Hoc Networks

P. Sivakumar Manakula Vinayagar Institute Of Technology, Puducherry. Dr. K. Duraiswamy K.S.R College Of Technology, Tiruchengode S.Suresh R.Valarmathi S.Naveen Kumar S K R Engineering College Ms.S.P.Parameswari Rvs College Of Engineering And Technology,Dindigul Mr.R.Thukkaisamy Sasurie College Of Engineering,Erode N.Ashok Kumar R.V.S College Of Engineering&Tech, Dindigul Dr. A.Kavitha PSNA College Of Engineering&Tech, Dindigul

21.

CSEIC27

Building An Offline Barcode Scanner For Android Device

22.

CSEIC28

Dynamic Voltage Scaling For Multi-Processor Using Gals

23.

CSEIC29

Fault Injection Based Virtual Channel Allocation Of Network-On-Chip(NOC) Router

24.

CSEIC31

Ber Performance Analysis Under Rayleigh Model

A.Sahaya Bindhya Dhas T.Jeril Viji Saveetha Engineering College,Chennai


SL.No. Paper ID

Paper Title

Author
S. Mahalakshmi Easwari Engineering College, Chennai.

25.

CSEIC33

Effective Clustering Of Web Opinions For Social Networking Site Using Scalable Algorithm

26.

CSEIC35

Hierarchical Modeling And Analysis Of Cloud Services With Energy Efficiency

R.H. Bavya Sugani J.Preethi Janet Rajas Engineering College, Thirunelveli. D.Arulsuju

GKM College of Engineering & Technology,Chennai

27.

CSEIC36

Adaptive Filter Algorithm To Cancel Residual Echo For Speech Enhancement

P. Gayathri Jaya Engineering College, Chennai

28.

CSEIC37

Fuzzy Keyword Search Over Encrypted Data In Cloud Computing

M. Balasubramani V. Singaravel Bharathidasan Engineering College, Nattrampalli,Vellore

29.

CSEIC39

Memory Management In Map Decoding Algorithm Using Trace Forward Radix 2 X 2 Technique In Turbo Decoder By Using Of VLSI Technology

R. Mohanraj Rvs College Of Engg And Tech, Dindigul

30.

CSEIC40

Wearable Computing In Smart Hospitals

S.Vanitha N.Jayalakshmi R.Varalakshmi Sri Manakula Vinayagar Engineering College, Pondicherry.


SL.No. Paper ID

Paper Title

Author

31.

CSEIC41

Detection Of Sinking Behavior Using Signature Based Detection Technique SVM & FDA In Wireless ADHOC Networks

A.Vinodh Kumar P.Naveen Kumar S.P.Varun Baalaji Anand Instiute Of Higher Technolgy, Chennai

32.

CSEIC42

Mobile Ad Sense

R. Bhavani T. Jeril Viji Saveetha Engineering College, Chennai.

33.

CSEIC44

An Intelligent Instant Messenger Using Smart Device Based On Jabber Server A High Tech Innovative Communication Tool

G. Nalini Priya KCG College Of Technology, Chennai. Maheswari K.G IRTT, Erode

34.

CSEIC45

Similarity Model With User- And Query-Dependent Ranking Approach For Data Retrieval

R. Raju Thangalatha Legaz M. Pakkirisamy S. Md . Haja Sherif P. Venkadesan Sri Manakula Vinayagar Engineering College, Pondicherry.

35.

CSEIC47

Evolution And Emerging Trends On Internet Of Things

V. Subashini A. Umamaheswari P. Subhapriya Prof. K. Premkumar Sri Manakula Vinayagar Engineering College, Puducherry


SL.No. Paper ID

Paper Title

Author
Dr.S.P.Balakannan R.Aishwarya N.Induja Anand Institute Of Higher Technology, Chennai

36.

CSEIC48

Distribution Based Geographic Multicasting Over MANET

37.

CSEIC49

Efficient Keyword Based Search In Relational Database

R.Suresh K. Saranya S. Dhivya K.Thilagapriya Sri Manakula Vinayagar Engineering College,Puducherry

38.

CSEIC50

The Use Of RFID For Human Identification (Wireless Communication)

M. Hakkeem R. Ganesh United Institute Of Technology, Coimbatore T.Ayesha Rumana Rvs College Of Engg &Technology,Dindigul

39.

CSEIC51

Image Based Steganography Using New Symmetric Key Algorithm And File Hybridization (NSAFH)

P.Savaridassan P.Angaiyarkanni, R.Kavitha, R.Vallarmadi R.Ilakkiya Sri Manakula Vinayagar Engineering College, Pondicherry. Sruthi Franco C Pradeesh Kumar KCG College of Technology, Chennai

40.

CSEIC52

Blacklist Based Anonymous User Blocking


SL.No. Paper ID

Paper Title

Author
Mr.S.Nagakumararaj RVS College Of Engg And Tech,Dindigul

41.

CSEIC54

High Speed SoC By Means Of Enhancing Plc Using VLSI Technology

42.

CSEIC55

A Survey On Types Of Mobile Authentications Used In Mobile Banking Sectors

S.Kalaichezhian K. Purushothaman, G. Murugavel V. Krishna Kumar A. Meiappane Sri Manakula Vinayagar Engineering College, Pondicherry. J.SandhanaNirmala Kaviya S.Praveenkumar Saveetha Engineering College,Chennai

43.

CSEIC56

Designing Micropump Mechanically For Transdermal Drug Delivery

44.

CSEIC57

Home Security Using Artificial Intelligence In Air Conditioners

G.P.Abinaya Anand Institute Of Higher Tecnology, Chennai

45.

CSEIC58

Evolution Of Enterprise Cloud Computing

D. Kanagalatchumy S. Nandhini D. Punitha Prof. A. Ramalingam Sri Manakula Vinayagar Engineering College, Pondicherry.

46.

CSEIC59

Selective Harmonics Elimination (SHE) In Single Phase Pulse Width Modulation (PWM) Rectifier

Mrs.S.Pandeeswari K. Sivaraman R.V.S. College Of Engg. & Tech., Dindigul


SL.No. Paper ID

Paper Title

Author
R.Raju Kayalvizhi Devakumaran Aishwarya Balasubramanian Gayathri Balan Sri Manakula Vinayagar Engineering College, Pondicherry. Jasmine.J. Nainita S. Praveen Kumar Saveetha Engineering College,Chennai.

47.

CSEIC60

A Hybrid Model Using Adaboost And GMM-EM For Solving Outliers

48.

CSEIC61

Glucose Level Detection In Exhaled Breath Condensate Using MEMS Sensor

49.

CSEIC63

Tracing A Mobile Object In Wireless Networks (MANET)

G. Partheeban M. Vignesh S.P. Balakannan Anand Institute Of Higher Technology, Chennai K. Sendil Kumar K.Sathishkumar P.Ravisasthiri R.Sridharan Sri Manakula Vinayagar Engineering College, Pondicherry.

50.

CSEIC64

Leaf Cutter Ant Model For Web Service Selection And Execution

51.

CSEIC65

Deployment Of P-Cycle For All Optical Mesh Networks Under Static And Dynamic Traffic

V. Pradeepa RVS College Of Engineering And Technology,Dindigul

52.

CSEIC67

Escalating energy for wireless sensors

U. Prithivi Rajan S. Praveen Kumar Saveetha Engineering College, Chennai.


SL.No. Paper ID

Paper Title

Author
Mr. R..Raju G. Sumathi J.Vidhya@Shankardevi Sri Manakula Vinayagar Engineering College, Puducherry

53.

CSEIC68

Software Aging Analysis Through Machine Learning

54.

CSEIC69

Optimal Packet Processing Systems In Network Processor IXP2400 With Dynamic Multithread System

S.Sahunthala Anand Institute Of Higher Technology

55.

CSEIC70

Data Hiding, Partial Request And Data Grouping For Access Control Matrix In Cloud Computing

J.Ilanchezhian K.Arun A.Ranjeeth V.Varadharassu Sri Manakula Vinayagar Engineering College, Puducherry

56.

CSEIC71

Analytical Investigation Of Thermo Actuator And Simulation Of MEMS Resonator

K. Malarvizhi T. Sripriya J. Jayalakshmi Saveetha Engineering College, Chennai A.Gnanasundari, A.Anitha B.Chithra Prof.Gowri Sri Manakula Vinayagar Engineering College, Puducherry, K. Padmapriya A. Anand Dr.P.Senthil Kumar Saveetha Engineering College,Chennai

57.

CSEIC72

Overview Of Honeypots And Botnet Detection Methods

58.

CSEIC73

LDPC Decoding Algorithm For Predicting And Screening Genome Sequences


SL.No. Paper ID

Paper Title

Author

59.

CSEIC74

Resume Web Service Transactions Use Rule-Based Coordination

T. Tamilselvan K. Pugalanthi A. Vimalraj D. Vettri Sri Manakula Vinayagar Engineering College, Pondicherry.

60.

CSEIC75

MoRE-learning

T.Karthick A.Mercy Vinoleeya Shefani K.Anjuman Sakeena Mubeen Anand Institute Of Higher Technology, Chennai

61.

CSEIC76

Cross Layer Optimization For Multi user Video Streaming Using Distributed Cross Layer Algorithm

Mr. R. Raju J. Vidhya@Shankardevi G. Sumathi Sri Manakula Vinayagar Engineering College, Pondicherry.

62.

CSEIC77

Throughput Enhancement and Delay Reduction in P2P Network

Deepthi.K.Oommen R.Siva KCG College of Technology,Chennai

63.

CSEIC79

A Cloud Computing Approach for Parallel Data Processing and Dynamic Resource Allocation

P.SuthanthiraDevi Anand Institute Of Higher Technology, Chennai


SL.No. Paper ID

Paper Title

Author

64.

CSEIC80

To A Secured And Reliable Storage Services In Cloud Computing

P.Savaridassan G.Gnanasambantham, I.Nesamani, V.Vinothkanna Sri Manakula Vinayagar Engineering College, Pondicherry.

65.

CSEIC81

Identification And Detection Of Pharam Attack Using Bayesian Approach With SMS Alert

Mr.K.Karnavel A.Akilan , M.Gobi , J.Aravind Raj Anand Institute Of Higher Technology, Chennai

66.

CSEIC82

A Routing Algorithm based on Fuzzy logic in MANET

D.Udaya E.Arulmozhi Dr. Pauls Engineering College, Pulichapallam

67.

CSEIC83

Transmission Power Controlled MAC Protocol for Ad Hoc Networks

C.Dhivyalakshmi Dr. Pauls Engineering College, Pulichapallam N. Arul Kumar M. Govindaraj Bharathidasan University, Trichy V.Varadharassu J.Ilanchezhian A.Ranjeeth K.Vignesh Sri Manakula Vinayagar Engineering College, Pondicherry

68.

CSEIC84

A novel model for efficient data management in Electronic Toll Plaza with a centralized fixed machine for VANETs

69.

CSEIC85

A Study On QoS-Aware Service Selection In SOA


SL.No. Paper ID

Paper Title

Author
R.Gobi Dr.E.Kirubakaran Dr.E.George Dharma Prakash Raj Bharathidasan University, Trichy

70.

CSEIC86

Framework for Data Management in Mobile Location Based Services

71.

CSEIC87

Incorporating Pre Audit Method in Multilevel Secure Composite Web Services

S.Rajesh D.Jagadiswary Dr.Pauls Engineering College, Pulichapallam.

72.

CSEIC89

Web Intrusion And Anomaly Detection Based On Data Clustering And Adam

R.Sylviya K.Indumathi M.Vanitha A.Meiappane Sri Manakula Vinayagar Engineering College, Pondicherry

73.

CSEIC90

Enabling Agile Testing From End To End Continuous Integration To Completely Eliminate The Blind Spot

K.Karnavel J.Santhosh Kumar S.Audhavan M.Thamarai Selvan Anand Institute of Higher Technology, Chennai

74.

CSEIC91

Publishing Search Logs A Comparative Study of Privacy Guarantees

S.Krishna.narayanan M.E CSE, Anna University of technology Chennai N.Selvameena Anand Institute of Higher Technology.Chennai


SL.No. Paper ID

Paper Title

Author

75.

CSEIC92

A Tool For Measuring The Service Granularity In Web Services

J. Geetha Kishorekumar T. Chitralakshmi, K. Chindhyaa S. Revathi Sri Manakula Vinayagar Engineering College,Puducherry

76.

CSEIC93

An Efficient K-Means Algorithm For Clustering Categorical Data

R.Tamilselvan K.Dinesh Kumar Dr.C.Palanisamy Vivekanandha Institute of Engineering and Technology,Namakkal

77.

CSEIC94

VANET Technologies, Security Threats and Research Area: A Study

R.Vivadha, D.Anbarasy M.Sivapriya R.Vinodha Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry .

78.

CSEIC95

Effective Modification Of Exact Clones Using Clone Manager

E. Kodhai M. Subathra, I. Bakia Ranjana, M. Banupriya Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry


SL.No. Paper ID

Paper Title

Author
A.Naveenraj R.Utham D.Sathish Bharathi Vengateshwaran K Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry

79.

CSEIC96

A Neuro-Fuzzy Approach To Predict Online Auction Final Bids

80.

CSEIC97

License Plate Identification For Intelligent Transportation

R.Dinesh A.P.Shanthi Anna University, Chennai

81.

CSEIC98

Intensified reputation based security algorithm using fuzzy-logic (IRSF)

Dr. D. Sivakumar Arunai College of Engineering . Thiruvannamalai R. Sudha Dr.Pauls Engineering College Villupuram

82.

CSEIC100

Group Key Transfer Protocol for Group Communication

R.Velumadhava Rao Anna University, Chennai

83.

CSEIC101

Hybrid Framework For Feature Reduction Of MAC Layer In Wireless Network

S.Krishna Kumar S.Anbuchelian Anna University, Chennai

84.

CSEIC102

Multi-Agent Ontology Based Reinforcement Learning To Support Service Marketing

S.Mannarmannan S.Venkatesh Anna University of technology,Coimbatore


SL.No. Paper ID

Paper Title

Author

85.

CSEIC104

Black Market Botnet and Securing Information

S.Jeyakumar samraj C.P.Richard Nickson J.Anton Bose J.Pugazendhi Anand Institute of Higher Technology,Chennai

86.

CSEIC105

Secure Quantum Key Distribution Using Central Authority

T.Deeraj J.Benny Jover Gift Anand Institute of Higher Technology,Chennai

87.

CSEIC106

Logic Macro programming for Wireless Sensor Networks

N.K Senthil Kumar S.Kavitha Vel Tech Dr. RR & Dr.SR Technical University

88.

CSEIC107

A Comprehensive Model to achieve Service Reusability for Multi level stakeholders using Non- Functional attributes of Service Oriented Architecture

Shanmugasundaram .G V. Prasanna Venkatesan C.Punitha Devi Pondicherry University

89.

CSEIC108

Classification of network activity schema for scan detection

M.Abdul Rahim R.Sri Balaji B.R.Jathin S.Palpandi Karpaga Vinayaga College of Engineering and Technology,Chennai


SL.No. Paper ID

Paper Title

Author

90.

CSEIC109

Security Enhancement of Secure Simple Pairing & Group Key Transfer Protocol in Bluetooth

K.Gandhimathi, S. Jayaprakash, Idhaya Engineering College For women,chinnasalem

91.

CSEIC110

A Cloud Computing Platform Support Resting On Peer To Peer Network

P.Bhavani A.Malathi S.Sasikala SRM University,Chennai

92.

CSEIC111

A Survey Of Recent Research And Detailed Design Methodologies Intended For Industry Strength Of Software Engineering

K.Karnavel Dr.R.DilliBabu Anna University,CEG, Chennai DR.R.S..Rajesh V.Akalya Manonmaniam Sundaranar University, Tirunelveli S.Praveen Kumar J.Mannar Mannan Anna University of Technology, Coimbatore J.Mannar Mannan M.Karthik Anna University of Technology, Coimbatore L.Abhijith Galla Mahesh N.K.Reghunath S.Palpandi

93.

CSEIC112

A Novel Authentication Scheme Using Finger Print For Data Protection

94.

CSEIC113

An Optimized Workflow Composition Through Ontology Based Planning

95.

CSEIC114

Preparing Datasets From Databases Using Horizontal Aggregations With Holistic Functions

96.

CSEIC103

Accessing Restricted Webservices In Mobile Phones Using Biometrics

Karpaga Vinayaka College of Engg & Tech.


An Alternative Codebook Design of Multi-User MIMO System for 3GPP-LTE Release 12


Jun Li 1, Bushi Peng 2, Moon Ho Lee 3
1,2,3 Div. of Electronic & Information Engineering, Chonbuk National University, Jeonju, Korea.

ABSTRACT
In this paper, we propose a novel codebook design for a linear precoding scheme. The proposed scheme specifies a limited-feedback method that uses Orthogonal Space-Time Block Codes (OSTBC) as the codebook of precoding matrices, which is known to both the transmitter and receiver. The receiver chooses a matrix from the codebook and feeds the index of the optimal codebook matrix back to the transmitter. By using the OSTBC codebook, we obtain results very close to those of the conventional precoding scheme, which uses the DFT codebook [1]. Simulation results show that the proposed scheme performs almost the same as the conventional precoding scheme.

Keywords: Orthogonal Space-Time Block Codes; precoding; codebook

I. INTRODUCTION

The benefits of using multiple antennas at both the transmitter and the receiver in a wireless system are well established. Multiple-input multiple-output (MIMO) systems enable a growth in transmission rate that is linear in the minimum number of antennas. It is well known that the performance of a MIMO space-time coded system can be improved with channel knowledge at the transmitter [2]. Channel knowledge at the transmitter does not help to improve the degrees of freedom, but a power or beamforming gain is possible. In a non-i.i.d. channel (such as correlated Rician fading), channel knowledge at the transmitter offers an even greater benefit in performance. Therefore, exploiting transmit channel side information is of great practical interest in MIMO wireless [3]-[4]. In a time division duplexing (TDD) system, the channel knowledge can be obtained at the base station (eNB) from uplink transmissions thanks to channel reciprocity. However, the sounding signals need to be transmitted on the uplink, which represents an additional overhead. In a frequency division duplexing (FDD) system, the channel state information (CSI) needs to be fed back from the user equipment (UE) to the eNB. Complete channel state feedback can lead to excessive feedback overhead. An approach to reduce the channel state information feedback overhead is to use a codebook.

Precoding is a processing technique that exploits channel state information at the transmitter (CSIT) by operating on the signal before transmission. For many common forms of partial CSIT, a linear precoder is optimal from an information-theoretic viewpoint [5]. A linear precoder essentially functions as a multimode beamformer, optimally matching the input signal on one side to the channel on the other side. It does so by splitting the transmit signal into orthogonal spatial eigenbeams and assigning higher power along the beams where the channel is strong and lower or no power along the weak ones. Precoding design varies depending on the type of CSIT and the performance criterion. In this paper, we use 4 by 4 OSTBCs as a codebook in a linear precoding system. Before transmission, a four-antenna OSTBC matrix is multiplied by a linear precoding matrix and transmitted over 2 antennas. The optimal linear precoding matrix is chosen from a finite-cardinality codebook of possible precoding matrices that is designed off-line and available to both the transmitter and receiver. This paper is organized as follows. In Section II, the system model is described. In Section III, codebooks are introduced and the design algorithm of the proposed codebook is described in detail. In Section IV, simulation results are presented. Finally, the conclusions are given in Section V.


Fig. 1: Linear precoder and space-time code in a correlated fading channel.

II. SYSTEM MODEL

In a closed-loop MIMO precoding system, for each transmit antenna configuration we can construct a set of precoding matrices and let this set be known at both the eNB and the UE [6]. Fig. 1 shows the block diagram of the MIMO system with space-time code and precoder. Let us consider an $M_r \times M_t$ MIMO system in a block fading channel, where the number of transmit antennas is less than the number of receive antennas. The channel gain between each transceiver pair is assumed to be an i.i.d. complex Gaussian random variable whose probability density function (PDF) is $\mathcal{CN}(0,1)$. Let $X \in \mathbb{C}^{M \times T}$ denote a space-time codeword of length $M$, which is represented as

$$X = [x_1\; x_2\; \cdots\; x_T]^T \qquad (1)$$

where $x_k = [x_{k,1}\, x_{k,2}\, \cdots\, x_{k,M}]$, $k = 1, 2, \ldots, T$ and $M \le M_t$. As a result, the received signal corresponding to one OSTBC symbol, $Y$, can be formulated as

$$Y = \sqrt{\rho}\, H W X + N \qquad (2)$$

where $\rho$ denotes the signal-to-noise power ratio, $H \in \mathbb{C}^{M_r \times M_t}$ represents the channel transfer information, and $W \in \mathbb{C}^{M_t \times M}$ is a precoding matrix which is chosen from the codebook $F = [W_1, W_2, \ldots, W_L]$. $N \in \mathbb{C}^{M_r \times T}$ is a noise sample matrix whose $(i,t)$ element $n_{i,t}$, the noise sample at the $i$-th receive antenna at time instance $t$, is modeled as an i.i.d. complex Gaussian random variable, i.e., $n_{i,t} \sim \mathcal{CN}(0,1)$. Rather than the full CSI, only the corresponding index is fed back to the transmitter side. Each index can be represented with $N$ bits, which allows for a total number of $L = 2^N$ codewords in the codebook. Note that $L$ is referred to as the codebook size. If the maximum-likelihood (ML) decoding scheme is used to extract the transmit symbols in $Y$, the conditional symbol error probability decreases exponentially as the channel Frobenius norm grows, i.e.,

$$\Pr\{\text{error} \mid H\} \le \exp\!\left(-\gamma \|H W_i\|_F^2\right) \qquad (3)$$

where $\gamma$ is a constant independent of $H$. In order to minimize the symbol error rate (SER), $W_i$ is selected to maximize the minimum distance between two OSTBC codewords. Mathematically,

$$W_i^{*} = \arg\max_{i \in \{1,\ldots,N\}}\; \min_{l \ne m} \left\|H W_i (X_l - X_m)\right\| = \arg\max_{i \in \{1,\ldots,N\}} \left\|H W_i\right\|_F \qquad (4)$$

where the second equality comes from the fact that both $X_l$ and $X_m$ are orthogonal. Hence, the precoder is chosen such that the Frobenius norm of the equivalent channel response is maximized. The corresponding optimization problem can be formulated as the Grassmannian subspace packing problem [8]. The codewords $W$ selected to form $F$ should be designed by maximizing the minimal chordal distance $\delta_{\min} = \min_{1 \le k < l \le L} d(W_k, W_l)$ between any pair of codewords. The chordal distance is defined as

$$d(W_k, W_l) = \sqrt{1 - \left|W_k^H W_l\right|^2} \qquad (5)$$

where $1 \le k, l \le L$.
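To make the selection rule in (4) and the chordal-distance metric in (5) concrete, here is a minimal numerical sketch (not from the paper): it draws a random unit-norm codebook as a stand-in for the DFT or OSTBC codebooks described later, picks the index that maximizes the Frobenius norm of the effective channel, and evaluates the minimum chordal distance used in the Grassmannian packing criterion. All names, dimensions and the random codebook are illustrative assumptions.

import numpy as np

def chordal_distance(w_k, w_l):
    # Chordal distance of Eq. (5) for unit-norm codewords (M = 1).
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(w_k, w_l)) ** 2))

def select_precoder(H, codebook):
    # Pick the codeword index maximizing ||H w||_F, as in Eq. (4).
    return int(np.argmax([np.linalg.norm(H @ w) for w in codebook]))

# Illustrative setup: Mr = 2 receive antennas, Mt = 4 transmit antennas,
# and a random unit-norm codebook of L = 2^N vectors (a stand-in codebook).
rng = np.random.default_rng(0)
Mr, Mt, N_bits = 2, 4, 3
L = 2 ** N_bits
codebook = []
for _ in range(L):
    v = rng.standard_normal(Mt) + 1j * rng.standard_normal(Mt)
    codebook.append(v / np.linalg.norm(v))

H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
best_index = select_precoder(H, codebook)   # this index would be fed back with N_bits bits
d_min = min(chordal_distance(codebook[k], codebook[l])
            for k in range(L) for l in range(k + 1, L))
print(best_index, round(d_min, 3))

For a fixed L, a larger minimum chordal distance corresponds to a better-packed codebook, which is exactly the Grassmannian packing criterion above.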

III. CODEBOOK DESIGN IN PRECODING SYSTEM

In this section, we use $S$ to denote the proposed codebook, which consists of $L$ complex matrices of dimension $M_t \times M$, with $M = 1$ for simplicity. We first introduce the optimal unitary codeword $W_{\mathrm{opt}}$ obtained when the capacity of the feedback channel is not limited [7]. Please note that even though $W_{\mathrm{opt}}$, which demands much more channel capacity for the precoder information feedback, is not practical in a feedback-limited system, it does provide insight into our codebook design. The optimal codeword $W_{\mathrm{opt}}$ is composed of $M$ orthonormal basis vectors in $W_i$ and is denoted as $W_{\mathrm{opt}} \in \mathbb{C}^{M_t \times M}$. Please note that $W_{\mathrm{opt}}$ is not unique, since

$$\|H W_{\mathrm{opt}}\| = \|H W_{\mathrm{opt}} U\| \qquad (6)$$

as long as $U$ is a unitary matrix. Suppose that we have an $N_1$-bit codebook $G = [g_0, g_1, \ldots, g_{2^{N_1}-1}]$; our proposed $N$-bit codebook is $S = [s^{(0)}, s^{(1)}, \ldots, s^{(2^{N_2}-1)}]$, where $N_2 = N - N_1$. Every group can be expressed as $s^{(n)} = [s_0^{(n)}, s_1^{(n)}, \ldots, s_{2^{N_1}-1}^{(n)}]$, $n = 0, 1, \ldots, 2^{N_2}-1$, where $s_i^{(n)}$ is the $i$-th codeword in the $n$-th group [9]. Following [9], $s_i^{(n)}$ can be depicted as

$$s_i^{(n)} = O^{\lfloor n/M_t \rfloor}\, P^{(n \bmod M_t)}\, \bar{H}_g^{\,n}\, g_i \qquad (7)$$

where $O_0$, built from the codewords $w_1$ and $w_2$, the permutation matrix $P_l$ and the Householder matrix $\bar{H}_g^{\,n}$ are used for rotation of the codebook. The $i$-th codeword of an $N$-bit DFT codebook $F$ is given as follows:

$$f_i = \frac{1}{\sqrt{M_t}}\left[1,\, e^{j 2\pi i / 2^N},\, \ldots,\, e^{j 2\pi (M_t-1) i / 2^N}\right]^T \qquad (8)$$

where $i = 1, \ldots, 2^N$ and $\theta_i = 2\pi i / 2^N$ is the $i$-th sample in the angle domain $[0, 2\pi)$. The codeword $w_1$, made up of $R$ DFT codewords $f \in F$, can be written as [10]

$$w_1(\theta_i, \delta_1, \ldots, \delta_R) = \left[f_{i+\delta_1}, \ldots, f_{i+\delta_R}\right] \qquad (9)$$

where $\delta_r$, $1 \le r \le R$, are the relative deviations. Both the values of $i$ and $\delta_r$ are restricted to the angle sample set, which is known at both transmitter and receiver. $w_2$ is selected from the GLP codebook, which is composed of $M_t$ codebooks.
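As a small illustration of the DFT codebook in (8), the following sketch (written under the definitions above, not the authors' code; the function name and example parameters are assumptions) builds the $2^N$ unit-norm codewords for $M_t$ transmit antennas.

import numpy as np

def dft_codebook(Mt, N_bits):
    # N-bit DFT codebook of Eq. (8): the i-th codeword has entries
    # exp(j*2*pi*m*i / 2^N) / sqrt(Mt) for antenna index m = 0..Mt-1.
    L = 2 ** N_bits
    m = np.arange(Mt)
    return [np.exp(1j * 2 * np.pi * m * i / L) / np.sqrt(Mt) for i in range(1, L + 1)]

# Example: 4 transmit antennas and 6 feedback bits give 64 unit-norm codewords.
F = dft_codebook(Mt=4, N_bits=6)
print(len(F), round(float(np.linalg.norm(F[0])), 3))

Groups of adjacent codewords $f_{i+\delta_1}, \ldots, f_{i+\delta_R}$ from this set can then be stacked to form $w_1$ as in (9).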

We now present the switching codebook scheme. First, $E = 2^{N_2}$ different codebooks of size $2^{N_1}$ are created with $N_1$ feedback bits:

$$S_e, \qquad 0 \le e \le E-1. \qquad (10)$$

In general, the codebook can simply switch in a sequential order at each feedback time, i.e.,

$$\ldots, s_0, s_1, s_2, \ldots, s_{E-1}, s_0, s_1, s_2, \ldots \qquad (11)$$

Compared with the conventional single codebook, the switching codebook provides an $E$ times larger codebook with $N_1$ feedback bits instead of $N$ bits for the same codebook size. However, only one codebook can be used at a feedback time $t$. The channel matching process is performed by

$$s_i(t) = \arg\max_{s_i^n \in s^n \subset F} \left\|H s_i^n\right\| \qquad (12)$$

The default codeword is also used in this scheme [11]. If

$$\left\|H s_i^n(t-1)\right\| \ge \left\|H s_i^n(t)\right\|, \qquad (13)$$

the default codeword index $\tilde{i}$ is sent back to the transmitter. In [11], the author presented a corner case in which the transmitter's version of the previously selected codeword differs from that of the receiver. Even in this case, the strict condition in (13) guarantees that the default index is selected at most $E$ times over the codebooks. Then we can rewrite equation (12) as

$$s_i(t) = \arg\max_{s_i^n \in S \subset F} \left\|H s_i^n\right\| \qquad (14)$$

Hence, the switching codebooks choose the same optimal codeword as the single codebook.

Property 1: The chordal distance between any two codewords is the same as in the GLP codebook, which is specified as

$$d\!\left(s_p^{(n)}, s_q^{(n)}\right) = d(g_p, g_q), \qquad (15)$$

$n = 0, 1, \ldots, 2^{N_2}-1$, $p = 0, 1, \ldots, 2^{N_1}-1$, $q = 0, 1, \ldots, 2^{N_1}-1$.

Proof:

Since $O$ in (7) is a unitary matrix, and the permutation matrix $P$ and the Householder matrix $\bar{H}_g^{\,n}$ are also unitary, we have

$$\left(O^{\lfloor n/M_t \rfloor} P^{(n \bmod M_t)} \bar{H}_g^{\,n}\right)^H \left(O^{\lfloor n/M_t \rfloor} P^{(n \bmod M_t)} \bar{H}_g^{\,n}\right) = I \qquad (16)$$

So we can get

$$d\!\left(s_p^{(n)}, s_q^{(n)}\right) = \sqrt{1 - \left|\left(O^{\lfloor n/M_t \rfloor} P^{(n \bmod M_t)} \bar{H}_g^{\,n} g_p\right)^H \left(O^{\lfloor n/M_t \rfloor} P^{(n \bmod M_t)} \bar{H}_g^{\,n} g_q\right)\right|^2} = \sqrt{1 - \left|g_p^H g_q\right|^2} = d(g_p, g_q). \qquad (17)$$

According to the LTE Release 10 standard [12], in order to maximize $\|H W_i\|$ in (4), the first two columns are chosen from the best codeword of a 6-bit feedback codebook. The 4 by 4 DFT matrix is given as

$$W_0 = \frac{1}{2}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & e^{j\pi/2} & e^{j\pi} & e^{j3\pi/2}\\
1 & e^{j\pi} & e^{j2\pi} & e^{j3\pi}\\
1 & e^{j3\pi/2} & e^{j3\pi} & e^{j9\pi/2}
\end{pmatrix} \qquad (14)$$

The remaining precoding matrices $W_i$ are obtained as

$$W_i = \left[\operatorname{diag}\!\left(e^{j2\pi\cdot 1/64},\, e^{j2\pi\cdot 8/64},\, e^{j2\pi\cdot 61/64},\, e^{j2\pi\cdot 45/64}\right)\right]^{i} W_0 \qquad (15)$$

where $i = 1, 2, \ldots, 63$. The simulation will be shown later. In the proposed scheme, the Alamouti code (2 TX) is fitted to a four-transmit-antenna system by using the OSTBC alternative codebook in the linear precoding with 6 bits of feedback. The precoding structure is specified as

$$\tilde{h} = \underbrace{\text{channel}}_{1 \times 4}\;\; \underbrace{W_i}_{\text{precoding } 4 \times 2}\;\; \underbrace{\text{Alamouti code}}_{2 \times 2} \qquad (16)$$

where $W_i$, $i = 1, 2, \ldots, 2^K$, is the precoding matrix with $K$ bits of precoding feedback information. The full precoding matrix $S_0$ is

$$S_0 = \begin{pmatrix}
x_1 & x_2 & x_3 & 0\\
-x_2^{*} & x_1^{*} & 0 & x_3\\
-x_3^{*} & 0 & x_1^{*} & -x_2\\
0 & -x_3^{*} & x_2^{*} & x_1
\end{pmatrix} \qquad (17)$$

where $x_1, x_2, x_3 \in \{0, 1\}$. Hence, the total codebook size $L$ is defined as 64. $W_i$ is the 4 by 2 precoding matrix obtained by taking the first two columns of $S_i$. Compared with the DFT codebook, our proposed codebook has a lower complexity.
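The exact sign and conjugation pattern of $S_0$ is difficult to recover from the scanned text, so the sketch below uses one standard rate-3/4 orthogonal design with the same ingredients, purely to illustrate how a 4x2 precoder $W_i$ can be formed from the first two columns of $S_i$ as described above; it is an assumption, not necessarily the authors' exact matrix.

import numpy as np

def ostbc_matrix(x1, x2, x3):
    # One standard rate-3/4 orthogonal STBC in three symbols (illustrative stand-in
    # for S_0 in Eq. (17)); its columns are mutually orthogonal by construction.
    return np.array([
        [x1,            x2,           x3,           0],
        [-np.conj(x2),  np.conj(x1),  0,            x3],
        [-np.conj(x3),  0,            np.conj(x1), -x2],
        [0,            -np.conj(x3),  np.conj(x2),  x1],
    ])

def precoder_from_ostbc(x1, x2, x3):
    # 4x2 precoder W_i: the first two columns of S_i, as described in the text.
    return ostbc_matrix(x1, x2, x3)[:, :2]

S = ostbc_matrix(1 + 1j, -1 + 1j, 1 - 1j)
print(np.round(S.conj().T @ S, 6))   # diagonal matrix: orthogonal columns
W = precoder_from_ostbc(1 + 1j, -1 + 1j, 1 - 1j)
print(W.shape)                        # (4, 2)

Because the columns of such a design stay orthogonal, selecting any two of them as the precoder preserves the OSTBC structure that the Frobenius-norm selection rule in (4) relies on.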

CONCLUSION

In this paper, we have presented a novel codebook design for linear precoding with limited feedback.

[Fig. 2: BER of OSTBC with/without precoding in a Rayleigh fading channel using 4QAM. Curves: Alamouti 2x1 without precoding, OSTBC (proposed), and conventional precoding; vertical axis: bit error probability P_B (10^0 down to 10^-6); horizontal axis: Eb/N0 in dB.]

It is very hard to beat the conventional precoding scheme with the DFT codebook, but we can approach its performance with a lower complexity. We analyzed the BER performance with 6 bits of feedback using the OSTBC codebook and the DFT codebook in a Rayleigh fading channel with 4QAM. As shown in Fig. 2, the proposed method is capable of achieving similar performance.

ACKNOWLEDGMENT

This work was supported by the World Class University programme (R32-2010-000-20014-0), Fundamental Research (20100020942), and the second stage of the Brain Korea 21 Project in 2011, National Research Foundation (NRF), Republic of Korea.
REFERENCES

[1] D. J. Love and R. W. Heath, "Limited Feedback Unitary Precoding for Spatial Multiplexing Systems," IEEE Transactions on Information Theory, vol. 51, pp. 2967-2976, August 2005.
[2] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: Performance criterion and code construction," IEEE Trans. Inf. Theory, vol. 44, pp. 744-765, Mar. 1998.
[3] M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Sel. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[4] Y. Shang and X.-G. Xia, "Space-time block codes achieving full diversity with linear receivers," IEEE Trans. Inf. Theory, vol. 54, pp. 4528-4547, Oct. 2008.
[5] G. Caire and S. Shamai, "On the capacity of some channels with channel state information," IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 2007-2019, Sept. 1999.
[6] M. Skoglund and G. Jöngren, "On the capacity of a multiple-antenna communication link with channel side information," IEEE J. Select. Areas Commun., vol. 21, no. 3, pp. 395-405, Apr. 2003.
[7] D. J. Love and R. W. Heath, "Limited feedback unitary precoding for orthogonal space-time block codes," IEEE Trans. on Signal Processing, vol. 53, no. 1, pp. 64-73, January 2005.
[8] D. J. Love and R. W. Heath, Jr., "Grassmannian Beamforming on Correlated MIMO Channels," Proc. IEEE Globecom, Dallas, 2004.
[9] "Robust Codebook Design Based on Unitary Rotation of Grassmannian Codebook."
[10] "Weighted DFT Codebook for Multiuser MIMO in Spatially Correlated Channels."
[11] "Multi-User MIMO with Limited Feedback Using Alternating Codebooks."
[12] 3GPP LTE-Advanced Release 10, http://www.3gpp.org/lte-advanced.


Application of Lean Principles in Healthcare Sector

T. Jayavel 1, R. Dilli Babu 2
1 PG Student, Department of Industrial Engineering, Anna University, Chennai
2 Associate Professor, Department of Industrial Engineering, Anna University, Chennai

ABSTRACT

The complete elimination of waste is the target of any quality system. This concept is vitally important today, since in today's highly competitive world there is nothing we can afford to waste. This paper attempts to apply the principles of lean in the service sector with the purpose of eliminating wastes and increasing capacity. The Value Stream Mapping tool was used to expose waste and to identify a proposed plan for improvement. The results achieved in the proposed plan showed significant improvements in the overall performance of the system, allowing it to be more productive, flexible and smooth, and to deliver high-quality service.
Keywords: Lean, Value Stream Mapping, Takt Time

INTRODUCTION

In lean, waste is defined as anything which does not add value to the end product. Since the customer sees value only in the end product, it is fair to define waste in this way. Customers do not care how much it costs you to repair damage, to hold large stocks and stores, or to carry other overheads. Some wastes can be avoided, while others are unavoidable for many reasons. After identifying the wastes and categorizing them into avoidable and unavoidable, one has to think about removing them from the system; lean manufacturing always talks about removing, not merely minimizing. Wastes are everywhere, in many different forms, and every organization wastes a majority of its resources, so it is worthwhile to take a closer look at them. These wastes are grouped into eight categories: overproduction; waiting, including time in queue; work in progress (WIP); transportation between workstations or between suppliers and customers; inappropriate processing; excess motion or ergonomic problems; defective products; and underutilization of employees. Lean is based on the continuous finding and removal of waste. Value is defined from the customer's point of view, so all lean tools aim to identify and remove waste from the system continuously. There are four steps in implementing lean principles: 1. identifying the fact that there are wastes to be removed; 2. analyzing the wastes and finding their root causes; 3. finding solutions for these root causes; and 4. applying these solutions and achieving the objective. Once this is done, stage 1 is revisited and the loop continues over and over again. To become lean it is necessary to recognize that wastes exist, to find out where they exist, to find their root causes, and then to come up with ways to solve them.

Lean Tools

To find out where in the process these wastes exist, there is a very powerful and simple tool: process mapping. A process map simply maps all the processes and activities carried out in bringing a specific product or service into reality. Irrespective of the value they add to the final product or service, the process map includes all activities from the point of development or order inquiry, through making and shipping the goods, up to the point where the customer collects them.

Review of related works

Womack and Jones (2003) advocate the application of Lean thinking in medical systems. They argue that the first step in implementing Lean thinking in medical care is to put the patient in the foreground and to include time and comfort as key performance measures of the system. Having multiskilled teams taking care of the patient and an active involvement of the patient in the process is emphasized [2]. Karlsson et al. (1995) argue that Lean product development, supply chain management, and Lean manufacturing are important areas also in healthcare. The focus on zero defects, continuous improvements and JIT in healthcare makes Lean

concepts especially applicable. The establishment of customer interaction is equally important in the manufacturing industry as it is in the healthcare sector [3]. Young et al. (2004) see an obvious application of Lean thinking in healthcare in eliminating delay, repeated encounters, errors and inappropriate procedures [4]. Similarly, Breyfogle and Salveker (2004) advocate Lean thinking in healthcare and give an example of how Lean management principles can be applied to health care processes through the use of the Six Sigma methodology, which in many ways resembles the Lean production techniques [5]. Several case stories on Lean thinking initiatives in the healthcare sector can be found in Miller (2005) and Spear (2005) [6,7]. In a recent publication by the Institute of Healthcare Improvement, two health care organizations in the US showed positive impact on productivity, cost, quality, and timely delivery of services after having applied Lean principles throughout the organization.

Measuring Lean Initiatives

Dickson et al. (2009) described the effects of Lean, a process improvement strategy pioneered by Toyota, on quality of care in 4 emergency departments (EDs). Participants in 2 academic and 2 community EDs that instituted Lean as their single process improvement strategy made observations of their behavioral changes over time. They also measured the following metrics related to patient flow, service, and growth from before and after implementation: the time from ED arrival to ED departure (length of stay), patient satisfaction, the percentage of patients who left without being seen by a physician, the time from ordering to reading radiographs, and changes in patient volume. The results showed that length of stay was reduced in 3 of the EDs despite an increase in patient volume in all 4. Each observed an increase in patient satisfaction, lagging behind by at least a year. The narratives indicate that the closer the Lean implementation was to the original Toyota principles, the better the initial outcomes. The immediate results were also greater in the EDs in which the frontline workers were actively participating in the Lean-driven process changes. In conclusion, Lean principles adapted to the local culture of care delivery can lead to behavioral changes and sustainable improvements in quality-of-care metrics in the ED. These improvements are not universal and are affected by leadership and frontline workforce engagement. Wong, Wong, and Ali (2009) investigated the adoption of lean manufacturing in the electrical and electronics industry in Malaysia. A questionnaire survey was used to explore 14 key areas of lean manufacturing, namely scheduling, inventory, material handling, equipment, work processes, quality, employees, layout, suppliers, customers, safety and ergonomics, product design, management and culture, and tools and techniques. The respondents were asked to rate the extent of implementation for each of these areas. The average mean score for each area was calculated and some statistical analyses were then performed. In addition, the survey also examined various issues associated with lean manufacturing, such as its understanding among the respondent companies, its benefits and obstacles, and the tools and techniques used. The survey results show that many companies in the electrical and electronics industry are committed to implementing lean manufacturing.
Generally, most of them are moderate-to-extensive implementers. All the 14 key areas investigated serve as a useful guide for organizations when they are adopting lean manufacturing. This was the first study to investigate the actual implementation of lean manufacturing in the Malaysian electrical and electronics industry.

The Emergency Departments (EDs) play a vital role in providing care to patients and they are recognized for their contribution that they make to the society. The available statistics make it clear about the indispensability of this healthcare service operations that the country relies upon to provide medical services to the patients on a 24/7 basis. The ED healthcare service delivery system represents one of the most visible service sectors where the effects are very stark. Poor service delivery can often make the difference between life and death. The serious issue concerning healthcare in hospitals' ED is that they are very crowded and the waiting times are so long that it is rarely a smooth and satisfying experience. Causes for ED overcrowding are well known and include hospital bed shortage, high

EXPERIMENTAL WORKS AND RESULTS


medical acuity of patients, increasing patient volume, shortage of examination space and shortage of medical staff. Even though these issues are well recognized, alleviating them in the ED is not trivial and requires addressing complex systems issues. Our work concentrates on exploring how to improve the effectiveness of ED operations. By creating a framework for eliminating waste in healthcare front-end processes, significant long-term benefits can be gained: shorter lead times (meaning more patients served), fewer personnel required, more satisfied patients, etc.

The KMC hospital is a teaching hospital with a 400-bed facility. It serves most of the people in Trichy. Table 1 below shows the average annual statistics obtained from the Statistics and Documentation Unit of the hospital.

Table 1: Hospital's annual statistics
Description                   Number
Patient visits to hospital    130,000
Admission cases                19,500
Minor operations                6,100
Major operations                5,000
ED visits                      35,000

In the ED procedures, patients are categorized according to the severity level of their medical condition. Walk-in patients are categorized as being the least emergent. The most emergent patients, who require immediate care from a physician, are served first, while other patients are expected to see a physician after a number of visits to the ED; they have been observed waiting over an hour in the waiting room. The ED provides medical service to patients on a 24/7 basis in two shifts. Each shift includes one main physician in the examination room. The ED also contains one observation room with 16 beds. The ED service delivery process can be represented by the following set of core activities. These activities occur in a sequential manner; some of the steps are either rearranged or omitted for different patient types. Figure 1 shows the general process flowchart and patient flow through the system.

Takt Time Analysis
Takt time is the key tool needed for the assessment of the current system; it is used to eliminate over-servicing. The ED works sometimes in two shifts of 12 hours each, and sometimes in three shifts of 8 hours each. Lunch breaks, as well as other necessary breaks for the working staff, are covered by other staff, so the total available time is 24 hours a day, which equals 86,400 sec. As mentioned in Table 1, on average the annual ED visits (flow) are 35,000, which is equivalent to 96 visits per day. So the takt time for each process of the service is the same and is calculated below with reference to the formula in [9]:

Patient Registration takt time = 86,400 / 96 = 900 sec per patient
Patient Evaluation takt time = 86,400 / 96 = 900 sec per patient
Diagnostic Tests takt time = 86,400 / 96 = 900 sec per patient
Therapy takt time = 86,400 / 96 = 900 sec per patient
Results Evaluation takt time = 86,400 / 96 = 900 sec per patient

After collecting the information needed with regard to the patient and information flow, it is now easy to draw the value stream map (VSM) for the current state. A value stream is defined as all the actions (both value-added and non-value-added) required to bring a specific product, service, or a combination of products and services, to a customer. The VSM of the ED is created by using a predefined set of icons (shown in Figure 2). These icons include the process icon, the data box, the outside source, the finished goods to customers icon, and the information flow icon.
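As a quick check of the takt-time arithmetic above, a small illustrative sketch using only the figures quoted in the text (24-hour availability and 35,000 annual ED visits):

```python
# Takt time = available working time per day / average patient demand per day
available_time_s = 24 * 60 * 60            # ED runs 24/7 -> 86,400 s per day
annual_ed_visits = 35_000                   # from Table 1
daily_demand = round(annual_ed_visits / 365)  # about 96 patients per day

takt_time = available_time_s / daily_demand
print(f"Demand: {daily_demand} patients/day, takt time: {takt_time:.0f} s/patient")
# -> Demand: 96 patients/day, takt time: 900 s/patient
```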


includes issues like face-to-face contact with the patient and physical examination. The values of the cycle times shown for each process were obtained from the relevant people and have been included as the cycle time for each stage in the service delivery process. The changeover time included in the data box accounts for cleaning and preparation for the next patient. Waste time is the time the patient waits until service starts in each process. For the purpose of service control, there are weekly schedules for physicians, nurses and other clinical staff. Using all these concepts, the VSM for the current state of the process is shown below.
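The data-box entries just described (cycle time, changeover time, waste time) can be held in a simple structure and used to estimate each stage's hourly capacity. The sketch below is purely illustrative: the stage names and the sample numbers are ours, not the hospital's measured values.

```python
# Each VSM data box holds cycle time, changeover time and waste time, in seconds.
data_boxes = {
    # stage:            (cycle, changeover, waste)  -- sample values, not measured data
    "Registration":      (300,  60, 120),
    "Evaluation":        (600, 120, 300),
    "Diagnostic tests":  (900, 180, 600),
}

for stage, (cycle, changeover, waste) in data_boxes.items():
    # A stage can start a new patient at best every (cycle + changeover) seconds.
    capacity_per_hour = 3600 / (cycle + changeover)
    print(f"{stage:16s} capacity ~ {capacity_per_hour:.1f} patients/hour "
          f"(waste per patient: {waste} s)")
```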

The operations to be carried out per cycle are shown in Table 2 The VSM of current ED state is presented in Figure 4.3. The service commences with registration by taking patient information. Following the registration, the patient is ready to assessed by the physician. Following the assessment by the physician the patient undergoes the diagnostic testing and/or therapy, and after evaluation of the diagnostic reports and/or medical assessment, a decision to admit or discharge the patient is made. Entries in the data box underneath the process icon include entries for cycle time, change over time and waste time. As discussed previously, cycle time is the time it takes to service a patient. Cycle time From the data presented in the case, the capacity of the individual unit per hour has been calculated. This value is


not uniform. It indicates that there will be a lot of idle time for the staff as well as in the process. It is observed that, in the present system, only 48 patients can be served per day, so it was decided to improve the service capability by applying the lean manufacturing philosophy. At this stage, the total number of relevant clinical staff required is equal to 5. The idle time for each operation is calculated and shown in Table 3; the total idle time is 201,600 sec. This idle time must be utilized in order to increase service capability. The main objectives of applying lean principles are to reduce wastages and the idle time of operators, and to increase the production/service capability by combining some of the operations or adding new ones. These analyses are discussed in the subsequent sections. The bottleneck lies in the Diagnostic Testing and/or Therapy operations, whose capacity per day is clearly low compared to other operations.

The Proposed Plan
In the proposed plan, the patient evaluation operation has been combined with each of the diagnostic testing and therapy operations, so that the same clinical staff doing diagnostic tests or therapy can read the reports, evaluate the results, and then make a decision either to continue service (admission) or to leave the ED (discharge). Also, a new physician was suggested to be added at the combined operation (Therapy and Results Evaluation). This will reduce the waiting time and increase the serviceability and patient flow at the observation room, as illustrated below. Next, the changes that could be made to the current system, and how they affect the overall performance and the improvement in service capability, are illustrated. These improvements led to increasing the patients served (capacity) from 48 to 72 (50%), reducing idle time from 201,600 sec to 129,600 sec (36%) and increasing utilized time from 230,400 sec to 302,400 sec (31%), while the number of clinical staff remains the same at 5. The detailed results of these improvements are shown in Table 4. Initially, diagnostic testing/therapy and patient evaluation operations were carried out by three clinical staff, which led to an idle time of 86,400 sec per day. In the proposed plan, after combining the operations, the idle time has been reduced to 43,200 sec. In the proposed methodology, one clinical staff member is engaged in the combined diagnostic testing and patient evaluation operation, so the capacity of the operation has been improved to 72; however, the maximum capacity of the system is 288, which is higher than the capacity of the diagnostic testing and patient evaluation operation. The difference can be eliminated by reducing the waste time and changeover time of operations and by adding more units, especially for diagnostic testing.
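To make the reported gains easy to verify, here is a short sketch that recomputes the percentage improvements from the figures quoted above (all values taken directly from the text):

```python
def pct_change(before, after):
    """Relative change, as a percentage of the 'before' value."""
    return 100.0 * (after - before) / before

capacity = pct_change(48, 72)              # patients served per day
idle     = pct_change(201_600, 129_600)    # idle time, seconds per day
utilized = pct_change(230_400, 302_400)    # utilized time, seconds per day

print(f"capacity: {capacity:+.0f}%, idle time: {idle:+.0f}%, utilized time: {utilized:+.0f}%")
# -> capacity: +50%, idle time: -36%, utilized time: +31%
```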


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) of the material, information and time flow, and it is used to show the level of successes and changes. The future stream map will be used to demonstrate how the product flows, eliminating all the wastes shown in the current state and achieve lean manufacturing. It is a powerful tool, it provides a clear statement of a vision for where the organization is going, however if the ideas are not implemented within a short time, it will lose its force and it will become another "has been" of manufacturing technique. The future stream map (or VSM) in this case study, which represents the future vision of the ED, is shown in Figure 9. The results achieved in the future stream map resulted from using some of the lean manufacturing; these techniques are helping the ED to be more productive, flexible, smooth and with high quality output. This will enhance the ED overall performance, lead time, changeover time and the delivery time to patients. It can be seen that the lead time of the service at ED has been reduced from 193 sec to 153 sec (20.7%).

Value Stream Mapping for Future ED State The future stream map is a method of showing the desired achieved future goals. It also shows service delivery processes after using lean tools to improve it. The future stream map is similar to the current stream map, where both give visual representation

By implementing the lean manufacturing techniques, ED at KMC Hospital for Surgery and Accidents will gain lots of benefits with regard to its healthcare services provided to the patients. Some of these advantages can be seen directly from the future stream map and can be considered as long term advantages. Table 5 below shows the differences in results between the current state and the future state.


REFERENCES
1- Fawaz Abdullah, "Lean Manufacturing Tools and techniques in the Process Industry with a Focus on Steel", PhD Thesis, University of Pittsburgh, 2003. 2- Womack, J.P. and Jones, D.T., "Lean Thinking", Simon & Schuster, London, 2003. 3- Karlsson, C., Rognes, J. and Nordgren, H., " Model for Lean Production", Institute for Management of Innovation and Technology, Goteborg, 1995. 4- Young, T., Brailsford, S., Connell, C., Davies, R., Harper, P. and Klein, J.H., Using industrial processes to improve patient care, British Medical Journal, Vol. 328 No. 7432, pp. 162-4, 2004. 5- Breyfogle, F. and Salveker, A., "Lean Six Sigma in Sickness and in Health", Smarter Solutions, Austin, TX, 2004. 6- Miller, D., "Going Lean in Health Care", Institute for Healthcare Improvement, Cambridge, MA, 2005. 7- Spear, S.J., Fixing health care from the inside, today, Harvard Business Review, Vol. 83(8), pp. 78-91, 2005. 8- E. W. Dickson, Z. Anguelov, D. Vetterick, A. Eller, and S. Singh, "Use of Lean in the Emergency Department: A Case Series of 4 Hospitals", Journal of Annals of Emergency Medicine, 2009. 9- Mikll P. Groover, "Automation, Production System and Computer Integrated Manufacturing", Second Edition, Prentice Hall, 2002. 10- Yu Cheng Wong, Kuan Yew Wong, and Anwar Ali, "A Study on Lean Manufacturing Implementation in the Malaysian Electrical and Electronics Industry", European Journal of Scientific Research, ISSN 1450216X Vol.38 No.4 (2009), pp 521-535, 2009.

CONCLUSIONS
In general, it was shown that the Value Stream Mapping is an ideal tool to expose the waste in a value stream and to identify tools for improvement. It was also illustrated that lean manufacturing tools can greatly reduce wastes identified by the current state map. The development of the future state map is not the end of a set of value stream activities. It should be stressed that the value stream should be revisited until the future becomes the present. The idea is to keep the cycle going because if sources of waste are reduced during a cycle, other wastes are uncovered in the next cycle. Lean manufacturing can thus be adapted in any manufacturing situations albeit to varying degrees. The results achieved in the proposed plan showed significant improvements in the overall performance of the ED, which allowed it to be more productive, flexible, smooth and with high quality service. These improvements include reducing the lead time of the service at ED from 193 sec to 153 sec (20.7%), which increased the patients served (capacity) from 48 to 72 (50%), reducing idle time from 201600 sec to 129600 sec (36%) and increasing utilized time from 230400 sec to 302400 sec (31%). Finally, it is concluded that an improvement in Capacity, Idle time and Utilized time has been achieved as a result of implementing lean manufacturing principles


Temporal Link Signature Measurements for Location Distinction Using MAC Layer Optimization

S. Jeevidha 1, Mrs. M. P. Reena 2
1 PG Student, M.E. Communication Systems, Sri Venkateswara College of Engineering, Chennai
2 Assistant Professor, ECE, Sri Venkateswara College of Engineering, Chennai

ABSTRACT
Location distinction is the ability of a receiver to determine when a transmitter has changed its location from one place to another. In this project, a temporal link signature is used to uniquely identify the link between a transmitter and a receiver. The temporal link signature is the sum of the effects of the multipath from the transmitter to the receiver, each path having its own time delay and complex amplitude. Location estimation in wireless sensor networks is a very complex process, and continuously monitoring node locations increases the energy consumption of the network. In order to reduce the energy used in wireless sensor networks, an energy-efficient location distinction method is proposed. This method is used to detect the movement of nodes. First the location of a node is estimated and the sensor then schedules a sleep mode; after movement is detected, the sensor is awakened from sleep mode and the location of the node is found again. This conserves the energy of the wireless sensor network. The location distinction algorithm reliably detects changes of node location in the physical channel. This detection is performed at a single receiver. Temporal link signatures are used to demonstrate that the proposed method significantly reduces the false alarm rate in comparison to existing methods.

1. Introduction
Location distinction is the ability of a receiver to determine when a transmitter has changed its location from one place to another. Location distinction is critical in many wireless network situations, including motion detection in wireless sensor networks, physical security of objects with wireless tags, and information security against replication attacks.

Wireless sensor networks
A wireless sensor network (WSN) consists of spatially distributed autonomous sensors that monitor physical quantities such as temperature, sound, vibration, pressure, motion or pollutants, and cooperatively pass their data through the network to a main location. Sensor location must be associated with measured sensor data and is needed in geographic location-based routing methods. Location

estimation must be done in an energy-efficient manner, especially for networks of sensors with small batteries that must last for years. The energy required to estimate location must be expended when a sensor node moves; however, energy-efficient localization systems should not re-estimate location unless movement actually occurs. This implies that for energy efficiency in location estimation, sensor nodes must detect motion or a change in location.

Secure wireless networks
Wireless networks are vulnerable to medium access control (MAC) spoofing. An adversary at a different location can claim to be another node by spoofing its MAC address; this is known as MAC address spoofing. Traditional cryptographic methods are used to prevent spoofing, but these methods are susceptible to node compromise. As argued in [13], an adversary at a different location can claim to be another node by spoofing its address. A good location distinction technique that can distinguish the location of spoofed nodes from authentic nodes can prevent these attacks. Surprisingly, existing techniques fail to detect a change in position in an energy-efficient and robust manner:


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) Accelerometer Measurements: Accelerometers detect changes in velocity, and have found application in movement detection [2]. Its additional device cost could be prohibitive for applications such as barcode replacement. Further, as it would not detect motion from a sleep state, an accelerometer needs continuous power, contrary to the low-power requirements of sensor network and RFID applications. Doppler Measurements: Doppler is the frequency shift caused by the velocity of a transmitter (TX).Doppler measurements, similarly, only detect motion while the device is moving, not after it stops moving, thus transmission can not be intermittent, like a packet radio. Active RFID: Active RF tags are used to protect the physical security of objects. Radio frequency identication (RFID) tags are becoming a replacement for barcodes and means for improved logistics and security for products in stores and warehouses. Active RFID [4] is desired for its greater range, but a tag must be in range of multiple base stations in order to be able to estimate its location. Received Signal Strength (RSS) Measurements: RSS measurements contain information about a link, and are particularly useful when using multiple measurements at different receivers, e.g., the signal print of [6]. They can be used to detect movement of a TX [38]. However, in the network security application, adversaries can spoof their signal print using array or MIMO antennas which send Tracking the coordinate of a TX tag can be used as anintermediary to detect a change in location of the tag. As pointed out in [5], accuracies in typical indoor localization systems are around 3-4 m. Detection of smaller movements than the system accuracy would be difficult. In [5], RSS measurements are used directly to detect when two devices move together (e.g., with the same person) our work similarly avoids using localization as an intermediary, as a means to improve detection performance. In this paper, we propose a robust location distinction mechanism that uses a physical layer characteristic of the radio channel between Transmitter and Receiver, that we call a temporal link signature. The temporal link signature is the sum of the effects of the multiple paths from the Transmitter to the Receiver, each with its own time delay and complex amplitude. Such a signature changes when the Transmitter or Receiver changes position because the multipath in the link change with the positions of the endpoints of that radio link. Generally signatures are used to provide the security for the networks. The link signature is the signature estimated to each link of the communication networks. The following figure shows the some of the links

Receivers (j1, j2) Transmitters (i1, i2, i3) Fig 1 link signature. Consider the map of transmitters and receivers in Figure 4.1. A radio link exists between nodes at i1and j1. The Receiver of node j1 can measure and record the temporal link signature of link (i1,j1). When node i1 moves to location i3, node j1 can then distinguish the new link signature from the previously recorded link signature, and declare that it has moved. Alternatively, if an adversary impersonates the node at location i1 from location i2, the adversarys transmission to node j1 will be detected to be from a different location, and the Receiver j1 may then take a suitable action. While in either case, the detection of a link signature change can be reliably performed at one Receiver, node j2 can also participate in the detection process for higher reliability and robustness. In contrast to existing techniques, location distinction using temporal link signatures does not require continuous operationa sensor can schedule sleep, and a wireless network can send packets intermittently. When awakened from sleep or upon reception of the subsequent packet, a Receiver can detect that a neighboring Transmitter has moved since its past transmission. Unlike the RSS-based


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) technique, temporal link signatures can be measured at a single Receiver. It requires no additional complexity, which keeps tag cost and energy consumption low. The fact that a stationary Transmitter unwittingly produces a measurable and consistent temporal link signature at a Receiver is also a user privacy concern. An eavesdropper could use a temporal link signature in a similar manner to a MAC address as a handle to identify a user and monitor the activity. In applications where user privacy is important, the user should alter the devices transmission in order to produce random changes in the link signature that an eavesdropper would measure. In this project, it make the following contributions. To define the temporal link signature and propose a location distinction algorithm which makes and compares measurements of temporal link signatures at a single Receiver in order to reliably detect a change in Transmitter location. Using an extensive measurement set of over temporal link signatures in a typical office environment, This project details the tradeoff between false alarm rate and detection rate, for single and multiple receivers. This project provides an extensive comparison of temporal link signatures with an existing method which uses RSSbased signatures. This project demonstrates that for a 5percent probability of missed detection the temporal link signature method can achieve 8-16 times lower false alarm rate compared to the existing RSS-based method. Alternatively, for a 5 percent FA rate, the temporal link to 62 times lower probability of MD. For the 5 percent FA rate, the probability of MD is shown to be 0.05 percent, i.e., only one link changes in every 2,000 is not distinguished by the proposed algorithm, when three receivers collaborate. Finally, evaluate the temporal and spatial statistics of temporal link signatures and show that the data is nonGaussian, typically heavy tailed. This project provides a means to estimate the mutual information between a measured link signature and the link. This Mutual information quantifies the uncertainty about the link that the link signature measurement removes. signatures and also develop a methodology to evaluate it. Finally, we describe our methodology for evaluating RSS-only signatures, which provides a comparison of our work to existing work. 2.1 Temporal Link Signature The power of the temporal link signature comes from the variability in the multiple paths over which radio waves propagate on a link. A single radio link is composed of many paths from the TX to the RX. These multiple paths (multipath) are caused by the reflections, diffractions, and scattering of the radio waves interacting with the physical environment. Each path has a different length, so a wave propagating along that path takes a different amount of time to arrive at the RX. Each path has attenuation caused by path losses and interactions with objects in the environment, so each wave undergoes a different attenuation and phase shift. At the RX, many copies of the transmitted signal arrive, but each copy arriving at a different time delay, and with a different amplitude and phase. The sum of these time delayed, scaled, and phase shifted transmitted signals is the received signal. Since the received signal is a linear combination of the transmitted signal, we can consider the radio channel or a link as a linear filter. 
For the link or channel between TX i and RX j, the channel impulse response (CIR), denoted h_{i,j}(\tau), is given by the superposition of per-path impulses, h_{i,j}(\tau) = \sum_l \alpha_l e^{j\phi_l}\,\delta(\tau - \tau_l), where \alpha_l, \phi_l, and \tau_l are the amplitude, phase, and time delay of the l-th path.

Essentially, the filter impulse response is the superposition of many impulses, each one representing a single path among the multiple paths of a link. Each impulse is delayed by the path delay, and multiplied by the amplitude and phase of that path. The received signal, r(t), is then the convolution of the channel filter and the transmitted signal s(t), i.e., r(t) = (s * h_{i,j})(t), plus receiver noise. (2)
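A self-contained numerical sketch of the model just described and of the frequency-domain estimation step of Section 2.1.1: it builds a toy multipath CIR, convolves it with a known probe, and recovers the link-signature estimate by correlating with the probe (IFFT of R(f)S*(f)). The path delays, gains and the PN-like probe are made up for illustration; a real receiver would reconstruct s(t) from the demodulated packet.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512

# Multipath CIR: a few delayed impulses, each with its own complex gain.
h = np.zeros(n, dtype=complex)
for delay_samples, gain in [(0, 1.0), (7, 0.5 * np.exp(1j * 1.1)), (19, 0.25 * np.exp(-1j * 2.3))]:
    h[delay_samples] += gain

# Known transmitted probe s(t) (here a random PN-like sequence) and r(t) = (s * h) + noise.
s = rng.choice([-1.0, 1.0], size=n)
r = np.fft.ifft(np.fft.fft(s) * np.fft.fft(h)) \
    + 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Link-signature estimate: correlate r with s in the frequency domain, then inverse FFT.
R, S = np.fft.fft(r), np.fft.fft(s)
h_hat = np.fft.ifft(R * np.conj(S)) / np.sum(np.abs(s) ** 2)

print(np.argmax(np.abs(h_hat)))   # strongest recovered path is at delay 0, as constructed
```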

2 METHODOLOGY
We first define a temporal link signature and highlight the strong dependence of the link signature on the multipath radio channel. Next, we describe how it can be measured in typical digital receivers. We then describe a location distinction algorithm, that is, based on our link All receivers measure r(t) in order to demodulate the information bits sent by the TX. In this paper, we additionally use r(t) to make a band-limited estimate of j


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) hi;j(). We call this (noisy) estimate the temporal link signature. 2.1.1 Temporal Link Signature Estimation If the SNR of r(t) is high enough so that bits are correctly demodulated, then, the s(t) transmitted signal, can be recreated in the RX. Even in resource constrained scenarios, when packets can be successfully decoded by a less computationally constrained RX, the original waveform s(t) can be reconstructed, thereby facilitating our methods. In general, estimating hi;j(t) from known r(t), and s(t) in (2) is a de convolution problem, but for a number of reasons, we do not actually need to perform a de convolution. As a result, we estimate the temporal link signature using only convolution, rather than de convolution. To show this, we first rewrite (2) in the frequency domain as The calculation of temporal link signature can be done regardless of modulation, but for particular modulation types, the process is even easier. This paper does not develop new channel state estimation methods; our techniques are advantageous because they can exploit already existing channel state estimation algorithms, like For example, consider receivers for orthogonal frequency division multiplexing (OFDM)-based standards, such as in IEEE 802.11a/g and 802.16. Such receivers can be readily adapted to calculate temporal link signatures since the signal amplitude and phase in each sub channel provides a sampled version of the Fourier transform of the signal. In effect, the Fourier transform operation is already implemented, and R(f) is directly available. Use of R(f) directly in an OFDM-like system has been evaluated. In our work, calculation of the temporal link signature requires an additional inverse FFT operation. Most of the calculation necessary for the computation of temporal link signatures is already performed in existing code-division multiple access (CDMA) cellular base station receives, and in access points for WLANs operating on the 802.11b standard. CDMA receivers first correlate the received signal with the known pseudo noise (PN) signal. They then use the correlator output in a rake receiver, which adds in the power from several multipath components. Our temporal link signature in (2) is the correlation of the received signal with the transmitted signal s(t). In contrast, the PN correlator correlates only with one period of a PN signal, which is one symbol duration of s(t). The link signature of (2) can be estimated by averaging the PN correlator output over the course of many symbols. This would represent little additional calculation compared to the PN correlation operation. An implementation of this method for 802.11b signals using the universal software radio peripheral (USRP) receiver has been implemented and is available for download [1]. 2.2 Normalization Two types of normalization are important when measuring link signatures. They are time delay, and amplitude. One problem exists when describing time measurements are transmitters and receivers are typically not synchronized. Thus, the temporal link signature, hi,j(t)^n has only a relative notion of time t. For example, consider the case that the next temporal

Where R(f),S(f),Hi;j(f) are the Fourier transforms of r(t),s(t); hi;j(t) respectively. Then, we multiply R(f) with the complex conjugate of the Fourier transform of the recreated transmitted signal, s*(f)

Note that this multiplication in the frequency domain is a correlation in the time domain. As s(f)^2 is nearly constant within the band, (3) is a band limited version of Hi;j(f) plus noise. Finally, the temporal domain is recovered from (3) by taking the inverse Fourier transform. We denote the impulse response estimate from TX i at RX j as hi;j(t),

where F-1 indicates the inverse Fourier transform. If Ps is known at the RX, for example, using 802.11h [31], it can be readily removed as given in (4). If Ps fluctuates due to transmit power changes that are unknown at the RX, the system would normalize the amplitude of all hi;j(t) estimates. 2.1.2 Modulation-Dependent Implementations


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) link signature on the same link (i,j), hi,j(t)^n+1 is equal to hi,j(t+t)^n where t is a timing offset error. If the timing error is not removed in some way, the temporal link difference between the nth and n+ 1st measurement would be very high, simply because of the lack of synchronization. Hence, we normalize the time delay axis at each new link signature measurement by setting the time delay of the line of sight multipath to be zero. All link signatures in this project are time-delaynormalized. For purposes of robustness to replication attacks, consider that an attacker has the capability to increase or decrease the transmit power of the device. One way to mitigate this attack is to amplitude normalize each measured CIR, that is, eliminate any effect of the transmit power on the measurement. To discuss the option of amplitude normalization of the measured impulse response, to form the normalized link signature. 2.3 Algorithm The location distinction algorithm using temporal link signature is as follows, STEP 1: A history of most recent N-1 link signature is measured and stored. In this paper, we also explore the use of multiple receivers to make the use of link signatures extremely robust, as displayed in Fig. 1. This relies on collaboration between two or more nodes. In sensor networks, collaboration should be largely avoided in order to reduce communication energy, but it may be used in a small fraction of cases in order to confirm with higher reliability that a TXs location has changed. Sensor and ad hoc networks typically rely on redundancy of links, so each node is expected to have multiple neighbors. For prevention of replication attacks, collaboration may be normal, and any access points in radio range would collaborate. WLAN coverage regions often overlap, and hence, multiple access points may receive signals from the same Transmitter. As WLANs become more ubiquitous, access point densities may increase and we would expect more overlap. We define the set J to be the set of receivers involved in the collaborative location distinction algorithm for Transmitter i. The algorithm proceeds as described in Section 2.3 with each Receiver calculating di,j but after step 2, nodes jJ send differences di,j to a central processor (which could be any jJ).The central processor combines the results into a mean distance di,J di,J = 1/Jdi,j These histories are measured while the transmitter is not moving and not under the replication attack, till hi,j^(n) differs due to normal variations in the radio channel and measurement noise. STEP 2: The Nth measurement H(N) is then taken, H(N) denotes the Nth measurement of the temporal link signature. STEP 3: Compute the difference between H(N) and H D= H - H (N); STEP 4: Next Compare D with threshold If D> threshold means, location change detected. else D< threshold means, replace the oldest measurement with D, go to step 2. By using the above calculated difference D value, movement of the node is detected. 2.4 Multiple Receiver Link Differences

Afterwards, steps 3 and 4 of the algorithm proceed using di,J in place of the single RX distances di,j. we want to determine the false alarm rate for both single and multiple receivers. False alarm rate is the ratio of the number packets dropped to the number of packets sent in the wireless sensor network. In this project we reduced the false alarm rate in wireless sensor networks in comparison to the existing methods by using the temporal link signature measurements.
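A compact sketch of the algorithm of Section 2.3 and the collaborative averaging of Section 2.4. It is illustrative only: the text does not spell out the exact distance metric, so a simple normalized Euclidean difference between the new signature and the stored history is assumed here, and the threshold is a free parameter.

```python
import numpy as np

def link_difference(history, new_sig):
    """Distance d between the new signature and the stored history (assumed metric)."""
    ref = np.mean(history, axis=0)            # summarize the N-1 stored signatures
    return np.linalg.norm(new_sig - ref) / np.linalg.norm(ref)

def location_changed(histories, new_sigs, threshold=0.5):
    """Steps 1-4 with optional collaboration: average d over all receivers in J."""
    d_per_rx = [link_difference(h, s) for h, s in zip(histories, new_sigs)]
    d_mean = float(np.mean(d_per_rx))         # d_{i,J} = (1/|J|) * sum_j d_{i,j}
    return d_mean > threshold, d_mean

# Toy usage: two receivers, each holding N-1 = 4 past signatures of length 64.
rng = np.random.default_rng(1)
histories = [rng.standard_normal((4, 64)) * 0.1 + 1.0 for _ in range(2)]
new_sigs  = [hist[-1] + rng.standard_normal(64) * 0.05 for hist in histories]
print(location_changed(histories, new_sigs))  # (False, small d) for an unmoved transmitter
```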

3. EXPERIMENTAL VERIFICATION
Our proposed location distinction method relies heavily on the variability of the link signature, both in space and in time. Thus, accurate performance evaluation is done by using a set of link signature measurements recorded in a large network over time. We describe the measurement set and the evaluation results in this section. Different data from the described measurement


campaign has been previously reported to evaluate radio localization algorithms.

3.1 Environment and System
The measured environment is a typical modern office area. There are 44 device locations, shown in Fig. 3, within a 14 m by 13 m rectangular area. The campaign measures the channel between each pair of the 44 device locations, one at a time. All 44 x 43 = 1,892 TX and RX permutations are measured. At each permutation of TX and RX locations, the RX measures N = 5 link signatures, over a period of about 30 s. The nth normalized measurement on link (i, j) is denoted h_{i,j}(t)^n, for n = 1, ..., 5. A total of 44 x 43 x 5 = 9,460 measurements are recorded. The measurement system is comprised of a direct-sequence spread-spectrum (DS-SS) TX and RX. The system transmits and receives an unmodulated pseudo-noise (PN) code signal with a 40 MHz chip rate at a center frequency of 2,443 MHz. Both TX and RX have sleeve dipole antennas at 1 m height above the floor. The RX is essentially a software radio which records I and Q samples at a rate of 120 MHz. The FFT of the received signal, R(f), is multiplied by the conjugate of the known transmitted signal spectrum, S*(f), as described in (3). Then, the IFFT is taken to calculate h_{i,j}(t)^n. About 1 percent of the time, a link signature has a very low signal-to-noise ratio (SNR) due to interference. Whenever a high noise floor is measured for a link, that measurement is dropped, thus some links have N < 5 measurements. All results consider the actual N of each link (i, j).

[Figure 2: Measurement area map including device locations.]

4. RESULTS

[Figure 3: Wireless sensor network.]

Figure 4 location distinction graph The above graph indicates the movement of the wireless sensor node.

Figure 5 Graph for False alarm rate versus no of receivers.


[Figure 6: Path length versus number of receivers.]
The above graph shows that this method significantly reduces the false alarm rate in comparison to the existing methods.

Table 1: False alarm rates
Method                       Single receiver    Multiple receivers
Link signature               0.0655             0.011
Normalized link signature    0.1052             0.087
RSS wide band                0.5164             0.345
RSS narrow band              0.6676             0.456

Table 1 compares the four methods, link signatures (LS), normalized link signatures (NLS), wideband received signal strength (RSS WB), and narrowband received signal strength (RSS NB), given a system design requirement of a 95 percent detection rate (or 5 percent probability of miss). With one RX, link signatures could achieve this requirement with a 6.55 percent probability of false alarm. In comparison, the RSS WB method requires a 51.64 percent and 2.95 percent probability of false alarm for one and three receivers. In relative terms, the false alarm rate is eight and 16 times higher for the RSS WB case.

5 CONCLUSIONS AND FUTURE WORK

Energy-efficient location distinction in wireless sensor networks is achieved by the proposed location distinction algorithm, which reliably detects changes of node location in the physical channel. The temporal link signature measurements are used to uniquely identify the link between the transmitter and receiver and also to find the movement of the transmitter. This detection can be performed using a single receiver. Finally, this method significantly reduces the false alarm rate in comparison to existing methods. In future, a cooperative location distinction algorithm will be implemented to achieve higher reliability by using multiple receivers. The measurement set utilized in this project covered relatively short path lengths. In general, wider bandwidths and longer path lengths generate a richer link signature space and make measured link signatures more unique as a function of transmitter and receiver locations. The trade-off between bandwidth, path length, and detection performance must be characterized in future work.

REFERENCES
1. Agarwal, S., Krishnamurthy, S.V., Katz, R.H., and Dao, S.K., "Distributed Power Control in Ad-Hoc Wireless Networks," Proc. IEEE Int'l Symp. Personal, Indoor and Mobile Radio Comm. (PIMRC '01), pp. 59-66.
2. Burchfield, T. and Venkatesan, S., "Accelerometer-based human abnormal movement detection in Wireless Sensor Networks," Proc. First ACM Int'l Conf. Mobile Ad Hoc and Sensor Systems (MASS '08), pp. 315-320, 2008.
3. Chandra, R., Ramasubramanian, V., and Birman, K.P., "Anonymous Gossip: Improving Multicast Reliability in Mobile Ad-Hoc Networks," Proc. IEEE Int'l Conf. Distributed Computing Systems (ICDCS '01), pp. 275-283.
4. Chandrasekaran, I.G., Ergin, S., Gruteser, M., Martin, R., Yang, J., "DECODE: Detecting Co-moving Wireless Devices," Proc. Fifth IEEE Int'l Conf. Mobile Ad Hoc and Sensor Systems.
5. IEEE Std 802.11-1999, Local and Metropolitan Area Network, Specific Requirements, Part 11:
6. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE.
7. Faria, D.B. and Cheriton, D.R., "Radio-layer security: Detecting identity-based attacks in wireless networks using signal prints," Proc. WiSe '06, pp. 43-52, Sept. 2006.
8. Kesara, K. and Neal Patwari, "Temporal Link Signature Measurements for Location Distinction," Proc. Int'l Symp., IEEE, March 2011.
9. Smith, K.P., Jiang, A., Mamishev, K., Philipose, M., and Sundara-Rajan, K., "RFID-Based Techniques for Human Activity Detection," Comm. ACM, vol. 48, no. 9, pp. 39-44, 2005.
10. Xu, S. and Saadawi, T., "Does the IEEE 802.11 MAC protocol work well in multihop wireless ad hoc networks?" IEEE Commun. Mag., vol. 39, Jun. 2001.
11. VINT Project, "The UCB/LBNL/VINT Network Simulator-ns (Version 2)," http://www.isi.edu/nsnam/ns.
Web Sites: http://www.ieee.org, http://www.almaden.ibm.com/software/quest/Resources/, http://www.computer.org/publications/dlib, http://www.ceur-ws.org/Vol-90/


An Effective Text Clustering and Retrieval using SAP


Radha. R 1, Tina Esther Trueman 2, T. T. Mirnalinee 3
1,2 PG Student, Department of CSE, SSN College of Engineering, Chennai
3 Professor, Department of CSE, SSN College of Engineering, Chennai

ABSTRACT

Clustering is used to improve the efficiency and effectiveness of the retrieval process in text mining and information retrieval systems. Document Clustering is achieved by partitioning the documents in a collection into classes such that documents that are associated with each other are assigned to the same cluster. In this paper, we describe a new feature extraction method called Tri-Set Computation which involves Co-feature set, Unilateral feature set and Significant Co-feature set for extracting the feature vectors. The semi supervised clustering method applied to the feature vectors is Seeds Affinity Propagation (SAP) which improves the accuracy and reduces the computational complexity. Here we are using a similarity metric that captures the structural information of texts. The clustering result is based on the object representation, the similarity measure and the clustering algorithm. Finally we estimate the document similarity between clustered documents by finding out the distance measure. Similarity is measured according to the expectation of the same words appearing in two documents. Key Words Clustering, Tri-Set Computation, Co-feature set, Unilateral Feature set and Significant Cofeature set, Seeds Affinity Propagation (SAP).

I. INTRODUCTION
Clustering algorithms group a set of documents into subsets or clusters. Text clustering has become one of the most active areas of research and development. One of the challenging problems in text clustering is to discover the set of meaningful groups of documents in which those within each group are more closely related to one another than to documents assigned to different groups. The resultant document clusters can provide a structure for organizing large bodies of text for efficient browsing. Text clustering, also referred to as document clustering, is closely related to the concept of data clustering. The process of clustering aims to discover natural groupings, and thus present an overview of the classes in a collection of documents. A good clustering can be viewed as one that organizes a collection into groups such that the documents within

each group are both similar to each other and dissimilar to those in other groups. The clustering algorithms goal is to create clusters that are coherent internally, but clearly different from each other. In other words, documents within a cluster should be as similar as possible; and documents in one cluster should be as dissimilar as possible from documents in other clusters. A web search engine often returns thousands of pages in response to a broad query, making it difficult for users to browse or to identify relevant information. Clustering methods can be used to automatically group the retrieved documents into a list of meaningful categories. The first challenge in a clustering problem is to determine which features of a document are to be considered discriminatory. A majority of existing clustering approaches choose to represent each document as a vector, therefore reducing a document to a representation


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) suitable for traditional data clustering approaches. Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partition clustering algorithms have been recognized to be more suitable as opposed to the hierarchical clustering schemes for processing large datasets. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, cosine similarity, and relative entropy. There are many algorithms for automatic clustering like the K Means algorithm, Expectation Maximization and hierarchical clustering which can be applied to a set of vectors to form the clusters. Traditionally the document is represented by the frequency of the words that make up the document (the Vector space model and the Self organizing semantic map). Different words are then given importance according to different criteria like Inverse Document frequency and Information Gain. Seeds Affinity propagation clustering (AP) is a fast clustering algorithm especially in the case of large number of clusters, and has some advantages: speed, general applicability and good performance. SAP works based on similarities between pairs of data points (or nn similarity matrix S for n data points), and simultaneously considers all the data points as potential cluster centers (called exemplars). To find appropriate exemplars, SAP accumulates evidence responsibility R(i,k) from data point i for how well-suited point k is to serve as the exemplar for point i, and accumulates evidence availability A(i,k) from candidate exemplar point k for how appropriate it would be for point i to choose point k as its exemplar. From the view of evidence, larger the R(:,k)+A(:,k), more probability the point k as a final cluster center. Based on evidence accumulation, SAP searches for clusters through an iterative process until a high-quality set of exemplars and corresponding clusters emerges. In the iterative process, identified exemplars start from the maximum n exemplars to fewer exemplars until m exemplars appear and are unchanging any more (or SAP algorithm converges). The m clusters found based on m exemplars are the clustering solution of SAP. A semi-supervised document clustering algorithm, called Seeds Affinity Propagation (SAP) is applied based on an effective clustering algorithm Affinity Propagation (AP). Based on the initial text document representation, we have first applied stop word removal. Stop words are the words which are considered as non-descriptive within a bag of words approach. They typically comprise prepositions, articles, etc. We removed the stop words from the documents. Then we performed stemming process in all different combinations. And then find out the word frequency within the documents. After preprocessing labeled and unlabeled documents, structured documents are obtained. Tri-set Computation, a feature extraction technique is used to find out the features through Co-feature set, unilateral feature set and Significant Co-feature set methods. Then calculate the similarity measure of the documents and assigning the label to the documents if they are matched. Finally clustered documents are obtained through seeds affinity propagation via similarity measurement. 
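The preprocessing pipeline sketched above (stop-word removal, stemming, term-frequency counting) and a similarity comparison between two documents can be illustrated in a few lines. This is a generic illustration only: the tiny stop-word list and the crude suffix-stripping stand-in for a real stemmer are ours, and it shows plain cosine similarity rather than the paper's Tri-Set similarity.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}   # toy stop-word list

def stem(word):
    # Crude suffix stripping as a stand-in for a real stemmer (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def term_frequencies(text):
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words if w not in STOP_WORDS)

def cosine_similarity(tf1, tf2):
    common = set(tf1) & set(tf2)
    dot = sum(tf1[t] * tf2[t] for t in common)
    norm = math.sqrt(sum(v * v for v in tf1.values())) * math.sqrt(sum(v * v for v in tf2.values()))
    return dot / norm if norm else 0.0

d1 = term_frequencies("Clustering groups the documents into meaningful clusters.")
d2 = term_frequencies("Document clustering groups similar documents together.")
print(cosine_similarity(d1, d2))
```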
The results show that Seeds Affinity Propagation clusters the data more accurately and effectively than K-means, as can be seen from the table. This paper is organized as follows: Section II provides brief information about related work; Section III explains the proposed system architecture in detail; Section IV deals with a comparative study of various clustering algorithms; Section V describes the predictive modeling of our work; Section VI provides an analysis of the overall performance; finally, Section VII concludes the proposed work and gives directions for future work.

II. RELATED WORK


Renchu Guan et al. [1] proposed a novel semi-supervised text clustering algorithm, called Seeds Affinity Propagation (SAP). There are two main contributions in their approach: 1) a new similarity metric that captures the structural information of texts, and 2) a novel seed construction method to improve the semi-supervised clustering process. To study the performance of the new algorithm, they applied it to the benchmark data set Reuters-21578 and compared it to two state-of-the-art clustering algorithms, namely, the k-means algorithm and the original AP algorithm. Y.J. Li et al. [2] proposed a new supervised feature selection method, named CHIR, which is based on the chi-square statistic and new statistical data that can measure the positive term-category dependency. They also proposed a new text clustering algorithm, named text clustering with feature selection (TCFS). TCFS can incorporate CHIR to identify relevant features (i.e., terms) iteratively, so that clustering becomes a learning process. They compared TCFS and the k-means clustering algorithm in combination with different feature selection methods on various real data sets. Z.H. Zhou et al. [4] proposed a co-training style semi-supervised regression algorithm, COREG. This algorithm uses two regressors, each of which labels the unlabeled data for the other, where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates. M. Belkin et al. [7] developed an algorithmic framework to classify a partially labeled data set in a principled manner; the central idea of their approach is that classification functions are naturally defined only on the submanifold in question rather than on the total ambient space.

III. PROPOSED WORK

We designed a semi-supervised document clustering method using Seeds Affinity Propagation. Figure 1 shows the overall architecture of the proposed system: labeled and unlabeled documents pass through preprocessing (stop word removal, word stemming, word frequency computation) to produce structured documents; tri-set computation and similarity measurement follow, matching documents have a label added, and the clustered documents are produced.

Figure 1: Overall architecture of the proposed system

In this paper, a semi-supervised document clustering algorithm, called Seeds Affinity Propagation (SAP), is applied on top of the effective clustering algorithm Affinity Propagation (AP). Starting from the initial text document representation, we first applied stop word removal. Stop words are words considered non-descriptive within a bag-of-words approach; they typically comprise prepositions, articles, etc. We removed the stop words from the documents and then performed the stemming process in all different combinations.


We then computed the word frequency within the documents. After preprocessing the labeled and unlabeled documents, structured documents are obtained. Tri-set computation, a feature extraction technique, is used to extract features through the Co-feature set, Unilateral feature set, and Significant Co-feature set. The similarity measure of the documents is then calculated and a label is assigned to a document when a match is found. Finally, clustered documents are obtained through Seeds Affinity Propagation via similarity measurement. The results show that Seeds Affinity Propagation clusters the data more accurately and effectively than K-means, as can be seen from the table.

A. Seeds Affinity Propagation (SAP)

The main features of this algorithm are tri-set computation, similarity computation, and seed construction. The three feature sets are the Co-feature set, the Unilateral feature set, and the Significant Co-feature set. The Co-feature set CFS(i,j) is derived from the intersection of the features of two objects X and Y; the Unilateral feature set UFS(i,j) is derived only from the unshared features; and the Significant Co-feature set SCS(i,j) is derived from the most significant features. In our paper, we consider all the words in the title of a document, except stop words, to be the most significant features. The features of the structured documents are extracted through tri-set computation based on the Co-feature set, Unilateral feature set, and Significant Co-feature set; the similarity measure of the documents is then calculated and a label is assigned when they match, and finally clustered documents are obtained. Similarity measurement plays a vital role in Seeds Affinity Propagation. The tri-set similarity is calculated using the formula

s(i,j) = \alpha \sum_{m=1}^{|CFS|} n_m + \beta \sum_{p=1}^{|UFS|} n_p + \gamma \sum_{q=1}^{|SCS|} n_q    (2)

where the parameters α, β, and γ are the adaptive factors. CFS(i,j) is derived from the intersection of the features of two objects, UFS(i,j) from their unshared features, and SCS(i,j) from their most significant features. Mean Feature Selection is the method used here to construct the seeds. In our method, each term in the text is treated as a feature and each document as a vector; since the two cannot be computed simultaneously, only one is computed at a time. Figure 2 describes the detailed process of document clustering: the labeled and unlabeled documents are preprocessed through stop word removal, word stemming, and word frequency computation, after which structured documents are obtained.
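The sketch below is one direct reading of Eq. (2) in Python. The weights n are taken to be term frequencies, the Significant Co-feature set is built from title words, and the values of the adaptive factors (including a negative weight for the unshared-feature term) are illustrative assumptions rather than values given in the text.

def tri_set_similarity(doc_i, doc_j, title_i, title_j,
                       alpha=1.0, beta=-0.5, gamma=2.0):
    """Tri-set similarity of Eq. (2) from two term-frequency dictionaries
    (doc_i, doc_j) and their sets of title words (title_i, title_j).
    alpha, beta, gamma are the adaptive factors (illustrative values)."""
    shared = set(doc_i) & set(doc_j)                        # Co-feature set CFS(i, j)
    unshared = set(doc_i) ^ set(doc_j)                      # Unilateral feature set UFS(i, j)
    significant = shared & (set(title_i) | set(title_j))    # Significant co-feature set SCS(i, j)

    cfs = sum(doc_i[w] + doc_j[w] for w in shared)
    ufs = sum(doc_i.get(w, 0) + doc_j.get(w, 0) for w in unshared)
    scs = sum(doc_i[w] + doc_j[w] for w in significant)
    return alpha * cfs + beta * ufs + gamma * scs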

IV. COMPARATIVE STUDY

Table 1 below gives a detailed comparative analysis between SAP (Seeds Affinity Propagation) and K-means.

SAP | K-Means
Superior performance | Less performance comparatively
Can be used on sparse datasets | Cannot be used on sparse datasets
Automatically sets its number of clusters | Requires the number of clusters to be specified
Higher F-measure (ca. 40 percent improvement over k-means) | Lower F-measure
Lower entropy (ca. 28 percent decrease over k-means) | Somewhat higher entropy
Improves clustering execution time (20 times faster than k-means) | Takes much more execution time
Low computational time | Computational complexity grows exponentially
Faster convergence and better clustering results | Comparatively less convergence

Table 1: Performance comparison between SAP and K-Means


Table 2 below reports the mean F-measure, mean entropy, and mean CPU execution time for five methods: SAP, SAP(CC), AP(Tri-set), AP(CC), and K-means. The proposed semi-supervised strategy achieves both better clustering results and faster convergence (using only 76 percent of the iterations of the original AP). The complete SAP algorithm obtains a higher F-measure (ca. 40 percent improvement over k-means and AP) and lower entropy (ca. 28 percent decrease over k-means and AP), and significantly improves clustering execution time (20 times faster) with respect to the k-means algorithm.

Figure 2: Retrieval phase for the user. The features of the structured documents are extracted through tri-set computation based on the Co-feature set, Unilateral feature set, and Significant Co-feature set. The similarity measure of the documents is then calculated and a label is assigned when they match; finally, clustered documents are obtained.

Algorithm | Mean F-Measure | Mean Entropy | Mean CPU Execution Time
SAP | 0.599 | 0.472 | 14.3
SAP(CC) | 0.531 | 0.592 | 13.3
AP(Tri-set) | 0.486 | 0.61 | 14.1
AP(CC) | 0.403 | 0.657 | 12.6
K-Means | 0.416 | 0.658 | 345.9

Table 2: Mean F-measure, entropy, and CPU execution time for the five methods

Here we used structural information, so the method significantly improves the clustering and convergence results. In comparison with K-means, SAP improves accuracy and reduces the computational complexity.

V. PREDICTIVE MODELING
Effective document clustering and retrieval using Seeds Affinity Propagation and a fuzzy seed set is shown below.

Figure 3: Similarity measure for labeled and unlabeled documents. Here the similarity measure between the labeled and unlabeled documents is calculated and a label is assigned when they match. Finally,


clustered documents are obtained through the semi-supervised document clustering process.

VI. PERFORMANCE ANALYSIS


Results show that the similarity metric is more effective in document clustering (F-measures ca. 21 percent higher than with the AP algorithm) and that the proposed semi-supervised strategy achieves both better clustering results and faster convergence (using only 76 percent of the iterations of the original AP). The complete SAP algorithm obtains a higher F-measure (ca. 40 percent improvement over k-means and AP) and lower entropy (ca. 28 percent decrease over k-means and AP), significantly improves clustering execution time (20 times faster) with respect to k-means, and provides enhanced robustness compared with all the other methods.

Figure 6: Clustering execution time improvement (20 times faster) over k-means. Figure 6 shows that SAP improves clustering CPU execution time, running about 20 times faster than the other algorithms.

VII. CONCLUSION
Figure 4: SAP algorithm obtains a higher F-measure (ca. 40 percent improvement over k-means and AP).

In this paper we applied Seeds Affinity Propagation for semi-supervised document clustering (i.e., a tiny amount of labeled data and a larger amount of unlabeled data) on the basis of the Co-feature set, Unilateral feature set, and Significant Co-feature set. Here we used structured documents such as scientific papers (IEEE) and news articles, so the method significantly improves the clustering and convergence results. In comparison with K-means, AP(CC), SAP(CC), and AP(Tri-set), SAP improves accuracy and reduces the computational complexity. Our experimental results show that the complete SAP algorithm provides a higher F-measure, faster CPU execution time, and a lower entropy value, and the algorithm is suitable for any domain dealing with clustering-related problems. Our future work is to estimate the similarity between the clustered documents by finding a distance measure and assigning

Figure 5: SAP algorithm obtains lower entropy (ca. 28 percent decrease over k-means and AP). As Figure 5 shows, when compared with algorithms such as K-means, AP(Cosine Coefficient), AP(Tri-set), and SAP(Cosine Coefficient), SAP provides the lowest entropy value, which improves the convergence results; that is, the number of iterations required to compute the results is dramatically reduced.


the respective ranking to all the target documents in order to improve the accuracy.

REFERENCES
[1] Renchu Guan, Xiaohu Shi, Maurizio Marchese, Chen Yang, and Yanchun Liang, "Text Clustering with Seeds Affinity Propagation," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, April 2011.
[2] Y.J. Li, C. Luo, and S.M. Chung, "Text Clustering with Feature Selection by Using Statistical Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 5, pp. 641-652, May 2008.
[3] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval. McGraw Hill Book Co., 1983.
[4] Z.H. Zhou and M. Li, "Semi-Supervised Regression with Co-Training Style Algorithms," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 11, pp. 1479-1493, Aug. 2007.
[5] A. Blum and T. Mitchell, "Combining Labeled and Unlabeled Data with Co-Training," Proc. 11th Ann. Conf. Computational Learning Theory, pp. 92-100, 1998.
[6] S. Yu, B. Krishnapuram, R. Rosales, H. Steck, and R.B. Rao, "Bayesian Co-Training," Advances in Neural Information Processing Systems, vol. 20, pp. 1665-1672, MIT Press, 2008.
[7] M. Belkin and P. Niyogi, "Semi-Supervised Learning on Riemannian Manifolds," Machine Learning, vol. 56, pp. 209-239, 2004.



Link based routing in suspicious MANETs


S. Newlin Rajkumar 1, V. Sheela Devi 2
1 Asst. Professor, Department of CSE, Anna University of Technology, Coimbatore
2 PG Student, Department of CSE, Anna University of Technology, Coimbatore

Abstract
In most common mobile ad hoc networking (MANET) scenarios, nodes establish communication based on long-lasting public identities. However, in some hostile and suspicious settings, node identities must not be exposed and node movements should be untraceable. Instead, nodes need to communicate on the basis of their current locations. While such MANET settings are not very common, they do occur in military and law enforcement domains and require high security and privacy guarantees. In this paper, we address a number of issues arising in suspicious location-based MANET settings by designing and analyzing a privacy-preserving and secure link-state based routing protocol (ALARM). ALARM uses nodes' current locations to securely disseminate and construct topology snapshots and forward data. With the aid of advanced cryptographic techniques (e.g., group signatures), ALARM provides both security and privacy features, including node authentication, data integrity, anonymity, and untraceability (tracking-resistance). It also offers protection against passive and active insider and outsider attacks. To the best of our knowledge, this work represents the first comprehensive study of security, privacy, and performance tradeoffs in the context of link-state MANET routing.

Keywords: Privacy, communication system security, communication system routing, mobile communication, location-based communication, military communication.

INTRODUCTION
Research on mobile ad hoc networks (MANETs) has been very active, motivated mainly by military, disaster relief, and law enforcement scenarios. More recently, location information has become increasingly available through small and inexpensive GPS receivers, partially prompted by the trend of introducing location-sensing capabilities into personal handheld devices. A natural evolutionary step is to adopt such location-based operation in MANETs. In such a MANET, devices rely on location information in their operation. The main distinguishing feature of the envisaged location-based MANET environment is the communication paradigm, based not on permanent or semi-permanent identities, addresses, or pseudonyms, but on instantaneous node location. In other words, a node (A) decides to communicate with another node (B) depending on

exactly where (B) is located at present. If node location information is sufficiently granular, a physical MANET map can be constructed and node locations, instead of persistent node identities, can be used in place of network addresses. In this paper, we consider what it takes to provide privacy-preserving secure communication in hostile and suspicious MANETs. We construct a protocol for Anonymous Location-Aided Routing in MANETs (ALARM) that demonstrates the feasibility of simultaneously obtaining strong privacy and security properties with reasonable efficiency. In this context, privacy means node anonymity and resistance to tracking, whereas security includes node/origin authentication and location integrity. Although it might seem that our security and privacy properties contradict each other, we show that some advanced cryptographic techniques can be used to reconcile them.

II. METRICS AND METHODS


A. Link-state routing

Our approach builds on link-state (LS) routing protocols, such as OLSR [28]. One advantage of LS protocols is that, unlike their reactive counterparts, they obviate the need for route discovery. This makes LS protocols suitable for real-time applications that impose strict delay constraints. On the other hand, LS protocols do not scale well due to excessive broadcasting: n updates are flooded throughout the MANET in each update period. However, this has been mitigated in OLSR by reducing the set of nodes that forward routing control messages to a subset of the first-hop neighbours of any node, called multipoint relays (MPRs). In addition, since our goal is to accommodate relatively modest-sized MANETs (on the order of tens or a few hundreds of nodes), scalability can be easily achieved. Furthermore, LS allows us to achieve stronger security, since origin authentication and integrity of LS updates can be easily supported; there are a number of well-known techniques that achieve this, e.g., [1], [2], [3]. The main challenge arises from the need to reconcile the security and privacy (anonymity and untraceability) requirements that we address below. Based on the above discussion, we consider link-state to be best suited for supporting location-based routing with the privacy and security features described earlier. In the rest of this paper, we use a simple flooding-based scheme to illustrate the operation of LAR. However, we note that any optimization for reducing LS flooding overhead (e.g., MPR-based flooding in OLSR) can be easily integrated into LAR.

B. Group signatures

Group signatures can be viewed as traditional public-key signatures with additional privacy features. In a group signature scheme, any member of a large and dynamic group can sign a message, thereby producing a group signature. A group signature can be verified by anyone who has a copy of a constant-size group public key. A valid group signature implies that the signer is a genuine group member. At the same time, given two valid group signatures, it is computationally infeasible to decide whether they were generated by the same (or different) group members. Furthermore, in case of a dispute over a group signature, a special entity called the Group Manager (GM) can open a group signature and identify the actual signer. This important feature is called escrowed anonymity. Based on the above, group signatures seem to be a perfect fit for our envisaged MANET setting: a mobile node can periodically sign its current location (link-state) information without fear of being tracked, since multiple group signatures are not linkable, while at the same time anyone can verify a group signature and be assured that the signer is a legitimate MANET node.
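The group-signature operations described above can be summarized by the following interface sketch. The method and parameter names are illustrative assumptions, not those of a specific library; a real construction would implement the cryptography behind this shape.

from typing import Protocol

class GroupSignatureScheme(Protocol):
    """Abstract interface for the group-signature operations the protocol relies on."""

    def sign(self, member_secret_key: bytes, message: bytes) -> bytes:
        """Any group member produces a group signature on `message`."""

    def verify(self, group_public_key: bytes, message: bytes, signature: bytes) -> bool:
        """Anyone holding the constant-size group public key can check that the
        signer is a genuine group member, without learning which member."""

    def open(self, group_manager_key: bytes, signature: bytes) -> int:
        """Only the Group Manager can trace a signature back to the actual
        signer (escrowed anonymity)."""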

IV. ASSUMPTIONS
Location: universal availability of location information. Each node is equipped with a device that provides accurate positioning information, e.g., GPS.
Mobility: sufficiently high mobility. A certain minimum fraction (or number) of nodes moves periodically, such that tracking a given mobile node from one topology snapshot to the next requires distinguishing it among all nodes that have moved in the interim.
Time: all nodes maintain loosely synchronized clocks. This is easily obtainable with GPS.


Range: nodes have a uniform transmission range. Once a node knows the current MANET map, it can determine node connectivity (i.e., transform the map into a graph).

TABLE 1: Computation costs, signature size, and key size for a group signature (GSIG) and EC-DSA

Scheme | Security level (bits) | Sign (sec) | Verify (sec) | Signature size (bytes)
ECDSA-192 | 80 | 5 | 3 | 48
ECDSA-256 | 120 | 8 | 4.2 | 64
GSIG | 80 | 1.7 | 1.56 | 151
GSIG-1 | 128 | 5.37 | 4.93 | 225

Table 1 shows timings for group signature generation and verification, compared to standard Elliptic Curve DSA (EC-DSA), measured using OpenSSL [4]. Measurements are reported as in [5]; they were obtained on a 1.5 GHz Centrino processor.

III. LAR PROTOCOL

Operation:

a. Time is divided into equal slots of duration T. At the beginning of each slot, each node s generates a temporary public-private key pair, PK-TMPs and SK-TMPs, respectively. PK-TMPs is subsequently used by other nodes to encrypt session keys to establish secure channels with s. Note that these keys can be generated offline.

b. Each node broadcasts a Location Announcement Message (LAM) containing its location (GPS coordinates), a time-stamp, its temporary public key (PK-TMPs), and a group signature computed over these fields.

c. Upon receipt of a new LAM, a node first checks that it has not received the same LAM before; it then verifies the time-stamp and the group signature. If both are valid, the node rebroadcasts the LAM to its neighbours. Having collected all current LAMs, each node constructs a geographical map of the MANET and a corresponding node connectivity graph. A flowchart describing this sequence of steps is shown in Fig. 2. Between successive LAMs, a node can be reached (addressed) using a temporary pseudonym formed as its current location concatenated with the group signature in its last LAM.

d. Whenever a node wishes to communicate with a certain location, it checks whether any node currently exists at (or near) that location. If so, it sends a message to the destination's current pseudonym (TmpID). This message is encrypted with a session key using a symmetric cipher. The session key is, in turn, encrypted under the current public key (PK-TMP) included in the destination's latest LAM. When the destination receives the message, it first recovers the session key and uses it to decrypt the rest.

Fig. 1: LAR sender process

LAM message format (Fig. 4):
Message-Type = LAM
Current-Location (8 bytes) (Location)
Current-Time-Stamp (4 bytes) (TS)
Temporary-Cryptographic-Key (~128 bytes) (PK-TMP)
Optional: Transmission/Reception Range (2 bytes) (TX-RX-RNG)
Optional: Public-Key-Signature (~250 bytes)
GSig(Location || TS || PK-TMP) (~200 bytes)
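A minimal sketch of how a LAM could be assembled and its temporary pseudonym derived, following the message format above. The field representations and helper names (e.g., group_sign) are assumptions for illustration, not a normative encoding.

import time
from dataclasses import dataclass

@dataclass
class LAM:
    location: tuple          # GPS coordinates (lat, lon)
    timestamp: int           # TS
    pk_tmp: bytes            # temporary public key for this time slot
    gsig: bytes              # group signature over (Location || TS || PK-TMP)

def build_lam(location, pk_tmp, member_secret_key, group_sign):
    """Build a Location Announcement Message; `group_sign` is any function
    implementing the group-signature sign operation (see the interface sketch above)."""
    ts = int(time.time())
    payload = repr(location).encode() + ts.to_bytes(4, "big") + pk_tmp
    return LAM(location, ts, pk_tmp, group_sign(member_secret_key, payload))

def pseudonym(lam: LAM) -> bytes:
    """Between successive LAMs a node is addressed by a temporary pseudonym:
    its current location concatenated with the group signature of its last LAM."""
    return repr(lam.location).encode() + lam.gsig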

Sequential Aggregate Signatures (SAS)

This extension leverages the fact that each node already includes a temporary public key in its LAM. A node first sends its own LAM before forwarding the LAMs of other nodes, and it can use its private key to sign the LAMs it forwards. Such signatures can be aggregated (e.g., as Sequential Aggregate Signatures) to maintain a constant-size LAM. An adversary launching an active attack (by generating phantom nodes, impersonating other nodes and/or lying about its location) will be detected due to mismatching signatures in the received LAMs. Note that these are not group signatures, but sequential aggregate signatures (SAS) that are constant in size. A similar approach has been used to secure route discovery in the DSR routing protocol in [6]. One such SAS construct is based on RSA [7]; its signature generation cost is equivalent to a plain RSA signature, while the verification cost increases linearly with the number of signers (nodes) on the path. However, this cost can be minimized by using small public exponents.

1. Assume that node i's private key is SK_i = d_i and that its public key consists of the pair (e_i, N_i), where e_i · d_i = 1 (mod φ(N_i)).

2. During operation, if the i-th intermediate signature σ_{i-1} ≥ N_i, then a bit b_i is set to 1; otherwise it is set to 0. During the verification phase, if b_i = 1 then N_i is added to σ_{i-1} before proceeding with the verification of σ_i.

3. Assume that node A sends a LAM through nodes B and C to reach D. The signing procedure is as follows:

a. A computes h_A = H(LAM, (e_A, N_A)) and σ_A = (h_A)^{d_A} (mod N_A); σ_A is then added to the LAM.

b. B: if σ_A ≥ N_B, it sets σ_A = σ_A − N_B and b_B = 1, else b_B = 0; it then computes h_B = H(LAM, (e_B, N_B)) and σ_B = (h_B + σ_A)^{d_B} (mod N_B), and σ_B is added to the LAM instead of σ_A.

c. C: if σ_B ≥ N_C, it sets σ_B = σ_B − N_C and b_C = 1, else b_C = 0; it then computes h_C = H(LAM, (e_C, N_C)) and σ_C = (h_C + σ_B)^{d_C} (mod N_C), and σ_C is added to the LAM instead of σ_B.

d. D computes h_C = H(LAM, (e_C, N_C)) and recovers σ_B = σ_C^{e_C} − h_C (mod N_C) + b_C · N_C, then computes h_B = H(LAM, (e_B, N_B)) and recovers σ_A = σ_B^{e_B} − h_B (mod N_B) + b_B · N_B, and finally checks whether σ_A^{e_A} (mod N_A) equals h_A. Signature verification fails if a LAM does not travel the same route as it claims.

Fig. 2: Receiver process
Fig. 3: Communication decision flowchart
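A toy sketch of the RSA-based sequential aggregation along the lines of steps (a)-(d): each forwarder folds a hash of the LAM and its own key into the running value under its private exponent, and the verifier peels the layers off in reverse. It assumes each intermediate value is smaller than the next signer's modulus (so the b_i adjustment never triggers) and uses insecure demo-sized keys; it is an illustration of the idea, not the paper's construction.

import hashlib

def h_int(lam: bytes, pub: tuple) -> int:
    """Hash the LAM together with the signer's public key, reduced mod its modulus."""
    e, n = pub
    return int.from_bytes(hashlib.sha256(lam + str(pub).encode()).digest(), "big") % n

def sign_step(lam: bytes, pub: tuple, d: int, prev_sigma: int = 0) -> int:
    """One forwarding node folds its signature into the running aggregate."""
    _e, n = pub
    assert prev_sigma < n, "demo assumes the b_i adjustment is never needed"
    return pow((h_int(lam, pub) + prev_sigma) % n, d, n)

def verify_chain(lam: bytes, pubs: list, sigma: int) -> bool:
    """Peel aggregated signatures off in reverse signing order."""
    for pub in reversed(pubs[1:]):
        e, n = pub
        sigma = (pow(sigma, e, n) - h_int(lam, pub)) % n
    e0, n0 = pubs[0]
    return pow(sigma, e0, n0) == h_int(lam, pubs[0])

# Demo with tiny (insecure) RSA keys; moduli increase along the path so that
# the intermediate aggregate always fits the next signer's modulus.
def make_key(p, q, e=65537):
    n, phi = p * q, (p - 1) * (q - 1)
    return (e, n), pow(e, -1, phi)

keys = [make_key(10007, 10009), make_key(10037, 10039), make_key(10061, 10067)]
lam = b"LAM: location||TS||PK-TMP"
sigma = 0
for pub, d in keys:                       # A, then B, then C sign in sequence
    sigma = sign_step(lam, pub, d, sigma)
print(verify_chain(lam, [pub for pub, _ in keys], sigma))   # True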


Security Analysis

Outsider Attacks: A passive outsider eavesdropping on all LAMs can, at most, obtain exactly the same information available to any legitimate MANET node. This would only happen if the keys used to encrypt all communication in the MANET are leaked. Thus, a passive outsider is at most as powerful as a passive insider, and protection against it is guaranteed as a side effect of thwarting passive insider attacks. Since the group signatures attached to each LAM are untraceable and unlinkable, the only way to track nodes is by guessing possible trajectories.

Passive Insider Attacks: A passive insider (a legitimate MANET node) can, by design, obtain all LAMs and determine their authenticity by verifying the corresponding group signatures. But, also by design, it can neither identify nor link the nodes that generated these LAMs, since group signatures are untraceable. A passive insider with other means of collecting mobility information, e.g., visual monitoring, can determine that a certain node remains stationary. This might happen if, in two consecutive time slots, the insider physically (i.e., visually) observes lack of mobility and also receives two LAMs referring to the same location. Clearly, there is no protection against such attacks, since they involve the adversary's physical presence. A passive insider can also attempt to track a node's movements by using viable trajectory information [8]. This attack is possible if the adversary knows the MANET topology, as well as the approximate node speed, trajectory, and direction of movement of a given node. If nodes do not move along straight lines and their direction is randomized, or if a group of nodes move closely together or intersect paths, such attacks fail or degenerate to k-anonymity.

V. RESULTS AND DISCUSSION


Privacy metric: LAR provides node privacy by preventing tracking by both insider and outsider adversaries. To illustrate its effectiveness, we define a new privacy metric called Average Node Privacy (ANP). Basically, ANP is a cumulative version of k-anonymity [10] over time, averaged over the entire network. Given the successive topology snapshots during the operation of the network (T snapshots), ANP represents the average fraction of nodes to which a given node can be equally likely mapped. This is similar to the k-anonymity concept, where a node's privacy is preserved by making it indistinguishable from a set of k other nodes. ANP is computed as follows:
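The equation itself did not survive extraction; a form consistent with the definitions that follow (with K_i^t the number of nodes that node i cannot be mapped to at snapshot t) would be:

\mathrm{ANP} = \frac{1}{T \cdot K} \sum_{t=1}^{T} \sum_{i=1}^{K} \frac{K - K_i^{t}}{K}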

where K is the total number of nodes in the MANET, T is the number of snapshots of the network over time, and K_i^t is the number of nodes in snapshot t to which node i cannot be mapped, assuming that the adversary knows where i was at snapshot t-1. The T·K term in the denominator normalizes the metric so that it has a maximum value of 1. K_i^t depends on the underlying mobility pattern (i.e.,


direction and speed of movement), the time between successive topology snapshots (i.e., the time between two LAMs), and the size of the area within which the nodes move. Between two successive snapshots of the topology, K_i^t will include the nodes outside a circle with radius r (r = node speed · LAM period) centred at the location of node i in the first snapshot. The simulation parameters are as follows:

Parameter | Value
Simulation area | 1000 m x 1000 m
Simulation time | 100,000 sec
Simulation runs | 1000 runs
Inter-LAM interval | Varied from 5 sec to 30 sec
Node speed | Varied from 5 m/sec to 100 m/sec
Number of nodes | 100
Mobility models | Random Walk and Random Waypoint Mobility; Reference Point Group Mobility (RPGM); Time-Variant User Mobility (TVUM)

Random Walk Mobility (RWM): In this model, a node chooses a random destination within the area and moves towards it. Once a node reaches its destination, it randomly chooses a new one and starts moving toward it. Random Waypoint and RWM have been criticized as unrealistic [9]; however, we use RWM as a base case to show that completely random movements might not yield the highest level of privacy.

Reference Point Group Mobility (RPGM): Each group has a logical centre which defines the movement pattern for the entire group, i.e., speed, acceleration, and direction. Each group member is placed randomly in the vicinity of its reference point, relative to the group centre; this ensures that the relative positions of nodes inside the group change over time. When nodes move according to the RPGM model with low speeds and small inter-LAM intervals, ANP is higher than when all nodes move independently.

Time-Variant User Mobility (TVUM): This model was motivated by two observations typical in traces of mobile wireless networks: skewed location visiting preferences and periodic reappearance. The distinctive feature of TVUM is in defining often-visited communities (areas), so as to capture skewed location visiting preferences, and in using time periods with different mobility parameters to create periodic reappearance. Each node is randomly assigned to a community. TVUM defines two time periods: the normal movement period (NMP) and the concentration movement period (CMP). Within a CMP, a node visits its community with high probability. A node has two different modes of movement: a local epoch and a roaming epoch. In a local epoch, a node's mobility is confined within its community; in a roaming epoch, a node is free to move within the whole simulation area. A node switches between epochs based on a two-state Markov chain model.

Fig. 5: Effect of the number of groups on ANP (RPGM). ANP is highest when the best mapping an adversary can construct is one where a node from snapshot t-1 is equally likely to be mapped to any of the K nodes in snapshot t; in this case, r is the longest possible travelling distance in the area of movement (e.g., the diagonal in the case of a square) and ANP will be 1. When each node can only be mapped to one other node, nodes become completely traceable and node privacy is violated.
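Following the description above (a node at snapshot t can only be mapped to nodes within r = speed x LAM period of its previous position), ANP could be estimated from position snapshots roughly as follows. This is a sketch under those assumptions, not the authors' simulator.

import numpy as np

def average_node_privacy(snapshots, speed, lam_period):
    """snapshots: list of (K, 2) arrays of node positions, one per LAM interval.
    A node i at snapshot t can plausibly be mapped only to nodes that lie within
    r = speed * lam_period of i's position at snapshot t-1."""
    r = speed * lam_period
    K = snapshots[0].shape[0]
    T = len(snapshots) - 1
    total = 0.0
    for t in range(1, len(snapshots)):
        prev, curr = snapshots[t - 1], snapshots[t]
        # dists[i, j] = distance from node i's old position to node j's new position
        dists = np.linalg.norm(curr[None, :, :] - prev[:, None, :], axis=2)
        candidates = (dists <= r).sum(axis=1)   # nodes i could be mapped to (K - K_i^t)
        total += (candidates / K).sum()
    return total / (T * K)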



VI. CONCLUSION
In most common mobile ad hoc networking (MANET) scenarios, nodes establish communication based on long-lasting public identities. However, in some hostile and suspicious settings, node identities must not be exposed and node movements should be untraceable; instead, nodes need to communicate on the basis of their current locations. In this paper we presented the ALARM technique, which supports location-based routing in suspicious MANETs. ALARM relies on group signatures to construct one-time pseudonyms used to identify nodes at their present locations. The protocol works with any group signature scheme and any location-based forwarding mechanism. We evaluated the overhead and scalability of ALARM and showed that it performs close to other protocols (e.g., OLSR) optimized to reduce control traffic. We also evaluated ALARM's tracking-resistance with different mobility models.

REFERENCES

[1] R. Perlman, "Network Layer Protocols with Byzantine Robustness," PhD dissertation, Massachusetts Inst. of Technology, http://www.vendian.org/mncharity/dir3/perlman_thesis, 1988.
[2] "OSPF with Digital Signatures," IETF RFC 2154, http://www.ietf.org/rfc2154.txt, 1997.
[3] S.L. Murphy and M.R. Badger, "Digital Signature Protection of the OSPF Routing Protocol," Proc. IEEE Symp. Network and Distributed System Security (SNDSS '96), p. 93, 1996.
[4] OpenSSL: The Open Source Toolkit for SSL/TLS, http://www.openssl.org, 2011.
[5] G. Calandriello, P. Papadimitratos, J.-P. Hubaux, and A. Lioy, "Efficient and Robust Pseudonymous Authentication in VANET," Proc. ACM Int'l Workshop Vehicular Ad Hoc Networks (VANET '07), pp. 19-28, Sept. 2007.
[6] J. Kim and G. Tsudik, "SRDP: Securing Route Discovery in DSR," Proc. Mobiquitous, 2005.
[7] A. Lysyanskaya, S. Micali, L. Reyzin, and H. Shacham, "Sequential Aggregate Signatures from Trapdoor Permutations," Proc. Advances in Cryptology (EUROCRYPT '04), pp. 74-90, 2004.
[8] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing Wireless Location Privacy Using Silent Period," Proc. IEEE Wireless Comm. and Networking Conf., vol. 2, pp. 1187-1192, 2005.
[9] F. Bai, N. Sadagopan, and A. Helmy, "IMPORTANT: A Framework to Systematically Analyze the Impact of Mobility on Performance of Routing Protocols for Adhoc Networks," Proc. IEEE INFOCOM, vol. 2, pp. 825-835, 2003.
[10] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, Oct. 2002.
[11] H. Li, J. Ma, X. Li, and W. Zhang, "An Efficient Anonymous Routing Protocol for Mobile Ad Hoc Networks," Proc. Information Assurance and Security Conf. (IAS '09), pp. 287-290, 2009.
[12] D. Sy, R. Chen, and L. Bao, "ODAR: On-Demand Anonymous Routing in Ad Hoc Networks," Proc. IEEE Int'l Conf. Mobile Ad Hoc and Sensor Systems (MASS '06), pp. 267-276, Oct. 2006.
[13] G. Tsudik and S. Xu, "A Flexible Framework for Secret Handshakes," Proc. Privacy-Enhancing Technologies (PETs '06), 2006.
[14] EU Cooperative Vehicle-Infrastructure System Project, http://www.cvisproject.org, 2011.
[15] Y. Zhang, W. Liu, W. Lou, and Y. Fang, "MASK: Anonymous On-Demand Routing in Mobile Ad Hoc Networks," IEEE Trans. Wireless Comm., vol. 5, no. 9, pp. 2376-2385, Sept. 2006.



AUTOMATIC SEGMENTATION AND CLASSIFICATION OF LUNG CT IMAGES


K. Veera Kumar 1, Dr. C. G. Ravichandran 2
1 Research Scholar, Anna University of Technology, Madurai.
2 Professor and Principal, RVS College of Engg. and Technology, Dindigul.

Abstract:
Lung cancer is one of the most important causes of cancer death for both men and women. Lung cancer diagnosis using pattern classification is an active research topic in medical image processing. Segmentation is considered an essential step in medical image analysis and classification. Computer-aided segmentation of computed tomography (CT) and magnetic resonance (MR) images is finding application in computer-aided diagnosis, clinical studies, and treatment planning. Medical images typically suffer from one or more imperfections, such as low resolution (in the spatial and spectral domains), a high level of noise, low contrast, geometric deformations, and the presence of imaging artifacts. This discussion covers segmentation and classification methods to improve the accuracy of classification (occurrence and non-occurrence of cancer in the lung).

INTRODUCTION
This paper describes a fully automated segmentation and recognition scheme, designed to recognize lung anatomical structures in the human chest by segmenting the different chest internal organ and tissue regions sequentially from high-resolution chest images. A new fully automatic method has been proposed, based on genetic algorithms and morphology-based image processing techniques, to segment these regions. Image segmentation can also be viewed as a clustering problem, aimed at partitioning a given set of pixels into a number of segments. The closeness of the gray levels of lung tissue and chest tissue makes lung segmentation based only on the image signal difficult. Early detection of cancer can be helpful in curing the disease completely, so the need for techniques that detect the occurrence of cancer nodules at an early stage is increasing. Different techniques exist, but none of them provides sufficiently good detection accuracy. The main steps are the extraction of the lung region from chest computed tomography images, segmentation of the

lung region, feature extraction from the segmented region, formation of diagnosis rules from the extracted features, and classification of the occurrence and non-occurrence of cancer in the lung. Image segmentation is an important step in many computer vision algorithms. The separately obtained area is then analyzed for the detection of nodules to diagnose the disease. Computed tomography (CT), magnetic resonance imaging (MRI), digital mammography, and other imaging modalities provide an effective means for non-invasively mapping the anatomy of a subject. These technologies have greatly increased knowledge of normal and diseased anatomy for medical research and are a critical component in diagnosis and treatment planning. In medical imaging, segmentation is important for feature extraction, image measurements, and image display. In some applications it may be useful to classify image pixels into anatomical regions, such as bones, muscles, and blood vessels, while in others into pathological regions, such as cancer, tissue deformities, and multiple sclerosis lesions.


Kanazawa et al. (1996) described a computer-aided diagnosis system for lung cancer based on helical CT images. In this study, the authors describe a computer-assisted automatic diagnosis system (Hara et al., 1999) for lung cancer that detects tumor candidates at an early stage from helical computed tomography (CT) images. This mechanization of the process decreases the time complexity and increases the diagnosis confidence. The proposed algorithm consists of an analysis part and a diagnosis part. In the analysis part, the study extracts the lung and pulmonary blood vessel regions and analyzes the features of these regions using image processing techniques. In the diagnosis part, the study defines diagnosis rules based on these features and detects tumor candidates using these rules. The authors applied the proposed algorithm to 450 patients' data for mass screening, and the experimental results indicate that it detected lung cancer candidates successfully. Penedo et al. (1998) put forth computer-aided diagnosis with a neural-network-based approach to lung nodule detection. In this study, the authors provide a computer-aided diagnosis system based on a two-level Artificial Neural Network (ANN) architecture. This technique was trained, tested, and evaluated specifically on the problem of detecting lung cancer nodules found on digitized chest radiographs. The first ANN carries out the detection of suspicious regions in a low-resolution image. The input supplied to the second ANN is the curvature peaks computed for all pixels in every suspicious region. This derives from the fact that small tumors possess an identifiable signature in curvature-peak feature space, where curvature is the local curvature of the image data when viewed as a relief map. The output of this network is thresholded at a selected level of significance to give a positive detection. Tests were carried out using 60 radiographs taken from a routine clinic with 90 real nodules and 288 simulated nodules. The study employed the free-response receiver operating characteristic method, with the mean number of false positives (FPs) and the sensitivity as performance indexes, to evaluate all the simulation results. The combination of the two networks provides results of 89%-96% sensitivity and 5-7 FPs/image, depending on the size of the nodules (Gurcan et al., 2002). Yamamoto et al. (2000) explained a computer-aided diagnosis system with functions to assist comparative reading for lung cancer based on helical CT images. The authors report that a prototype computer-aided diagnosis (CAD) system (Kanazawa et al., 1998) to automatically detect suspicious regions from chest CT images had been presented, and that the CT screening system used was a TCT-900 Super Helix of the Toshiba Corporation. In this study, the authors propose a new automatic technique for early diagnosis of lung cancer based on a CAD system in which all the CT images are read. In addition, the CAD system is equipped with functions to automatically detect suspicious regions from chest CT images and to assist comparative reading in retrospect. The main feature of the CAD system is that it uses a slice matching algorithm for comparing each slice image of the present and past CT scans, and an interface to display some features of the suspicious regions. The experimental results show that this CAD system can work effectively.
Yim et al. (2005) described hybrid lung segmentation in chest CT images for computer-aided diagnosis. The authors propose an automatic segmentation technique for accurately identifying lung surfaces in chest CT images. The proposed technique consists of three steps. Initially, the lungs and airways are extracted by inverse seeded region growing and connected component labeling. Next, the trachea and large airways are delineated from the lungs by three-dimensional region growing. Then, accurate lung region borders are obtained by subtracting the result of the second step from that of the first step. The proposed technique has been applied to 10 patient datasets with lung cancer or pulmonary embolism. Experimental results indicate that the segmentation method


extracts lung surfaces automatically and accurately. Jose et al. present a genetic embedded approach for gene selection and classification of microarray data. Classification of microarray data requires the selection of subsets of relevant genes in order to achieve good classification performance; the article presents a genetic embedded approach that performs the selection task for an SVM classifier. The main feature of the proposed approach concerns the highly specialized crossover and mutation operators that take into account gene ranking information provided by the SVM classifier. The effectiveness of this approach is assessed using three well-known benchmark data sets from the literature, showing highly competitive results. In M. Gomathi et al. (2010), the lung region is segmented using the Fuzzy Possibilistic C-Means (FPCM) clustering algorithm; after that, the features are extracted and the diagnosis rules are generated. These rules are then used for learning with the help of a Support Vector Machine (SVM).

Automatic segmentation of medical images is a difficult task, as medical images are complex in nature and rarely have any simple linear feature. Further, the output of a segmentation algorithm is affected by the partial volume effect, intensity inhomogeneity, the presence of artifacts, and the closeness in gray level of different soft tissues.

COMPARATIVE ANALYSIS

In this section we briefly review some of the existing segmentation and classification approaches: first the existing segmentation approaches, and then the classification approaches.

(A) SEGMENTATION: The role of segmentation is to subdivide the objects in an image; in the case of medical image segmentation, the aims are to:
- study anatomical structure;
- identify regions of interest, i.e., locate tumors, lesions, and other abnormalities;
- measure tissue volume, to measure the growth of a tumor (or its decrease in size with treatment);
- help in treatment planning prior to radiation therapy, e.g., in radiation dose calculation.

(i) Thresholding: Thresholding approaches segment scalar images by creating a binary partitioning of the image intensities. A thresholding procedure attempts to determine an intensity value, called the threshold, which separates the desired classes. The segmentation is then achieved by grouping all pixels with intensity greater than the threshold into one class, and all other pixels into another class. Determining more than one threshold value is a process called multithresholding.

(ii) Edge-based segmentation: Edge-based segmentation is the most common method based on the detection of edges, i.e., boundaries which separate distinct regions. Edge detection is based on marking discontinuities in gray level, color, etc.; often these edges represent boundaries between objects. Edge-based segmentation algorithms include edge relaxation, border detection methods, and Hough transform based methods. Limitations of edge-based methods: (i) performance is affected by the presence of noise; (ii) fake edges and weak edges may be present in the detected edge image, which may have a negative influence on the segmentation results.

(iii) Region-based segmentation: Region-based methods are based on the principle of homogeneity: pixels with similar properties are clustered together to form a homogeneous region. Region-based segmentation is divided into three types based on the principle of region growing: region merging, region splitting, and split-and-merge.
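As a concrete illustration of the thresholding approach in (i) above, a minimal NumPy version of single- and multi-level thresholding follows; the threshold values are arbitrary examples, not values from the paper.

import numpy as np

def threshold(image, t):
    """Binary partition: intensities above t form one class, the rest the other."""
    return (image > t).astype(np.uint8)

def multithreshold(image, thresholds):
    """Multithresholding: assign each pixel the index of the intensity interval
    it falls into, for a sorted list of threshold values."""
    return np.digitize(image, sorted(thresholds))

ct_slice = np.random.randint(0, 256, size=(512, 512))   # stand-in for a CT slice
mask = threshold(ct_slice, 128)
labels = multithreshold(ct_slice, [60, 128, 200])        # four intensity classes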



A limitation of region-based segmentation is that there are chances of under-segmentation and over-segmentation of regions in the image.

(iv) Model-based segmentation: The basic approach is that the structure of organs has a repetitive form of geometry and can be modeled probabilistically for variation of shape and geometry. Model-based segmentation methods involve active shape and appearance models, deformable models, and level-set based models. Limitations of model-based segmentation: it requires manual interaction to place an initial model and to choose appropriate parameters, and standard deformable models can also exhibit poor convergence to concave boundaries.

(v) Atlas-based segmentation: This is one of the most frequently used and powerful approaches in the field of medical image segmentation. Here, information on the anatomy, shape, size, and features of different organs and soft tissues is compiled in the form of an atlas or look-up table (LUT). Atlas-guided approaches are similar to correlation approaches, and their advantage is that they perform segmentation and classification in one go. Their limitation lies in segmenting complex structures with variable shape and size.

(B) CLASSIFICATION: Classifier methods are pattern recognition techniques that seek to partition a feature space derived from the image using data with known labels. Classifiers are known as supervised methods, since they require training data that are manually segmented and then used as references for automatically segmenting new data. There are a number of ways in which training data can be applied in classifier methods.
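To make the region-growing principle behind (iii) above concrete, here is a simple seeded region-growing sketch; the homogeneity test (a fixed intensity tolerance around the seed value) is an illustrative choice, not the criterion used in any particular cited method.

from collections import deque
import numpy as np

def region_grow(image, seed, tol=10):
    """Grow a region from `seed` by absorbing 4-connected neighbours whose
    intensity is within `tol` of the seed intensity (simple homogeneity test)."""
    h, w = image.shape
    seed_val = float(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(float(image[ny, nx]) - seed_val) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask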

CONCLUSION
Existing computer-aided diagnosis (CAD) segmentation and recognition techniques have many disadvantages. The proposed method uses novel segmentation and recognition techniques to improve the accuracy of classifying cancer-affected lung CT images.

REFERENCES
1. International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010 A New Approach to Lung Image Segmentation using Fuzzy Possibilistic C-Means Algorithm, M.Gomathi Dr. P.Thangaraj 2. IEEE transactions on pattern analysis and machine intelligence, VOL. 25, NO. 11, NOVEMBER 2003 Hidden Markov Measure Field Models for Image Segmentation, Jose L. Marroquin, Edgar Arce Santana, and Salvador Botello. 3. IEEE geoscience and remote sensing letters, vol. 8, no. 3, may 2011, Hybrid Bayesian Classifier for Improved Classification Accuracy, Uttam Kumar, Student Member, IEEE, S. Kumar Raja, ChiranjitMukhopadhyay, and T. V. Ramachandra. 4. IEEE transactions on geoscience and remote sensing, vol. 49, no. 10, october 2011, Image Segmentation Using a New Bayesian Approach With Active Learning, Jun Li, Jos M. BioucasDias, Member, IEEE, and Antonio Plaza, Senior Member, IEEE 5. IEEE transactions on medical imaging, vol. 31, no. 2, february 2012 449, Automated 3-D Segmentation of Lungs With Lung Cancer in CT Data Using a Novel Robust Active Shape Model Approach, Shanhui Sun, Christian Bauer, and ReinhardBeichel. 6. IEEE transactions on information technology in biomedicine, vol. 15, no. 2, march 2011, Vessel Tree Segmentation in Presence of


Interstitial Lung Disease in MDCT, Panayiotis D. Korfiatis, Cristina Kalogeropoulou, Anna N. Karahaliou, Alexandra D. Kazantzi, and Lena I. Costaridou.
7. IEEE Transactions on Information Technology in Biomedicine, vol. 15, no. 5, September 2011, Automated Delineation of Lung Tumors in PET Images Based on Monotonicity and a Tumor-Customized Criterion, Cherry Ballangan, Xiuying Wang, Michael Fulham, Stefan Eberl, Yong Yin, and Dagan Feng.
8. IEEE Transactions on Biomedical Engineering, vol. 58, no. 1, January 2011, Estimation of Lungs Air Volume and Its Variations Throughout Respiratory CT Image Sequences, Ali Sadeghi-Naini, Ting-Yim Lee, Rajni V. Patel, and Abbas Samani.
9. IEEE Transactions on Medical Imaging, vol. 29, no. 2, February 2010, The LOG Characteristic Scale: A Consistent Measurement of Lung Nodule Size in CT Imaging, Stefano Diciotti, Simone Lombardo, Giuseppe Coppini, Luca Grassi, Massimo Falchini, and Mario Mascalchi.
10. European Journal of Scientific Research, ISSN 1450-216X, Vol. 51, No. 2 (2011), pp. 260-275, A Computer Aided Diagnosis System for Lung Cancer Detection using Machine Learning Technique, M. Gomathi, P. Thangaraj.
11. European Journal of Scientific Research, ISSN 1450-216X, Vol. 58, No. 2 (2011), pp. 156-165, Cancer Classification using Modified Extreme Learning Machine based on ANOVA Features, A. Bharathi, A.M. Natarajan.
12. International Journal of Bioelectromagnetism, Vol. 9, No. 2, 2007, Three Generations of Medical Image Segmentation: Methods and Available Software, D.J. Withey and Z.J. Koles.
13. IMECS 2011, Application of AI Techniques in Medical Image Segmentation and Novel Categorization of Available Methods and Tools, M. Rastgarpour, J. Shanbehzadeh, IAENG.
14. Proceedings of the 2009 IEEE International Conference on Mechatronics and Automation, August 9-12, Changchun, China, Research of Automatic Medical Image Segmentation Algorithm Based on Tsallis Entropy and Improved PCNN, Shi Weili, Miao Yu, Chen Zhanfang, Zhang Hongbiao.
15. Segmentation of Medical Images Using Adaptive Region Growing, Regina Pohle, Klaus D. Toennies, Otto-von-Guericke University Magdeburg, Department of Simulation and Graphics.
16. International Journal of Academic Research, Vol. 3, No. 5, September 2011, Part I: Lungs Segmentation for Computer Aided Diagnosis, Saleem Iqbal, Khalid Iqbal.
17. IJCST, Vol. 2, Issue 4, Oct.-Dec. 2011, Medical Image Segmentation using Marker Controlled Watershed Transformation, Mandeep Kaur, Gagandeep Jindal.
18. 8th World Congress on Computational Mechanics (WCCM8) / 5th European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2008), June 30 - July 5, 2008, Venice, Italy, Segmentation of Structures in 2D Medical Images, Zhen Ma, Joao Manuel R. S. Tavares, and Renato Natal Jorge.
19. IEEE Transactions on Medical Imaging, vol. 23, no. 4, April 2004, Improved Watershed Transform for Medical Image Segmentation Using Prior Information, V. Grau, A. U. J. Mewes, M. Alcaniz, R. Kikinis, and S. K. Warfield.
20. International Journal of Innovative Computing, Information and Control, ICIC International, 2009, Fuzzy Entropy and Morphology Based Fully Automated Segmentation of Lungs from CT Scan Images, M. Arfan Jaffar, Ayyaz Hussain, Anwar M. Mirza.
21. World Academy of Science, Engineering and Technology, 20, 2006, Segmentation of

Organized by: Department of Computer Science and Engineering , Anand Institute of Higher Technology, Chennai. www.iconic12.in E-mail: iconic.aiht@gmail.com Page 39

Lungs from CT Scan Images for Early Diagnosis of Lung Cancer, Nisar Ahmed Memon, Anwar Majid Mirza, and S.A.M. Gilani.
22. Automated Lung Nodule Segmentation Using Dynamic Programming and EM Based Classification, Ning Xu, Narendra Ahuja (University of Illinois at Urbana-Champaign, IL 61801) and Ravi Bansal (Siemens Corporate Research, Inc., Princeton, NJ 08540).

23. Turk J Elec Eng & Comp Sci, Vol. 18, No. 4, 2010, A Novel Method for Lung Segmentation on Chest CT Images: Complex-Valued Artificial Neural Network with Complex Wavelet Transform, Murat Ceylan, Yuksel Ozbay, O. Nuri Ucan, Erkan Yildirim.
24. Lung CT Segmentation for Image Retrieval Using the Insight Toolkit (ITK), Joris Heuberger, Antoine Geissbuhler, Henning Muller.
25. ELSEVIER - Computerized Medical Imaging and Graphics 30 (2006) 299-313, Automatic Segmentation and Recognition of Anatomical Lung Structures from High-Resolution Chest CT Images.
26. Function Optimization and Clustering using Computational Intelligence Techniques, Chapter 5: Genetic Based Optimal Threshold for Medical Image Segmentation.
27. ELSEVIER, ScienceDirect, Computerized Medical Imaging and Graphics (journal homepage: www.elsevier.com/locate/compmedimag), Joint Registration and Segmentation of Serial Lung CT Images for Image-Guided Lung Cancer Diagnosis and Therapy, Zhong Xue, Kelvin Wong, Stephen T.C. Wong.



SVM Based Intelligent Prediction with Heart Disease Datasets


Teena Anicattu Mathew 1, Dr. T. Ravi 2
1 P.G. Student, M.E. Computer Science and Engineering, KCG College of Technology, Chennai.
2 Professor, Computer Science and Engineering, KCG College of Technology, Chennai.

Abstract
Machine Learning (ML) has found its place in the research domain and is becoming a reliable tool in the medical domain. The automatic learning process in ML is widely used in medical decision support, image processing, knowledge extraction, etc. The Electronic Health Record (EHR) is the standard in the health care domain and has many potential benefits, such as health information recording and clinical data repositories, medication management, decision support, and obtaining treatment for the specific health needs of patients. The datasets required for developing an expert system are obtained from the EHR. An intelligent diagnosis of heart disease that comprises data processing can be performed using one of the most efficient classifiers, the Support Vector Machine (SVM). The Radial Basis Function (RBF) kernel is used to improve the accuracy of the SVM classifier. The heart disease datasets are used as a library containing the most important attributes, such as age, sex, chest pain type, resting blood pressure, serum cholesterol in mg/dl, fasting blood sugar, ECG results, etc., for the SVM. The SVM, along with the RBF kernel, improves the accuracy of the diagnosis of heart disease in patients.

Keywords: SVM, SME, Radial Basis Function kernel, hyperplane, expert systems, confusion matrix.
I. INTRODUCTION

People today are becoming more concerned and cautious about their health and healthcare. The Electronic Health Record (EHR) is the standard in the healthcare domain [1]. It offers many potential benefits, such as health information recording and clinical data repositories, medication management, decision support, and obtaining treatment for specific health needs. An EHR system provides better, faster and more reliable access to information. The Machine Learning (ML) technique is the main tool of this work; it is suitable for training and testing on the dataset. The medical domain provides datasets of heart disease information, together with the latest discoveries, for the purpose of data mining. The application of data

mining in medicine has proved successful in the areas of diagnosis, prognosis and treatment [4]. Previous works and studies show that improved medical diagnosis and prognosis may be achieved through automatic analysis of patient data stored in medical repositories, i.e., by learning from past experience [5]. Machine learning is a branch of artificial intelligence concerned with the design and development of algorithms that help computers to evolve behaviours based on empirical data, such as data from sensors or databases. In research, machine learning is used to automatically learn to recognize complex patterns and make intelligent decisions based on data. The difficulty lies in the fact that the set of all possible behaviours given all


possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples so as to be able to produce a useful output in new cases. ML focuses on prediction from the known properties learned from the training data. The ML technique is used here to build an expert system. An expert system is a computer system that emulates the decision-making ability of a human expert. The expert system is connected through an inference engine to the knowledge base, which is used to improve the performance of the expert system. The knowledge base is created by capturing the knowledge of Subject Matter Experts (SMEs) [2]. An SME is a person who is an expert in a particular topic. Coding the gathered knowledge accordingly is called knowledge engineering (KE). KE is an engineering discipline that involves integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise. Once the expert system has been developed, it can be used in real-world problem-solving situations to aid human workers. Typical examples of expert systems are: MYCIN (to identify bacteria causing severe infections), DENDRAL (to identify the structure of organic molecules), SAINT (to generate simpler expressions for complex functions), DESIGN ADVISER (to process chip designs and advise the designer about component placement, etc.), PUFF (for the diagnosis of respiratory conditions), PROSPECTOR (to identify sites for drilling or mining, used by geologists) and LITHIAN (to advise archaeologists examining stone tools). The medical data consist of hundreds of independent features in multidimensional databases and need to be analysed for precise decision making. For this purpose, the dataset should be processed first. The performance of the SVM classifier is evaluated on the reduced dataset. The dataset used for this work is the Heart Disease Dataset [3]. In this paper, Section II describes the related work, Section III dataset preprocessing, Section IV expert systems, Section V the support vector machine, Section VI the radial basis function, Section VII the methodology used, Section VIII the analysis and results, Section IX the graph analysis, and Section X the conclusion and future work.
II. RELATED WORKS

Various classification algorithms have been applied to Turkoglu's valvular heart disease data set [6-11]. That work used a backpropagation artificial neural network (BPANN) classifier, and the performance of the system was about 94-95.9%. Later, an intelligent system for the detection of heart valve disease based on wavelet packet neural networks (WPNN) was proposed [7], and the classification accuracy was found to be about 94%. Comak et al. then investigated the use of a least-squares support vector machine (LS-SVM) classifier to improve the performance of Turkoglu's proposal [8]. Uguz et al. developed a biomedical system based on a continuous Hidden Markov Model (CHMM) for the clinical diagnosis and recognition of heart valve disorders [9], and the experimental results were found to be about 94%. Sengur investigated the use of Linear Discriminant Analysis (LDA) and an Adaptive Neuro-Fuzzy Inference System (ANFIS) for the clinical diagnosis and recognition of heart valve disorders [11]; the validation of this method was measured by the sensitivity and specificity parameters, i.e., 95.9% and 94% respectively.
III. DATASET PREPROCESSING

The first step after data collection, before using the data, is data refinement. Medical databases contain data with incompleteness (missing values), incorrectness (noise in the data), sparseness (few and/or non-representative patient records) and inexactness (inappropriate selection of parameters). Such data cause the predictive performance of the data mining


algorithm to decline. Therefore, preprocessing is required to prepare the data for data mining and machine learning and to increase the predictive accuracy. Feature selection is performed on the dataset to identify the subset containing the smallest number of non-redundant features that produces the best results [3]. Data refinement is done by deleting the data records (chromosomes) having null values. By deleting such bad descriptors, the chance of redundancy and overlapping of descriptors is eliminated. The types of descriptors that should be eliminated are:
- those with too many zeros;
- those with a very small standard deviation;
- those that are highly correlated with others.
An expert system is a computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems like a human expert, by reasoning about knowledge and not by following the procedure of a developer. An expert system has a unique structure which is different from traditional programs. It is divided into three parts: the inference engine (fixed and independent of the expert system), the knowledge base (variable) and the dialog interface, which is used to communicate with users. The inference engine is a computer program designed to reason on rules and about the knowledge base like a human. In order to produce reasoning, it is based on logic. There are several kinds of logic: propositional logic, predicates of order 1 or more, epistemic logic, modal logic, temporal logic, fuzzy logic, etc. Expert systems have certain advantages and disadvantages. The advantages they offer to the user, performing like a human expert, are listed below:
- Conversational interaction.
- Quick availability and the opportunity to program the system itself.
- Ability to exploit a considerable amount of knowledge.
- Reliability.
- Scalability: evolving an expert system is a matter of adding, modifying or deleting rules.
- Pedagogy: engines driven by a true logic are able to explain to the user in plain language why they ask a question and how they arrived at each deduction, so the user has information about the problem even before the final answer of the expert system.
- Preservation and improvement of knowledge: knowledge can disappear with the death, resignation or retirement of an expert; by recording this knowledge in an expert system, it becomes permanent.

Another important step in data preprocessing is data scaling. Data scaling is required because different descriptors have values in different numerical ranges, and these descriptors have to be scaled to the same range, i.e., to the range (-1, +1). This is done using the formula [15]:

Vscaled = 2 * (V - min) / (max - min) - 1    ----- (1)

V is the original value and Vscaled is the scaled value; min and max are the minimum and maximum values of the feature, respectively.
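As an illustration of Eq. (1), the following MATLAB fragment scales every column of a numeric data matrix to the range [-1, +1]. This is only a sketch: the variable name X (rows = patients, columns = attributes) is an assumption for illustration and is not taken from the paper.

% Scale each attribute (column) of X to the range [-1, +1] as in Eq. (1)
mins = min(X, [], 1);                                 % per-attribute minimum
maxs = max(X, [], 1);                                 % per-attribute maximum
Xscaled = 2 * (X - repmat(mins, size(X, 1), 1)) ./ ...
          repmat(maxs - mins, size(X, 1), 1) - 1;     % each column now lies in [-1, +1]
% Attributes with max == min (very small standard deviation) should already
% have been removed in the descriptor-elimination step described above.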

IV. INTRODUCTION TO EXPERT SYSTEMS

In this work, the dataset deals mainly with heart diseases. The traditional popular methods for analysing heart failure are the MRI scan and the Doppler effect. An MRI scan provides a three-dimensional image of the heart on the monitor, through which the problem can be identified. The Doppler effect can be used to identify the problem by measuring the velocity of the blood passing through the veins and arteries.


Expert systems also face some disadvantages, which explain their limited success even though they have existed for many years. They are as mentioned below:
- Knowledge collection and its interpretation into rules, i.e., knowledge engineering: manual work is needed, which introduces a large number of errors.
- Consistency control of the rules is required.
- Most of the logics used (many expert systems are also penalized by the logic they use) operate on "variable" facts, i.e., facts whose value changes several times during one reasoning.
V. SUPPORT VECTOR MACHINE

SVM is a learning technique for state-of-the-art pattern recognition. It trades off accuracy against generalization error, and it can handle regression estimation and density estimation. The SVM is a general algorithm based on the so-called structural risk minimization principle; it is a learning machine that provides new guidelines and deep insight into the general characteristics and nature of the model building/learning/fitting process [12]. The support vector machine (SVM) was first introduced by Vapnik [13] and is based on the Structural Risk Minimization principle from computational learning theory. Hearst et al. [14] placed the SVM algorithm at the intersection of learning theory and practice: it contains a large class of neural nets, radial basis function (RBF) nets, and polynomial classifiers as special cases. SVM has yielded excellent generalization performance on a wide range of problems including bioinformatics, text categorization, image detection, etc., and the SVM approach has recently been applied in several financial applications, mainly in the area of time series prediction and classification [16, 17]. In this work, the performance of the SVM approach in the heart disease domain, in comparison with the Radial Basis Function, is evaluated.

SVMs build a hyperplane which divides examples into two classes, such that one class lies on one side of the hyperplane and the other class lies entirely on the other side (Figure 1). The objects in the first (positive) class are each assigned a value yi = +1, and those in the second class are assigned yi = -1. In linearly separable cases the objects can be correctly classified by:

w.xi + b >= +1 for yi = +1 (class 1)    ---- (2)
w.xi + b <= -1 for yi = -1 (class 2)    ---- (3)

where w is a vector normal to the hyperplane and b is a scalar quantity. The SVM attempts to find an optimal separating hyperplane with maximum margin by solving the following optimization problem:

minimize (1/2) ||w||^2    ---- (4)
subject to yi (w.xi + b) - 1 >= 0    ---- (5)

The above concepts can also be extended to linearly non-separable cases, in which no hyperplane can perfectly separate the sets of points. In this case, we can introduce non-negative slack variables ξi, i = 1, 2, 3, ..., m, such that

w.xi + b >= +1 - ξi for yi = +1    ---- (6)
w.xi + b <= -1 + ξi for yi = -1    ---- (7)

The purpose here is to find a hyperplane that gives the minimum number of training errors, i.e., that minimizes the constraint violation. The problem to be solved becomes:




minimize (1/2) ||w||^2 + C * Σ ξi    ---- (8)
subject to yi (w.xi + b) - 1 + ξi >= 0, ξi >= 0

where C is a user-predetermined penalty parameter. The parameter C has an important impact on the accuracy of the SVM classifier and thus should be chosen carefully [15].

Figure 1: Structure of a simple SVM.

VI. RADIAL BASIS FUNCTION

In SVM, the idea is to find the hyperplane that maximizes the minimum distance to any of the data points. The samples may be linearly separable. When the samples are not linearly separable, a kernel function is used. The kernel function transforms the data into a higher-dimensional space where it can be linearly separated. This is done by projecting the input variables into the new feature space using the kernel function K(Xi, Xj). The kernel functions available in SVM are the Polynomial, Sigmoid and Radial Basis Function (RBF) kernels. Among these, the RBF kernel is the most widely used. The SVM was originally proposed for binary classification of data. The equation representing the RBF kernel is given below:

K(Xi, Xj) = exp(-||Xi - Xj||^2 / (2 * sigma^2))    ---- (9)

where sigma is the width specified by the user [17].

VII. METHODOLOGY

A. Heart Disease Dataset Collection and Dataset Preprocessing

The dataset used for the classification is the Heart Disease dataset. The aim of using this dataset is to identify or classify the presence or absence of heart disease in a patient based on various test results. The Heart Disease dataset consists of 76 different attributes; out of these 76, 14 attributes are selected. The features selected are: age, sex, chest pain type, resting blood pressure, serum cholesterol in mg/dl, resting electrocardiograph results, maximum heart rate achieved, exercise induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise, number of major vessels coloured by fluoroscopy, and thal (3 = normal; 6 = fixed defect; 7 = reverse defect). Dataset preprocessing should be done first in order to increase the accuracy of the performance. The preprocessed data is recorded in the local SVM memory. After preprocessing, the following three steps are followed: selection of informative features, extraction of patterns using clustering, and classification using SVM.

TABLE I: DATASET FOR HEART DISEASE

Figure 2: System architecture (input data -> preprocessing -> train data / test data -> LIBSVM: kernel selection, model training, cross-validation -> test -> output).
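To make the role of the RBF kernel in Eq. (9) concrete, the following MATLAB fragment evaluates it for two scaled feature vectors. The width sigma and the example vectors are illustrative values chosen here, not taken from the paper.

% RBF kernel value between two scaled feature vectors x and y (Eq. 9)
rbf = @(x, y, sigma) exp(-norm(x - y)^2 / (2 * sigma^2));
k = rbf([0.2 -0.5 1.0], [0.1 -0.4 0.9], 0.5);   % kernel value lies in (0, 1]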



B. Experimental Analysis

After the classification of the data in the dataset is completed, the classified data is normalized and tabulated. Normalization is done in order to eliminate the incomplete (missing), incorrect (noisy), sparse (few and/or non-representative patient records) and inexact (inappropriate selection of parameters) values, which would otherwise degrade the prediction performance. These normalized data are then tabulated in columns. For further training on the classified results, the information is coded in binary form, i.e., as 0s and 1s. If the symptom or the disease is present in the patient, the corresponding point is defined as ONE (1); if the symptom or the disease is not present, the corresponding point is defined as ZERO (0). This conversion into binary form is performed for all 14 attributes of the dataset. The work may not produce the appropriate results with one particular ordering of the binary fields, so the order of the fields can be changed; the best result should be found by moving the fields, and the sequence of fields which produces the best result should be chosen. After the completion of the training phase, the final result is compared with the manual result for accuracy verification. The RBF kernel function described above is used as the kernel for this work; by using this function, the predictive accuracy is further increased.
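A minimal sketch of the 0/1 coding step described above. The convention assumed here for illustration, that any positive raw value means the symptom or finding is present, and the variable name raw_attributes, are not from the paper.

% Code the 14 attributes as presence/absence indicators
coded = double(raw_attributes > 0);   % 1 = symptom/disease present, 0 = absent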
C. Experimental Setup

The classification task is performed by the SVM implemented using the MATLAB software kit. The software is used to tune the performance of the SVM. To fine-tune the performance of the SVM, experiments are conducted on:
a) Multiclass SVM, using either the One-against-One or the One-against-All model.
b) The kernel function: the Polynomial and Radial Basis Function kernels are worked out with different parameter values.
The SVM and the kernel functions used in the experimental setup operate as described in Sections V and VI.

D. Methods and Modules Used

Figure 2 shows the system architecture of the SVM machine. The two main phases of machine learning are the training phase and the testing phase. The modules used are the sparse-matrix rule, svmtrain and svmpredict. The input datasets are fed to the SVM machine one by one and are preprocessed in order to reduce the feature set. After the preprocessing is done, the samples are taken for training and testing the machine; they are grouped as train data and test data. The data for training the machine is passed to LIBSVM, a library for SVMs. LIBSVM covers kernel selection, model training and cross-validation. Kernel selection chooses among the different kernels; -t is the option for the kernel type, e.g. -t 2 selects the radial basis function kernel.

-t kernel_type : set type of kernel function
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
4 -- precomputed kernel

Model selection chooses the SVM model; -s is the option for the SVM type, e.g. -s 0 selects the C-SVC SVM type.

-s svm_type : set type of SVM (default 0)
0 -- C-SVC
1 -- nu-SVC


2 -- one-class SVM
3 -- epsilon-SVR
4 -- nu-SVR

During the cross-validation process, k-fold cross-validation is done. The data are randomly separated into k groups; each time, k-1 groups are selected for training and one for testing. By training on the datasets, a classifier is generated. This classifier produces a pattern which is used for prediction. The prediction value is calculated and then stored. After the training process, the samples from the rest of the datasets are used for prediction (testing). LIBSVM functions are used for these testing samples as well. The classifier previously generated during the training phase is used to classify the class during the prediction phase. The prediction accuracy is calculated by performing 10-fold cross-validation testing. The accuracy can be improved by training and testing the machine with different sample inputs of the datasets and by selecting different kernels and SVM models during each iteration. The roles of the svmtrain and svmpredict functions are as shown below:

model = svmtrain(training_label_vector, training_instance_matrix [, 'libsvm_options']);

[predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model [, 'libsvm_options']);

The svmtrain function returns a model which can be used for future prediction. It is a structure organized as [18]: [Parameters, nr_class, totalSV, rho, Label, ProbA, ProbB, nSV, sv_coef, SVs]:
- Parameters: parameters
- nr_class: number of classes; = 2 for regression/one-class SVM
- totalSV: total number of support vectors
- rho: -b of the decision function(s) wx+b
- Label: label of each class; empty for regression/one-class SVM
- ProbA: pairwise probability information; empty if -b 0 or in one-class SVM
- ProbB: pairwise probability information; empty if -b 0 or in one-class SVM
- nSV: number of SVs for each class; empty for regression/one-class SVM
- sv_coef: coefficients for SVs in decision functions
- SVs: support vectors

The function svmpredict has three outputs. The first one, predicted_label, is a vector of predicted labels. The second output, accuracy, is a vector including accuracy (for classification), mean squared error, and squared correlation coefficient (for regression). The third is a matrix containing decision values or probability estimates (if '-b 1' is specified). If k is the number of classes, then for decision values each row includes the results of predicting k(k-1)/2 binary-class SVMs. For probabilities, each row contains k values indicating the probability that the testing instance is in each class. Note that the order of classes here is the same as the 'Label' field in the model structure [18].
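Putting the options above together, a typical LIBSVM session in MATLAB looks as follows. This is a sketch: the cost and gamma values (-c 1, -g 0.125) and the variable names for the label vectors and feature matrices are illustrative assumptions, and in practice the parameters are chosen via the cross-validation step.

% 10-fold cross-validation accuracy with a C-SVC (-s 0) and an RBF kernel (-t 2);
% labels are double column vectors, features are passed as sparse matrices
cv_acc = svmtrain(train_labels, sparse(train_features), '-s 0 -t 2 -c 1 -g 0.125 -v 10');

% Train the final model with the chosen parameters and predict the test set
model = svmtrain(train_labels, sparse(train_features), '-s 0 -t 2 -c 1 -g 0.125');
[pred_labels, accuracy, dec_values] = svmpredict(test_labels, sparse(test_features), model);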
VIII. ANALYSIS AND RESULT

The results on the Heart Disease dataset are formulated in the form of a confusion matrix. A confusion matrix is a specific table layout that allows visualization of the performance of an algorithm on a classification task, typically in supervised learning. Each column of the matrix represents the instances in a predicted class and each row represents the instances in an actual class. The confusion matrix shows the ability of the classifier to correctly classify the positive and negative classes of the dataset. The True Positives (TP) and True Negatives (TN) are the correct classifications in the yes/no (predicted) classes. Similarly, False Negatives (FN) and False Positives (FP) represent the incorrect classifications. An FN occurs when the outcome is incorrectly


predicted as no, when it is actually yes. An FP occurs when the outcome is incorrectly predicted as yes, when it is actually no. The level of accuracy of the classification model is calculated from the number of correct and incorrect classifications recorded in the confusion matrix. The confusion matrix is tabulated as shown in Table II.

TABLE II: A CONFUSION MATRIX

                      Predicted Class
Actual Class          Yes               No
Yes = (count)         True positive     False negative
No = (count)          False positive    True negative

The accuracy of the SVM classifier on a particular dataset is calculated using a 10-fold cross-validation test. The cross-validation test is done by breaking the dataset into 10 pieces and performing the prediction test on each piece using a classifier trained on the remaining 90% of the data. The classification accuracy is then calculated as the average of the 10 predictive accuracy values. The accuracy for the predicted class is calculated as given below:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    ---- (10)
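Continuing the LIBSVM example above, the confusion-matrix counts and Eq. (10) can be computed directly from the predicted and actual labels. The coding assumed here for illustration is 1 = disease present, 0 = absent.

TP = sum(pred_labels == 1 & test_labels == 1);   % correctly predicted "yes"
TN = sum(pred_labels == 0 & test_labels == 0);   % correctly predicted "no"
FP = sum(pred_labels == 1 & test_labels == 0);   % predicted "yes", actually "no"
FN = sum(pred_labels == 0 & test_labels == 1);   % predicted "no", actually "yes"
acc = (TP + TN) / (TP + TN + FP + FN);           % Eq. (10)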
IX. ANALYSIS GRAPH

The analysis graphs for the different datasets are shown below. The datasets chosen are the Hungarian, Cleveland, Switzerland and Long-Beach-VA datasets.



ACKNOWLEDGMENT
I place on record my sincere gratitude to the Almighty God for giving me the grace to complete this work beyond my imagination. I extend my deep sense of gratitude to my internal guide, Dr. T. Ravi (Head of the Department, Computer Science), for his cooperation and guidance in preparing and presenting this project. I also extend my sincere thanks to the Principal, Dr. T. Rengaraja, for his support and the facilities provided. I also extend my heartfelt thanks to all the staff, my friends and my family.

X. CONCLUSION AND FUTURE WORK

This work stresses the role of feature selection in medical decision support tools to improve the quality of disease diagnosis. By reducing the number of features, the number of medical measurements to be made is also reduced. In this work, the approach is to identify the features of the medical dataset that improve the performance of the classifier for accurate classification. The SVM and the Radial Basis Function kernel are used to improve the training process of the expert system. The dataset for heart disease is collected and used to train the machine. The SVM performs the classification for linearly separable vectors, and the Radial Basis Function kernel handles the classification for non-linearly separable vectors. Through this, the accuracy and efficiency of the machine for classification are improved. The results obtained by the trained machine for detecting heart disease are compared with the manual findings. By making the machine more accurate, it can be made beneficial to society as a cost-effective approach. As future research work, this machine can be synchronized with artificial sensors to find out the disease of the patient. Also, an integer-coded genetic algorithm can be used with LS-SVM to improve the efficiency of the work.

REFERENCES
[1] Oana Frunza, Diana Inkpen, and Thomas Tran, A Machine Learning Approach for Identifying Disease Treatment Relations in Short Text, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 6, June 2011.
[2] Sheikh Abdul Hannan, V. D. Bhagile, R. R. Manza, R. J. Ramteke, Diagnosis and Medical Prescription of Heart Disease using Support Vector Machine and Feedforward Backpropagation Technique, (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 06, pp. 2150-2159, 2010.
[3] Sarojini Balakrishnan, Ramaraj Narayanaswamy, Ilango Paramasivam, An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets, International Journal on Computer Applications, ISSN 0975-8887, Vol. 29, No. 5, Sept 2011.
[4] Dilly Ruth, 2002. Data Mining: An Introduction. Available at http://www.pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_1.html.
[5] Lavrac N, Selected techniques for data mining in medicine, Artificial Intelligence in Medicine 16(1), 3-23, 1999.


[6] I. Turkoglu, A. Arslan, E. Ilkay, An expert system for diagnosis of the heart valve diseases, Expert Systems with Applications, vol. 23, pp. 229-236, 2002.
[7] I. Turkoglu, A. Arslan, E. Ilkay, An intelligent system for diagnosis of heart valve diseases with wavelet packet neural networks, Computers in Biology and Medicine, vol. 33, pp. 319-331, 2003.
[8] E. Comak, A. Arslan, I. Turkoglu, A decision support system based on support vector machines for diagnosis of the heart valve diseases, Computers in Biology and Medicine, vol. 37, pp. 21-27, 2007.
[9] H. Uguz, A. Arslan, I. Turkoglu, A biomedical system based on hidden Markov model for diagnosis of the heart valve diseases, Pattern Recognition Letters, vol. 28, pp. 395-404, 2007.
[10] A. Sengur, An expert system based on principal component analysis, artificial immune system and fuzzy k-NN for diagnosis of valvular heart diseases, Computers in Biology and Medicine, vol. 38, pp. 329-338, 2008.
[11] A. Sengur, An expert system based on linear discriminant analysis and adaptive neuro-fuzzy inference system to diagnosis heart valve diseases, Expert Systems with Applications, vol. 35, pp. 214-222, 2008.
[12] J. Galindo, P. Tamayo, Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications, Computational Economics 15 (1-2) (2000) 107-143.
[13] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[14] M.A. Hearst, S.T. Dumais, E. Osman, J. Platt, B. Scholkopf, Support vector machines, IEEE Intelligent Systems 13 (4) (1998) 18-28.
[15] E. P. Ephzibah, Cost Effective Approach on Feature Selection using Genetic Algorithms and LS-SVM Classifier, IJCA Special Issue on Evolutionary Computation for Optimization Techniques, ECOT, 2010.
[16] F.E.H. Tay, L.J. Cao, Modified support vector machines in financial time series forecasting, Neurocomputing 48 (2002) 847-861.
[17] T. Van Gestel, J.A.K. Suykens, D.-E. Baestaens, A. Lambrechts, G. Lanckriet, B. Vandaele, B. De Moor, J. Vandewalle, Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Transactions on Neural Networks 12 (4) (2001) 809-821.
[18] http://www.csie.ntu.edu.tw/~b910



Detecting Malicious Packet Losses


SATHIYA PRIYA P, SINDHU C
PG Student, Department of CSE, Velammal Engineering College, Chennai

Abstract: In this paper, we consider the problem of detecting whether a compromised router is maliciously manipulating its stream of packets. In particular, we are concerned with a simple yet effective attack in which a router selectively drops packets destined for some victim. Unfortunately, it is quite challenging to attribute a missing packet to a malicious action because normal network congestion can produce the same effect. Modern networks routinely drop packets when the load temporarily exceeds their buffering capacities. Previous detection protocols have tried to address this problem with a user-defined threshold: too many dropped packets imply malicious intent. However, this heuristic is fundamentally unsound; setting this threshold is, at best, an art and will certainly create unnecessary false positives or mask highly focused attacks. We have designed, developed, and implemented a compromised router detection protocol that dynamically infers, based on measured traffic rates and buffer sizes, the number of congestive packet losses that will occur. Once the ambiguity from congestion is removed, subsequent packet losses can be attributed to malicious actions. We have tested our protocol in Emulab and have studied its effectiveness in differentiating attacks from legitimate network behavior.
Index Terms: Internet dependability, intrusion detection and tolerance, distributed systems, reliable networks, malicious routers.



Introduction

The Internet is not a safe place. Unsecured


hosts can expect to be compromised within minutes of connecting to the Internet and even well-protected hosts may be crippled with denial-of-service (DoS) attacks. However, while such threats to host systems are widely understood, it is less well appreciated that the network infrastructure itself is subject to constant attack as well. Indeed, through combinations of social engineering and weak passwords, attackers have seized control over thousands of Internet routers [1], [2]. Even more troubling is Mike Lynn's controversial presentation at the 2005 Black Hat Briefings, which demonstrated how Cisco routers can be compromised via simple software vulnerabilities. Once a router has been compromised in such a fashion, an attacker may interpose on the traffic stream and manipulate it maliciously to attack others, selectively dropping, modifying, or rerouting packets. Several researchers have developed distributed protocols to detect such traffic manipulations, typically by validating that traffic transmitted by one router is received unmodified by another [3], [4]. However, all of these schemes, including our own, struggle in interpreting the absence of traffic. While a packet that has been modified in transit represents clear evidence of tampering, a missing packet is inherently ambiguous: it may have been explicitly blocked by a compromised router or it may have been dropped benignly due to network congestion. In fact, modern routers routinely drop packets due to bursts in traffic that exceed their buffering capacities, and the widely used Transmission Control Protocol (TCP) is designed to cause such losses as part of its normal congestion control behavior. Thus, existing traffic validation systems must inevitably produce false positives for benign events and/or produce false negatives by failing to report real malicious packet dropping.

In this paper, we develop a compromised router detection protocol that dynamically infers the precise number of congestive packet losses that will occur. Once the congestion ambiguity is removed, subsequent packet losses can be safely attributed to malicious actions. We believe our protocol is the first to automatically predict congestion in a systematic manner and that it is necessary for making any such network fault detection practical. In the remainder of this paper, we briefly survey the related background material, evaluate options for inferring congestion, and then present the assumptions, specification, and a formal description of a protocol that achieves these goals. We have evaluated our protocol in a small experimental network and demonstrate that it is capable of accurately resolving extremely small and fine-grained attacks.

Background

There are inherently two threats posed by a compromised router. The attacker may subvert the network control plane (e.g., by manipulating the routing protocol into false route updates) or may subvert the network data plane and forward individual packets incorrectly. The first set of attacks have seen the widest interest and the most activity, largely due to their catastrophic potential. By violating the routing protocol itself, an attacker may cause large portions of the network to become inoperable. Thus, there have been a variety of efforts to impart authenticity and consistency guarantees on route update messages with varying levels of cost and protection [5], [6], [7], [8], [9], [10]. We do not consider this class of attacks in this paper. Instead, we have focused on the less well-appreciated threat of an attacker subverting the packet forwarding process on a compromised router. Such an attack presents a wide set of


opportunities including DoS, surveillance, man-in-the-middle attacks, replay and insertion attacks, and so on. Moreover, most of these attacks can be trivially implemented via the existing command shell languages in commodity routers.

The earliest work on fault-tolerant forwarding is due to Perlman [11], who developed a robust routing system based on source routing, digitally signed route-setup packets, and reserved buffers. While groundbreaking, Perlman's work required significant commitments of router resources and high levels of network participation to detect anomalies. Since then, a variety of researchers have proposed lighter weight protocols for actively probing the network to test whether packets are forwarded in a manner consistent with the advertised global topology [5], [12], [13]. Conversely, the 1997 WATCHERS system detects disruptive routers passively via a distributed monitoring algorithm that detects deviations from a conservation of flow invariant [14], [3]. However, work on WATCHERS was abandoned, in part due to limitations in its distributed detection protocol, its overhead, and the problem of ambiguity stemming from congestion [15]. Finally, our own work broke the problem into three pieces: a traffic validation mechanism, a distributed detection protocol, and a rerouting countermeasure. In [16] and [4], we focused on the detection protocol, provided a formal framework for evaluating the accuracy and precision of any such protocol, and described several practical protocols that allow scalable implementations. However, we also assumed that the problem of congestion ambiguity could be solved, without providing a solution. This paper presents a protocol that removes this assumption.

3 INFERRING CONGESTIVE LOSS

In building a traffic validation protocol, it is necessary to explicitly resolve the ambiguity around packet losses. Should the absence of a given packet be seen as malicious or benign? In practice, there are three approaches for addressing this issue:
- Static threshold. Low rates of packet loss are assumed to be congestive, while rates above some predefined threshold are deemed malicious.
- Traffic modeling. Packet loss rates are predicted as a function of traffic parameters, and losses beyond the prediction are deemed malicious.
- Traffic measurement. Individual packet losses are predicted as a function of measured traffic load and router buffer capacity. Deviations from these predictions are deemed malicious.

Most traffic validation protocols, including WATCHERS [3], Secure Traceroute [12], and our own work described in [4], analyze aggregate traffic over some period of time in order to amortize monitoring overhead over many packets. For example, one validation protocol described in [4] maintains packet counters in each router to detect if traffic flow is not conserved from source to destination. When a packet arrives at router r and is forwarded to a destination that will traverse a path segment ending at router x, r increments an outbound counter associated with router x. Conversely, when a packet arrives at router r via a path segment beginning with router x, it increments its inbound counter associated with router x. Periodically, router x sends a copy of its outbound counters to the associated routers for validation. Then, a given router r can compare the number of packets that x claims to have sent to r with the number of packets it counts as being received from x, and it can detect the number of packet losses. Thus, over some time window, a router simply knows that out of m packets sent, n were successfully received. To address congestion ambiguity, all of these systems employ a predefined threshold: if more than this number is dropped in a time interval, then one assumes that some router is compromised. However, this heuristic is fundamentally flawed: how does one choose the threshold? In order to avoid false positives, the threshold must be large enough to include the maximum number of possible congestive legitimate packet losses over a measurement interval. Thus, any compromised router can drop that many packets without being detected. Unfortunately, given the nature of the dominant TCP, even small numbers of losses can have significant impacts. Subtle attackers can selectively target the traffic flows of a single victim and within these flows only drop those packets that cause the most harm. For example, losing a TCP SYN packet used in connection establishment has a disproportionate impact on a host because the retransmission time-out must necessarily be very long (typically 3 seconds or more). Other seemingly minor attacks that cause TCP timeouts can have similar effects, a class of attacks well described in [17]. All things considered, it is clear that the static threshold mechanism is inadequate since it allows an attacker to mount vigorous attacks without being detected.

Instead of using a static threshold, if the probability of congestive losses can be modeled, then one could resolve ambiguities by comparing measured loss rates to the rates predicted by the model. One approach for doing this is to predict congestion analytically as a function of individual traffic flow parameters, since TCP explicitly responds to congestion. Indeed, the behavior of TCP has been extensively studied [18], [19], [20], [21], [22]. A simplified stochastic model of TCP congestion control yields the following famous square-root formula:

B = (1 / RTT) * sqrt(3 / (2*b*p))

where B is the throughput of the connection, RTT is the average round-trip time, b is the number of packets that are acknowledged by one ACK, and p is the probability that a TCP packet is lost. (This simplified formula omits many TCP dynamics such as time-outs, slow start, delayed ACKs, and so forth; more complex formulas taking these into account can be found in the literature.) The steady-state throughput of long-lived TCP flows can be described by this formula as a function of RTT and p. The formula is based on a constant loss probability, which is the simplest model, but others have extended this work to encompass a variety of loss processes [22], [20], [23], [24]. None of these have been able to capture congestion behavior in all situations.

Another approach is to model congestion for the aggregate capacity of a link. In [25], Appenzeller et al. explore the question of how much buffering routers need. A widely applied rule of thumb suggests that routers must be able to buffer a full delay-bandwidth product. This controversial paper argues that due to congestion control effects the rule of thumb is wrong, and the amount of required buffering is proportional to the square root of the total number of TCP flows. To achieve this, the authors produced an analytic model of buffer occupancy as a function of TCP behavior. We have evaluated their model thoroughly and have communicated with the authors, who agree that their model is only a rough approximation that ignores many details of TCP, including time-outs, residual synchronization, and many other effects. Thus, while the analysis is robust enough to model buffer size, it is not precise enough to predict


congestive loss accurately. Hence, we have turned to measuring the interaction of traffic load and buffer occupancy explicitly. Given an output buffered first-in first-out (FIFO) router, congestion can be predicted precisely as a function of the inputs (the traffic rate delivered from all input ports destined to the target output port), the capacity of the output buffer, and the speed of the output link. A packet will be lost only if packet input rates from all sources exceed the output link speed for long enough. If such measurements are taken with high precision, it should even be possible to predict individual packet losses. It is this approach that we consider further in the rest of this paper. We restrict our discussion to output buffered switches for simplicity, although the same approach can be extended to input buffered switches or virtual output queues with additional adjustments (and overhead). Because of some uncertainty in the system, we cannot predict exactly which individual packets will be dropped. So, our approach is still based on thresholds. Instead of being a threshold on rate, it is a threshold on a statistical measure: the amount of confidence that the drop was due to a malicious attack rather than from some normal router function. To make this distinction clearer, we refer to the statistical threshold as the target significance level.

4 SYSTEM MODEL

Our work proceeds from an informed, yet abstracted, model of how the network is constructed, the capabilities of the attacker, and the complexities of the traffic validation problem. In this section, we briefly describe the assumptions underlying our model. We use the same system model as in our earlier work [4].

4.1 Network Model

We consider a network to consist of individual homogeneous routers interconnected via directional point-to-point links. This model is an intentional simplification of real networks (e.g., it does not include broadcast channels or independently failing network interfaces) but is sufficiently general to encompass such details if necessary. Unlike our earlier work, we assume that the bandwidth, the delay of each link, and the queue limit for each interface are all known publicly. Within a network, we presume that packets are forwarded in a hop-by-hop fashion, based on a local forwarding table. These forwarding tables are updated via a distributed link-state routing protocol such as OSPF or IS-IS. This is critical, as we depend on the routing protocol to provide each node with a global view of the current network topology. Finally, we assume the administrative ability to assign and distribute cryptographic keys to sets of nearby routers. This overall model is consistent with the typical construction of large enterprise IP networks or the internal structure of single ISP backbone networks, but is not well suited for networks that are composed of multiple administrative domains using BGP. At this level of abstraction, we can assume a synchronous network model. We define a path to be a finite sequence ⟨r1, r2, ..., rn⟩ of adjacent routers. Operationally, a path defines a sequence of routers a packet can follow. We call the first router of the path the source and the last router its sink; together, these are called terminal routers. A path might consist of only one router, in which case the source and sink are the same. Terminal routers are leaf routers: they are never in the middle of any path.
An x-path segment is a consecutive sequence of x routers that is a subsequence of a path. A path segment is an x-path segment for some value of x > 0. For example, if a network consists of the single path ⟨a, b, c, d⟩, then ⟨c, d⟩ and ⟨b, c⟩ are both two-path segments, but ⟨a, c⟩ is not


because a and c are not adjacent.

4.2 Threat Model

As explained in Section 1, this paper focuses solely on data plane attacks (control plane attacks can be addressed by other protocols with appropriate threat models, such as [6], [7], [5], [8], [9], and [10]). Moreover, for simplicity, we examine only attacks that involve packet dropping. However, our approach is easily extended to address other attacks, such as packet modification or reordering, similar to our previous work. Finally, as in [4], the protocol we develop validates traffic whose source and sink routers are uncompromised. A router can be traffic faulty by maliciously dropping packets and protocol faulty by not following the rules of the detection protocol. We say that a compromised router r is traffic faulty with respect to a path segment during a given time interval if the segment contains r and, during that period of time, r maliciously drops or misroutes packets that flow through the segment. A router can drop packets without being faulty, as long as the packets are dropped because the corresponding output interface is congested. A compromised router r can also behave in an arbitrarily malicious way in terms of executing


the protocol we present, in which case we indicate r as protocol faulty. A protocol faulty router can send control messages with arbitrarily faulty information, or it can simply not send some or all of them. A faulty router is one that is traffic faulty, protocol faulty, or both. Attackers can compromise one or more routers in a network. However, for simplicity, we assume in this paper that adjacent routers cannot be faulty. Our work is easily extended to the case of k adjacent faulty routers.

5 PROTOCOL

The protocol detects traffic faulty routers by validating the queue of each output interface for each router. Given the buffer size and the rate at which traffic enters and exits a queue, the behavior of the queue is deterministic. If the actual behavior deviates from the predicted behavior, then a failure has occurred. We present the failure detection protocol in terms of the solutions of the distinct subproblems: traffic validation, distributed detection, and response.

5.1 Traffic Validation

The first problem we address is traffic validation: what information is collected about traffic and how it is used to determine that a router has been compromised. Consider the queue Q in a router r associated with the output interface of link ⟨r, rd⟩ (see Fig. 1). The neighbor routers rs1, rs2, ..., rsn feed data into Q.

Fig. 1. Validating the queue of an output interface.

We denote by Tinfo(r, Qdir, ·, ·) the traffic information collected by router r for the packets that traversed a given path segment over a given time interval. Qdir is either Qin, meaning traffic into Q, or Qout, meaning traffic out of Q. At an abstract level, we represent a traffic validation mechanism associated with Q as a predicate TV(Q, qpred(t), S, D), where:
- qpred(t) is the predicted state of Q at time t. qpred is initialized to 0 when the link ⟨r, rd⟩ is discovered and installed into the routing fabric, and is updated as part of traffic validation.
- S = {∀i ∈ {1, 2, ..., n}: Tinfo(rsi, Qin, ⟨rsi, r, rd⟩, ·)} is a set of information about the traffic coming into Q, as collected by the neighbor routers.
- D = Tinfo(rd, Qout, ⟨r, rd⟩, ·) is the traffic information about the outgoing traffic from Q, collected at router rd.
If routers rs1, rs2, ..., rsn and rd are not protocol faulty, then TV(Q, qpred(t), S, D) evaluates to false if and only if r was traffic faulty and dropped packets maliciously during the interval.

Tinfo can be represented in different ways. We use a set that contains, for each packet traversing Q, a three-tuple that includes: a fingerprint of the packet, the packet's size, and the time that the packet entered or exited Q (depending on whether Qdir is Qin or Qout). For example, if at time t router rs transmits a packet of size ps bytes with a fingerprint fp, and the packet is to traverse the segment, then rs computes when the packet will enter Q based on the packet's transmission and propagation delay. Given a link delay d and link bandwidth bw associated with the link ⟨rs, r⟩, the time stamp for the packet is t + d + ps/bw.

TV can be implemented by simulating the behavior of Q. Let P be a priority queue, sorted by increasing time stamp. All the traffic information in S and D is inserted into P along with the identity of the set (S or D) from which the information came. Then, P is enumerated. For each packet in P with a fingerprint fp, size ps, and a time stamp ts, qpred is updated as follows. Assume t is the time stamp of the packet evaluated prior to the current one:
- If fp came from D, then the packet is leaving Q: qpred(ts) := qpred(t) - ps.
- If fp came from S and fp ∈ D, then the packet fp is entering Q and will exit later: qpred(ts) := qpred(t) + ps.
- If fp came from S and fp ∉ D, then the packet fp is entering Q and will not be transmitted in the future: qpred(ts) is unchanged, and the packet is dropped. If qlimit < qpred(t) + ps, where qlimit is the buffer limit of Q, then the packet is dropped due to congestion; otherwise, the packet is dropped due to a malicious attack.

Detect failure. In practice, the behavior of a queue cannot be predicted with complete accuracy. For example, the tuples in S and D may be collected over slightly different intervals, and so a packet may appear to be dropped when in fact it is not (this is discussed in Section 4.1). Additionally, a packet sent to a router may not enter the queue at the expected time because of short-term scheduling delays and internal processing delays. Let qact(t) be the actual queue length at time t. Based on the central limit theorem [26], our intuition tells us that the error, qerror = qact - qpred, can be approximated with a normal distribution. Indeed, this turns out to be the case, as we show in Section 7. Hence, this suggests using a probabilistic approach. We use two tests: one based on the loss of a single packet and one based on the loss of a set of packets. (The central limit theorem states the following: consider a set of n samples drawn independently from any given distribution; as n increases, the average of the samples approaches a normal distribution as long as the sum of the samples has a finite variance.)

5.1.1 Single Packet Loss Test

If a packet with fingerprint fp and size ps is dropped at time ts when the predicted queue length is qpred(ts), then we raise an alarm with a confidence value csingle, which is the probability of the packet being dropped maliciously. csingle is computed as in Fig. 2.

Fig. 2. Confidence value for the single packet loss test for a packet with a fingerprint fp, size ps, and a time stamp ts.
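The queue simulation described in Section 5.1 can be sketched in a few lines of MATLAB. This is an illustrative sketch only, not the authors' implementation: it assumes S and D are matrices whose rows are [time stamp, size, fingerprint] tuples and that qlimit holds the buffer limit of Q.

% Replay the merged ingress (S) and egress (D) records to track qpred
P = [S, zeros(size(S,1),1); D, ones(size(D,1),1)];   % 4th column: 1 = came from D
P = sortrows(P, 1);                                   % priority queue by time stamp
inD = P(P(:,4) == 1, 3);                              % fingerprints seen leaving Q
qpred = 0; suspect = [];
for k = 1:size(P,1)
    ps = P(k,2); fp = P(k,3); fromD = P(k,4);
    if fromD
        qpred = qpred - ps;                % packet leaves Q
    elseif ismember(fp, inD)
        qpred = qpred + ps;                % packet enters Q and will exit later
    else                                    % packet enters Q but never exits
        if qlimit >= qpred + ps
            suspect(end+1) = fp;           % queue had room: candidate malicious drop
        end                                 % otherwise the drop is consistent with congestion
    end
end

A drop is flagged as a candidate malicious loss only when the predicted queue had room for the packet, mirroring the congestion check above; in the protocol itself this raw decision is then softened into a confidence value, as described next.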


The mean and standard deviation of X can be determined by monitoring during a learning period. We do not expect them to change much over time, because they are in turn determined by values that themselves do not change much over time. Hence, the learning period need not be repeated very often. A malicious router is detected if the confidence value csingle is at least as large as a target significance level. (The significance level is the critical value used to decide whether to reject the null hypothesis in traditional statistical hypothesis testing. If it is rejected, then the outcome of the experiment is said to be statistically significant at that significance level.)

5.1.2 Combined Packet Losses Test

The second test is useful when more than one packet is dropped during a round and the first test does not detect a malicious router. It is based on the well-known Z-test [26]. (The Z-test, a statistical test, is used to decide whether the difference between a sample mean and a given population mean is large enough to be statistically significant.) Let L be the set of n > 1 packets dropped during the last time interval. For the packets in L, let ps be the mean of the packet sizes, qpred be the mean of qpred(ts) (the predicted queue length), and qact be the mean of qact(ts) (the actual queue length) over the times the packets were dropped. We test the following hypothesis: the packets are lost due to malicious attack. The corresponding Z-test score is computed from these sample means, qlimit and the learned standard deviation. This test is used when packets are being dropped and the first test has determined that they are consistent with congestion loss. Hence, the router is under load during the short period over which the measurement was taken, and most of the points, both for dropped packets and for non-dropped packets, should correspond to a nearly full Q. In Section 7, we show that the Z-test does in fact detect a router that is malicious in a calculated manner.

5.2 Distributed Detection

Since the behavior of the queue is deterministic, the traffic validation mechanisms detect traffic faulty routers whenever the actual behavior of the queue deviates from the predicted behavior. However, a faulty router can also be protocol faulty: it can behave arbitrarily with respect to the protocol, by dropping or altering its control messages. We mask the effect of protocol faulty routers using distributed detection. Given TV, we need to distribute the necessary traffic information among the routers and implement a distributed detection protocol. Every outbound interface queue Q in the network is monitored by the neighboring routers and validated by a router rd such that Q is associated with the link ⟨r, rd⟩. With respect to a given Q, the routers involved in detection are (as shown in Fig. 1):
- rs, which sends traffic into Q to be forwarded;
- r, which hosts Q;
- rd, which is the router to which Q's outgoing traffic is forwarded.
The traffic information collected is:
- rs: collect Tinfo(rs, Qin, ⟨rs, r, rd⟩, ·);
- r: collect Tinfo(r, Qin, ⟨rs, r, rd⟩, ·); this information is used to check the transit traffic information sent by the rs routers;
- rd: collect Tinfo(rd, Qout, ⟨r, rd⟩, ·).

5.2.2 Information Dissemination and Detection

rs: At the end of each measurement interval, router rs sends the traffic information [Tinfo(rs, Qin, ⟨rs, r, rd⟩)]rs that it has collected, where [M]x denotes a message M digitally signed by x. Digital signatures are required for integrity and authenticity against message tampering.5


4. The Z-test is a statistical test used to decide whether the difference between a sample mean and a given population mean is large enough to be statistically significant.


1. D-I. r: Let a fixed bound be placed on the time needed to forward traffic information.
a. If r does not receive traffic information from rs within this bound, then r detects ⟨rs, r⟩.
b. Upon receiving [Tinfo(rs, Qin, ⟨rs, r, rd⟩)]rs, router r verifies the signature and checks whether this information is equal to its own copy Tinfo(r, Qin, ⟨rs, r, rd⟩). If so, then r forwards it to rd; if not, then r detects ⟨rs, r⟩. If r has detected a failure ⟨rs, r⟩, then it forwards its own copy of the traffic information Tinfo(r, Qin, ⟨rs, r, rd⟩); this is required by rd to simulate Q's behavior and keep the state q up to date.
2. D-II. rd:
a. If rd does not receive the traffic information originated by rs within twice the bound, then it expects r to have detected rs as faulty and to announce this detection through the response mechanism. If r does not do this, then rd detects ⟨r, rd⟩.
b. After receiving the traffic information forwarded by r, rd checks the integrity and authenticity of the message. If the digital signature verification fails, then rd detects ⟨r, rd⟩.
c. Having collected all traffic information, router rd evaluates the TV predicate for queue Q. If TV evaluates to false, then rd detects ⟨r, rd⟩.
Fault detections D-Ia, D-Ib, D-IIa, and D-IIb are due to protocol faulty routers, and fault detection D-IIc is due to the traffic validation detecting traffic faulty routers. Note that dropping traffic-information packets due to congestion can lead to false positives; thus, the routers send this data with high priority. Doing so may cause other data to be dropped instead under congestion, and traffic validation needs to take this into account. It is not hard, although somewhat detailed, to do so when simulating Q's behavior.
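To make the structure of the rd-side checks concrete, the sketch below mirrors steps D-IIa through D-IIc. The function and parameter names are hypothetical stand-ins for whatever messaging and validation machinery an implementation would actually use.

```python
# Hedged sketch of the rd-side checks (D-II) from Section 5.2.2; message and
# field names are illustrative, not taken from the paper.
def rd_detection_round(received, verify_signature, traffic_validation,
                       r_announced_failure):
    """Return the link rd suspects, or None.

    `received` is the traffic-information message forwarded by r (or None if it
    never arrived within the allotted time); `r_announced_failure` says whether
    r itself announced <rs, r> as faulty through the response mechanism.
    """
    if received is None:
        # D-IIa: nothing arrived in time and r did not take the blame itself.
        return None if r_announced_failure else ("r", "rd")
    if not verify_signature(received):
        # D-IIb: tampered or unauthentic traffic information.
        return ("r", "rd")
    if not traffic_validation(received):
        # D-IIc: the TV predicate over queue Q fails -> traffic faulty router.
        return ("r", "rd")
    return None
```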

5.3 Response
Once a router r detects router r′ as faulty, r announces the link ⟨r′, r⟩ as being suspected.

This suspicion is disseminated via the distributed link-state flooding mechanism of the routing protocol. As a consequence, the suspected link is removed from the routing fabric. Of course, a protocol faulty router r can announce a link ⟨r′, r⟩ as being faulty, but it can

do this in any routing protocol, and, in doing so, it only stops traffic from being routed through itself; router r could achieve the same effect by simply crashing. To protect against such an attack, the routing fabric needs to have sufficient path redundancy.

6 ANALYSIS OF THE PROTOCOL

In this section, we consider the properties and overhead of the protocol.
5. Digital signatures can be replaced with message authentication codes if the secret keys are distributed among the routers.

6.1 Accuracy and Completeness
In [4], we cast the problem of detecting compromised routers as a failure detector with accuracy and completeness properties. There are two steps in showing the accuracy and


completeness of the protocol: first, showing that TV is correct; and second, showing that the protocol is accurate and complete assuming that TV is correct. Assuming that no two adjacent routers are faulty, we show in Appendices B and C that if TV is correct, then the protocol is 2-accurate and 2-complete, where 2 indicates the granularity of detection: a link consisting of two routers is detected as the faulty unit. We discuss how to relax this assumption in Section 9.2, and we discuss traffic validation in Section 6.2.

6.2 Traffic Validation Correctness
Any failure of TV to detect a malicious attack results in a false negative, and any misdetection of legitimate behavior by TV results in a false positive. Within the given system model of Section 4, the example TV predicate in Section 5.1 is correct. However, the system model is still simplistic. In a real router, packets may be legitimately dropped for reasons other than congestion: for example, errors in hardware, software, or memory, and transient link errors. Classifying these as arising from a router being compromised might be a problem, especially if they are infrequent enough that they would be best ignored rather than warranting repair of the router or link. A larger concern is the simple way that a router is modeled in how it internally multiplexes packets. This model is used to compute time stamps; if the time stamps are incorrect, then TV could decide incorrectly. We hypothesize that a sufficiently accurate timing model of a router is attainable, but we have yet to show this to be the case. A third concern is clock synchronization. This version of TV requires that all the routers feeding a queue have synchronized clocks. This requirement is needed in order to ensure that the packets are interleaved correctly by the model of the router. The synchronization requirement is not necessarily daunting; tight synchronization is only required among routers adjacent to the same router. With low-level time stamping of packets and repeated exchanges of time [27], it should be straightforward to synchronize the clocks sufficiently tightly. Other representations of collected traffic information and TV that we have considered have their own problems with false positives and false negatives. It is an open question as to the best way to represent TV; we suspect any representation will admit some false positives or false negatives.

6.3 Overhead
We examined the overhead of the protocol in terms of computing fingerprints, computing TV, per-router state, control message overhead, clock synchronization, and key distribution. We believe all are low enough to permit practical implementation and deployment in real networks.

6.3.1 Computing Fingerprints
The main overhead of the protocol is in computing a fingerprint for each packet. This computation must be done at wire speed, and such speeds have been demonstrated to be attainable. In our prototype, we implemented fingerprinting using UHASH [28]. Rogaway [29] demonstrated UHASH performance of more than 1 Gbps on a 700-MHz Pentium III processor when computing a 4-byte hash value. This performance could be increased further with hardware support. Network processors are designed to perform highly parallel actions on data packets [30]. For example, Feghali et al. [31] presented an implementation of well-known private-key encryption algorithms on the Intel IXP28xx network processors that keeps pace with a 10-Gbps forwarding rate. Furthermore, Sanchez et al. [32] demonstrated hardware support to compute fingerprints at the wire speed of high-speed routers (OC-48 and faster).
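The prototype computes per-packet fingerprints with UHASH, which is not available in the Python standard library; purely to illustrate the shape of the computation, the sketch below substitutes a keyed BLAKE2b digest truncated to 4 bytes. The key, the field choices, and the example record layout are assumptions, not the paper's specification.

```python
import hashlib

def packet_fingerprint(packet_bytes: bytes, key: bytes) -> bytes:
    """Illustrative 4-byte keyed fingerprint of a packet.

    The prototype described above uses UHASH; here a keyed BLAKE2b digest
    truncated to 4 bytes stands in for it.  In practice the hash would cover
    only the fields that are invariant across a hop (e.g. not TTL or checksum).
    """
    return hashlib.blake2b(packet_bytes, key=key, digest_size=4).digest()

# Example traffic-information entry: fingerprint, packet size, time stamp.
fp = packet_fingerprint(b"...invariant header and payload bytes...",
                        key=b"shared-secret")
record = (fp, 800, 123456)
```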


6.3.2 Computing TV
The time complexity of computing TV depends on the size of the traffic information collected and received from the neighbors that are within two hops, and so it depends on the topology and the traffic volume of the network. If the traffic information stores the packet fingerprints in order of increasing time stamps, then a straightforward implementation of traffic validation exists. In our prototype, which is not optimized, computing TV had an overhead of between 15 and 20 ms per validation round.

6.3.3 Per-Router State
Let N be the number of routers in the network and R be the maximum number of links incident on a router. The protocol requires a router to monitor the path segments that are at most two hops away; by construction, this is O(R²). State is kept for each of these segments. The TV predicate in Section 5.1 requires that a time stamp and the packet size be kept for each packet that traversed the path segment. As a point of comparison, WATCHERS [3] requires O(RN) state, where each individual router keeps seven counters for each of its neighbors for each destination.

6.3.4 Control Message Overhead
The protocol collects traffic information and exchanges this information periodically using the monitored network infrastructure. Suppose we compute a 4-byte fingerprint and keep the packet size and time stamp in 2 bytes each. Then, the message overhead is 8 bytes per packet. If we assume that the average packet size is 800 bytes, then the bandwidth overhead of the protocol is 1 percent.

6.3.5 Clock Synchronization
Like all previous detection protocols, the protocol requires synchronization in order to agree on a time interval during which to collect traffic information. For a router r, all neighboring routers of r need to synchronize with each other to agree on when and for how long the next measurement interval will be.


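As a small worked version of the control-message arithmetic in Section 6.3.4, the following sketch packs one hypothetical 8-byte traffic-information record (4-byte fingerprint, 2-byte packet size, 2-byte time stamp) and checks the resulting bandwidth overhead for 800-byte average packets; the record layout is an assumption consistent with the sizes stated above.

```python
import struct

def pack_record(fingerprint: bytes, pkt_size: int, timestamp: int) -> bytes:
    """One record: 4-byte fingerprint + 2-byte size + 2-byte (truncated) time stamp."""
    assert len(fingerprint) == 4
    return fingerprint + struct.pack("!HH", pkt_size, timestamp & 0xFFFF)

record = pack_record(b"\x12\x34\x56\x78", pkt_size=800, timestamp=123456)
overhead = len(record) / 800          # 8 bytes of metadata per 800-byte packet
print(f"{overhead:.0%}")              # -> 1%
```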

Fig. 3. Simple topology.


Clock synchronization overhead is fairly low. For example, the external clock synchronization protocol NTP [33] can provide accuracy within 200 μs in local area networks (LANs). It requires two messages of 90 bytes per transaction, and the rate of transactions can range from once per minute to once per 17 minutes. Wedde et al. [34] presented an internal clock synchronization protocol (RTNP) that maintains an accuracy within 30 μs by updating the clocks once every second.

6.3.6 Key Distribution
To protect against protocol faulty routers tampering with the messages containing traffic information, the protocol requires digital signatures or message authentication codes. Thus, there is an issue of key distribution, and the overhead for this depends on the cryptographic tools that are used.

7 EXPERIENCES

We have implemented and experimented with the protocol in the Emulab [35], [36] testbed. In our experiments, we used the simple topology shown in Fig. 3. The routers were Dell PowerEdge 2850 PC nodes with a single 3.0-GHz 64-bit Xeon processor and 2 Gbytes of RAM, running Redhat Linux 9.0. Each router except r1 was connected to three LANs to which user machines were connected. The links between routers were configured with 3-Mbps bandwidth, 20-ms delay, and a 75,000-byte capacity FIFO queue. Each pair of routers shares secret keys; integrity and authenticity against message tampering are provided by message authentication codes.

Fig. 5. Attack 1: Drop 20 percent of the selected flows. (a) Queue length. (b) Statistical test results.
Fig. 6. Attack 2: Drop the selected flows when the queue is 90 percent full. (a) Queue length. (b) Statistical test results.
Last, we looked at a SYN attack, which would prevent a selected host from establishing a connection with any server: the router r1 was instructed to drop all SYN packets from a targeted host, which tries to connect to an ftp server. In Fig. 8, five SYN packets, which are marked with circles, are maliciously dropped by r1. Except for the second SYN packet drop, all malicious drops raised an alarm. The second SYN is dropped when the queue is almost full, and so the confidence value is not significant enough to differentiate it from the other packet drops due to congestion.

7.4 Protocol versus Static Threshold


We argued earlier about the difficulties of using static thresholds on the number of dropped packets for detecting malicious intent. We illustrate this difficulty with the run shown in Fig. 6. Recall that during this run, the router dropped packets only when the output queue was at least 90 percent full. Before time 52, the router behaved correctly, and 2.1 percent of the packets were dropped due to congestion. During the time period from 52 to 64, the router maliciously dropped packets, but only 1.7 percent of the packets were dropped (some due to congestion and some due to the attack). This may seem counterintuitive: fewer packets were dropped due to congestion during the period in which the queues contained more packets. Such nonintuitive behavior does not happen in every run, but the dynamics of the network transport protocol led to it in this run. So, for this run, there is no static threshold that can be used to detect the period during which the router was malicious. A similar situation occurs in the highly focused SYN attack of Fig. 8. In contrast, the protocol can detect such malicious behaviors because it measures the router's queues, which are determined by the dynamics of the network transport protocol. The protocol can report false positives and false negatives, but the probability of such detections can be controlled with the significance level of the statistical tests on which it is built. A static threshold cannot be used in the same way.

8 NONDETERMINISTIC QUEUING
As described, our traffic validation technique assumes a deterministic queuing discipline on each router: FIFO with tail-drop. While this is a common model, in practice, real router implementations can be considerably more complex, involving switch arbitration, multiple layers of buffering, multicast scheduling, and so forth. Of these, the most significant for our purposes is the nondeterminism introduced by active queue management (AQM), such as random early detection (RED) [37], the proportional integrator (PI) [38], and random exponential marking (REM) [39]. In this section, we describe how the protocol can be extended to validate traffic in AQM environments. We focus particularly on RED, since it is the most widely known and widely used of such mechanisms.6



Fig. 7. Attack 3: Drop the selected flows when the queue is 95 percent full. (a) Queue length. (b) Statistical test results.
Fig. 8. Attack 4: Target a host trying to open a connection by dropping SYN packets. (a) Queue length. (b) Statistical test results.
RED was first proposed by Floyd and Jacobson in the early 1990s to provide better feedback for end-to-end congestion control mechanisms. Using RED, when a router's queue becomes full enough that congestion may be imminent, a packet is selected at random to signal this condition back to the sending host. This signal can take the form of a bit marked in the packet's header and then echoed back to the sender (Explicit Congestion Notification, ECN [40], [41]),7 or it can be indicated by dropping the packet. If ECN is used to signal congestion, then the protocol as presented in Section 5 works perfectly. If not, then RED will introduce nondeterministic packet losses that may be misinterpreted as malicious activity.

In the remainder of this section, we explain how RED's packet selection algorithm works, how it may be accommodated into our traffic validation framework, and how well we can detect even small attacks in a RED environment.

8.1 Random Early Detection
RED monitors the average queue size, qavg, based on an exponentially weighted moving average:

qavg := (1 - w) · qavg + w · qact,   (1)

where qact is the actual queue size and w is the weight of a low-pass filter. RED uses three more parameters: qth_min, the minimum threshold; qth_max, the maximum threshold; and pmax, the maximum probability. Using qavg, RED dynamically computes a dropping probability in two steps for each packet it receives. First, it computes an interim probability pt:

pt = 0, if qavg < qth_min;
pt = pmax · (qavg - qth_min) / (qth_max - qth_min), if qth_min ≤ qavg < qth_max;
pt = 1, if qth_max ≤ qavg.

Further, the RED algorithm tracks the number of packets, cnt, since the last dropped packet. The final packet dropping probability, p, is specified to increase slowly as cnt increases:

p = pt / (1 - cnt · pt).   (2)

Finally, instead of generating a new random number for every packet when qth_min < qavg < qth_max, a suggested optimization is to generate random numbers only when a packet is dropped [37]. Thus, after each RED-induced packet drop, a new random sample, rn, is taken from a uniform random variable R = Random[0, 1]. The first packet whose p value is larger than rn is then dropped, and a new random sample is taken.
7. ECN-based marking is well known to be a superior signaling mechanism [42], [43]. However, while ECN is supported by many routers (Cisco and Juniper) and end systems (Windows Vista, Linux, Solaris, NetBSD, and so forth), it is generally not enabled by default, and thus, it is not widely deployed in today's Internet [44], [45].
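The sketch below restates the RED decision of Eqs. (1) and (2) in executable form; the variable names (the state dictionary, rn, cnt) are illustrative. An initial per-queue state such as {'q_avg': 0.0, 'cnt': 0, 'rn': random.random()} is assumed.

```python
import random

def red_drop_decision(q_act, state, w, qth_min, qth_max, p_max):
    """Illustrative sketch of the RED dropping decision described in Section 8.1.

    `state` carries q_avg, cnt (packets since the last drop), and rn (the uniform
    random sample drawn at the last drop), matching Eqs. (1) and (2).
    """
    # (1) exponentially weighted moving average of the queue size
    state["q_avg"] = (1.0 - w) * state["q_avg"] + w * q_act

    # interim probability p_t
    if state["q_avg"] < qth_min:
        p_t = 0.0
    elif state["q_avg"] < qth_max:
        p_t = p_max * (state["q_avg"] - qth_min) / (qth_max - qth_min)
    else:
        p_t = 1.0

    # (2) final probability grows slowly with the count since the last drop
    p = p_t / (1.0 - state["cnt"] * p_t) if state["cnt"] * p_t < 1.0 else 1.0

    if p > state["rn"]:
        # drop, then resample rn, as suggested in [37]
        state["cnt"] = 0
        state["rn"] = random.random()
        return True, p
    state["cnt"] += 1
    return False, p
```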



Fig. 9. A set of n packets. Each packet fpi is associated with a drop probability pi, and the outcome is either transmitted (TX) or dropped (DR) based on the random number generated during the last packet drop.

8.2 Traffic Validation for RED
Much as in Section 5.1, our approach is to predict queue sizes based on summaries of their inputs from neighboring routers. Additionally, we track how the predicted queue size impacts the likelihood of a RED-induced drop and use this to drive two additional tests: one for the uniformity of the randomness in dropping packets and one for the distribution of packet drops among the flows.8 In effect, the first test evaluates whether the distribution of packet losses can be explained by RED and tail-drop congestion alone, while the second evaluates whether the particular pattern of losses (their assignment to individual flows) is consistent with expectation for the traffic load.

8.2.1 Testing the Uniformity of Packet Drops
In Fig. 1, router rd monitors the queue size of router r and detects whether each packet is dropped or transmitted. Given the RED algorithm and its parameters, rd can estimate qavg, the average queue size in (1); cnt, the count since the last dropped packet; and finally p, the dropping probability in (2), for each packet, as in Fig. 9. All of these computations are deterministic and based on observed inputs. The router r drops a packet fpi if its pi value exceeds the random number rnx that r generated at the most recent packet drop. So, rd expects that rnx lies between pi-1 and pi. For example, in Fig. 9:
. fp3 is dropped: p2 < rn1 < p3.
. fp8 is dropped: p7 < rn2 < p8.
Since each packet drop should be a sample of a uniform random distribution, we can detect deviations from this process via statistical hypothesis testing. In particular, we use the Chi-square test to evaluate the hypothesis that the observed packet losses are a good match for a uniform distribution [26]. Once the Chi-square value9 is computed, the corresponding critical value can be used as the confidence value crandomness to reject the hypothesis, which means that the outcome is a result of a nonuniform distribution and/or a detection of malicious activity. Thus, a malicious router is detected if the confidence value crandomness is at least a target significance level slevel_randomness.

8.2.2 Testing the Distribution of Packet Drops among Flows
One of the premises of RED [37] is that the probability of dropping a packet from a particular connection is proportional to that connection's bandwidth usage. We exploit this observation to evaluate whether the particular pattern of packet losses, even if not suspicious in its overall number, is anomalous with respect to per-flow traffic load. This test requires per-flow state in order to count the number of received packets and dropped packets per flow while qth_min < qavg < qth_max. Once again, we use the Chi-square test to evaluate the distribution of packet losses to flows.10 Once the Chi-square value is computed, the corresponding critical value can be used as the confidence value cdrop/flow to reject the hypothesis, which means that the distribution of packet drops among the flows is not as expected. A malicious router is detected if the confidence value cdrop/flow is at least a target significance level slevel_drop/flow.

8.3 Experiences
We have experimented with the protocol and this new traffic validation in a RED environment, using the same setup as presented in Section 7. The capacity of the queue, qlimit, is 75,000 bytes. In addition, the RED parameters, as in Section 8.1, are configured as follows: the weight of the low-pass filter is w = 0.5, the minimum threshold is qth_min = 30,000 bytes, the maximum threshold is qth_max = 60,000 bytes, and the maximum probability is pmax = 0.02.11 For the packet drop uniformity test, a window of 30 packet drops is used. The test for the distribution of packet drops to flows examines a window of 15 seconds. Experimentally, we find that smaller windows lead to false positives, but larger windows do not improve the results notably. A more sophisticated version of our algorithm could adapt the window size in response to load in order to ensure a given level of confidence.

8.3.1 Experiment 1: False Positives
The result of one run is shown in Fig. 10a. qavg is the predicted average queue length of Q computed by router r2. Packet losses are marked with triangles. The corresponding confidence values can be seen in Fig. 10b. We executed the protocol under high traffic load for more than half an hour. With significance levels aggressively chosen at slevel_randomness = 0.999 and slevel_drop/flow = 0.999, we did not observe any false positives.

8.3.2 Experiment 2: Detecting Attacks
Next, we examined how effectively the protocol detects various attacks. In these experiments, router r1 is compromised to attack the traffic selectively in various ways, targeting ftp flows from a chosen subnet.
8. Consistent with our assumption that the network is under a single administrative domain (Section 4), we assume that all RED parameters are known.
9. Chi-square = Σ_{i=1}^{k} (Oi - Ei)² / Ei, where Oi is the observed frequency of bin i, Ei is the expected frequency of bin i, and k is the number of bins.
10. Short-lived flows with a few tens of packets are ignored unless the drop rate is 100 percent; otherwise, a few packet drops from a short-lived flow lead to false detection.
11. Setting these parameters is inexact engineering. We used the guidelines presented in [37] and/or our intuition in selecting these values.
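As an illustration of the uniformity test of Section 8.2.1, the sketch below bins the reconstructed drop samples and computes the Chi-square statistic of footnote 9. The binning granularity, the representation of the drop samples, and the function names are assumptions.

```python
from math import fsum

def chi_square(observed, expected):
    """Chi-square statistic (footnote 9): sum over bins of (O_i - E_i)^2 / E_i."""
    return fsum((o - e) ** 2 / e for o, e in zip(observed, expected))

def uniformity_statistic(drop_samples, num_bins=10):
    """Illustrative check that RED drop samples look uniform on [0, 1).

    `drop_samples` would be the rn values (or p-interval positions) that rd
    reconstructs for each observed drop, as in Fig. 9; bin counts are compared
    against a uniform expectation.  Comparing the statistic to the chi-square
    critical value for (num_bins - 1) degrees of freedom yields c_randomness.
    """
    observed = [0] * num_bins
    for s in drop_samples:
        observed[min(int(s * num_bins), num_bins - 1)] += 1
    expected = [len(drop_samples) / num_bins] * num_bins
    return chi_square(observed, expected)
```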



Fig. 10. Without attack. (a) Average queue length. (b) Statistical test results.
Fig. 11. Attack 1: Drop the selected flows when the average queue size is above 45,000 bytes. (a) Average queue length. (b) Statistical test results.
The duration of the attack is indicated with a line bounded by diamonds in the figures, and a detection is indicated by a filled circle. For the first attack, router r1 drops the packets of the selected flows for 30 seconds when the average queue size computed by RED is above 45,000 bytes. The predicted average queue size and the confidence values can be seen in Fig. 11. As shown in the graph, the protocol detects the failure successfully during the attack. As queue occupancy grows, the RED algorithm drops packets with higher probability and thus provides more cover for attackers to drop packets without being detected. We explore this property in the second attack, in which router r1 was instructed to drop packets in the selected flows when the average queue size was at least 54,000 bytes, which is very close to the maximum threshold, qth_max = 60,000 bytes. As shown in Fig. 12, the protocol was still able to detect the attack and raised alarms, except between 50 and 56 seconds. The reason is that between 44 and 50 seconds the compromised router did not drop any packets maliciously. In the third and fourth attacks, we explore a scenario in which router r1 drops only a small percentage of the packets in the selected flows: 10 percent of the packets during the third attack (see Fig. 13) and 5 percent during the fourth attack (see Fig. 14). Even though relatively few packets are dropped, the impact on TCP performance is quite high, reducing bandwidth by between 30 percent and 40 percent. Since only a few packets are maliciously dropped, the packet drop uniformity test does not detect any anomaly. However, since these losses are focused on a small number of flows, they are quickly detected using the second test. Finally, we examined a highly selective attack in which router r1 was instructed to drop only TCP SYN packets from a targeted host, which tries to connect to an ftp server. In Fig. 15, four SYN packets, which are marked with circles, are maliciously dropped by r1. Since all the observed packets of the attacked flow are dropped, which is statistically unexpected given the RED algorithm, the protocol still raises an alarm.

9 ISSUES
9.1 Quality of Service
Real routers implement Quality of Service (QoS), providing preferential treatment to specified traffic via several different traffic-handling techniques, such as traffic shaping, traffic policing, packet filtering, and packet classification. Given the configuration files, our work can be extended to handle these fairly complex real-life functions, even those involving nondeterminism, if the expected behavior of the function can be modeled.

9.2 Adjacent Faulty Routers
We assume that there exist no adjacent faulty routers in our threat model, for simplicity. This assumption eliminates consorting faulty routers that collude to produce fraudulent traffic information in order to hide their faulty behavior.


Fig. 12. Attack 2: Drop the selected flows when the average queue size is above 54,000 bytes. (a) Average queue length. (b) Statistical test results.
Fig. 13. Attack 3: Drop 10 percent of the selected flows when the average queue size is above 45,000 bytes. (a) Average queue length. (b) Statistical test results.
However, the assumption can be relaxed to the case of k > 1 adjacent faulty routers by monitoring every output interface of the neighbors k hops away and disseminating the traffic information to all neighbors within a diameter of k hops. This is the same approach that we used in [4], and it increases the overhead of detection.

9.3 Good Terminal Routers
The path diversity within the network usually does not extend to individual hosts on LANs: single workstations rarely have multiple paths to their network infrastructure. In these situations, for fate-sharing reasons, there is little that can be done. If a host's access router is compromised, then the host is partitioned and there is no routing remedy even if an anomaly is detected; the fates of individual hosts and their access routers are directly intertwined. Moreover, from the standpoint of the network, such traffic originates from a compromised router and therefore cannot demonstrate anomalous forwarding behavior.12 To summarize, these protocols are designed to detect anomalies between pairs of correct nodes, and thus, for simplicity, it is assumed that a terminal router is not faulty with respect to traffic originating from or being consumed by that router. This assumption is well justified by the fate-sharing argument, and it is accepted by all similar detection protocols. The assumption is necessary in order to protect against faulty terminal routers that drop packets they receive from an end host or packets they should deliver to an end host. However, it also excludes DoS attacks wherein a faulty router introduces bogus traffic claiming that the traffic originates from a legitimate end host. None of these protocols explicitly addresses this problem; of course, standard rate-limiting schemes can be applied against these kinds of DoS attacks.

9.4 Others
Due to space limitations, in this paper, we do not discuss various issues, such as fragmentation, multicast, multiple paths with equal cost, and transient inconsistencies of link-state routing. We refer the reader to our earlier work [4] for details.

10 CONCLUSION
To the best of our knowledge, this paper is the first serious attempt to distinguish between a router dropping packets maliciously and a router dropping packets due to congestion. Previous work has approached this issue using a static user-defined threshold, which is fundamentally limiting. Using the same framework as our earlier work (which is based on a static user-defined threshold) [4], we developed a compromised router detection protocol that dynamically infers, based on measured traffic rates and buffer sizes, the number of congestive packet losses that will occur. Subsequent packet losses can be attributed to malicious actions. Because of nondeterminism introduced by imperfectly synchronized clocks and scheduling delays, the protocol uses user-defined significance levels, but these levels are independent of the properties of the traffic. Hence, the protocol does not suffer from the limitations of static thresholds. We evaluated the effectiveness of the protocol through an implementation and deployment in a small network. We show that even fine-grained attacks, such as stopping a host from opening a connection by discarding the SYN packet, can be detected.
12. This issue can be partially mitigated by extending our protocol to include hosts as well as routers, but this simply pushes the problem to end hosts. Traffic originating from a compromised node can be modified before any correct node witnesses it.



Fig. 14. Attack 4: Drop five percent of the selected flows when the average queue size is above 45,000 bytes. (a) Average queue length. (b) Statistical test results.

REFERENCES
[1] X. Ao, Report on DIMACS Workshop on Large-Scale Internet Attacks, http://dimacs.rutgers.edu/Workshops/Attacks/internet-attack9-03.pdf, Sept. 2003.
[2] R. Thomas, ISP Security BOF, NANOG 28, http://www.nanog.org/mtg0306/pdf/thomas.pdf, June 2003.
[3] K.A. Bradley, S. Cheung, N. Puketza, B. Mukherjee, and R.A. Olsson, Detecting Disruptive Routers: A Distributed Network Monitoring Approach, Proc. IEEE Symp. Security and Privacy (S&P 98), pp. 115-124, May 1998.
[4] A.T. Mizrak, Y.-C. Cheng, K. Marzullo, and S. Savage, Detecting and Isolating Malicious Routers, IEEE Trans. Dependable and Secure Computing, vol. 3, no. 3, pp. 230-244, July-Sept. 2006.
[5] L. Subramanian, V. Roth, I. Stoica, S. Shenker, and R. Katz, Listen and Whisper: Security Mechanisms for BGP, Proc. First Symp. Networked Systems Design and Implementation (NSDI 04), Mar. 2004.
[6] S. Kent, C. Lynn, J. Mikkelson, and K. Seo, Secure Border Gateway Protocol (Secure-BGP), IEEE J. Selected Areas in Comm., vol. 18, no. 4, pp. 582-592, Apr. 2000.
[7] Y.-C. Hu, A. Perrig, and D.B. Johnson, Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks, Proc. ACM MobiCom 02, Sept. 2002.
[8] B.R. Smith and J. Garcia-Luna-Aceves, Securing the Border Gateway Routing Protocol, Proc. IEEE Global Internet, Nov. 1996.
[9] S. Cheung, An Efficient Message Authentication Scheme for Link State Routing, Proc. 13th Ann. Computer Security Applications Conf. (ACSAC 97), pp. 90-98, 1997.
[10] M.T. Goodrich, Efficient and Secure Network Routing Algorithms, provisional patent filing, Jan. 2001.
[11] R. Perlman, Network Layer Protocols with Byzantine Robustness, PhD dissertation, MIT LCS TR-429, Oct. 1988.
[12] V.N. Padmanabhan and D. Simon, Secure Traceroute to Detect Faulty or Malicious Routing, SIGCOMM Computer Comm. Rev., vol. 33, no. 1, pp. 77-82, 2003.
[13] I. Avramopoulos and J. Rexford, Stealth Probing: Efficient Data-Plane Security for IP Routing, Proc. USENIX Ann. Technical Conf. (USENIX 06), June 2006.
[14] S. Cheung and K.N. Levitt, Protecting Routing Infrastructures from Denial of Service Using Cooperative Intrusion Detection, Proc. Workshop on New Security Paradigms (NSPW 97), pp. 94-106, 1997.


[15] J.R. Hughes, T. Aura, and M. Bishop, Using Conservation of Flow as a Security Mechanism in Network Protocols, Proc. IEEE Symp. Security and Privacy (S&P 00), pp. 131-132, 2000.
[16] A. Mizrak, Y. Cheng, K. Marzullo, and S. Savage, Fatih: Detecting and Isolating Malicious Routers, Proc. Int'l Conf. Dependable Systems and Networks (DSN 05), pp. 538-547, 2005.
[17] A. Kuzmanovic and E.W. Knightly, Low-Rate TCP-Targeted Denial of Service Attacks: The Shrew versus the Mice and Elephants, Proc. ACM SIGCOMM 03, pp. 75-86, 2003.
[18] M. Mathis, J. Semke, and J. Mahdavi, The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm, SIGCOMM Computer Comm. Rev., vol. 27, no. 3, pp. 67-82, 1997.
[19] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, Modeling TCP Throughput: A Simple Model and Its Empirical Validation, Proc. ACM SIGCOMM 98, pp. 303-314, 1998.
[20] M. Yajnik, S.B. Moon, J.F. Kurose, and D.F. Towsley, Measurement and Modeling of the Temporal Dependence in Packet Loss, Proc. INFOCOM 99, pp. 345-352, 1999.
[21] N. Cardwell, S. Savage, and T.E. Anderson, Modeling TCP Latency, Proc. INFOCOM 00, pp. 1742-1751, 2000.
[22] E. Altman, K. Avrachenkov, and C. Barakat, A Stochastic Model of TCP/IP with Stationary Random Losses, Proc. ACM SIGCOMM 00, pp. 231-242, 2000.
[23] W. Jiang and H. Schulzrinne, Modeling of Packet Loss and Delay and Their Effect on Real-Time Multimedia Service Quality, Proc. 10th Int'l Workshop Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), 2000.
[24] T.J. Hacker, B.D. Noble, and B.D. Athey, The Effects of Systemic Packet Loss on Aggregate TCP Flows, Proc. ACM/IEEE Conf. Supercomputing (SC 02), pp. 1-15, 2002.
[25] G. Appenzeller, I. Keslassy, and N. McKeown, Sizing Router Buffers, Proc. ACM SIGCOMM 04, pp. 281-292, 2004.

[26] R.J. Larsen and M.L. Marx, Introduction to Mathematical Statistics and Its Application, fourth ed., Prentice Hall, 2005.
[27] K. Arvind, Probabilistic Clock Synchronization in Distributed Systems, IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 5, pp. 474-487, May 1994.
[28] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway, UMAC: Fast and Secure Message Authentication, LNCS, vol. 1666, pp. 216-233, 1999.
[29] P. Rogaway, UMAC Performance (More), http://www.cs.ucdavis.edu/~rogaway/umac/2000/perf00bis.html, 2000.
[30] N. Shah, Understanding Network Processors, Master's thesis, Univ. of California, Sept. 2001.
[31] W. Feghali, B. Burres, G. Wolrich, and D. Carrigan, Security: Adding Protection to the Network via the Network Processor, Intel Technology J., vol. 6, pp. 40-49, Aug. 2002.
[32] L.A. Sanchez, W.C. Milliken, A.C. Snoeren, F. Tchakountio, C.E. Jones, S.T. Kent, C. Partridge, and W.T. Strayer, Hardware Support for a Hash-Based IP Traceback, Proc. Second DARPA Information Survivability Conf. and Exposition (DISCEX II 01), pp. 146-152, 2001.
[33] D.L. Mills, Network Time Protocol (Version 3) Specification, Implementation, RFC 1305, IETF, Mar. 1992.
[34] H.F. Wedde, J.A. Lind, and G. Segbert, Achieving Internal Synchronization Accuracy of 30 ms under Message Delays Varying More than 3 msec, Proc. 24th IFAC/IFIP Workshop Real-Time Programming (WRTP), 1999.
[35] B. White et al., An Integrated Experimental Environment for Distributed Systems and Networks, Proc. Fifth Symp. Operating System Design and Implementation (OSDI 02), pp. 255-270, Dec. 2002.
[36] Emulab - Network Emulation Testbed, http://www.emulab.net, 2006.
[37] S. Floyd and V. Jacobson, Random Early Detection Gateways for Congestion Avoidance, IEEE/ACM Trans. Networking (TON 93), vol. 1, no. 4, pp. 397-413, 1993.


[38] C.V. Hollot, V. Misra, D.F. Towsley, and W. Gong, On Designing Improved Controllers for AQM Routers Supporting TCP Flows, Proc. INFOCOM 01, pp. 1726-1734, Apr. 2001.
[39] S. Athuraliya, S. Low, V. Li, and Q. Yin, REM: Active Queue Management, IEEE Network, vol. 15, no. 3, pp. 48-53, 2001.
[40] S. Floyd, TCP and Explicit Congestion Notification, ACM Computer Comm. Rev., vol. 24, no. 5, pp. 10-23, 1994.
[41] K. Ramakrishnan, S. Floyd, and D. Black, The Addition of Explicit Congestion Notification (ECN) to IP, RFC 3168, IETF, 2001.



PERFORMANCE ANALYSIS OF MOBILITY MANAGEMENT SCHEMES USING RANDOM WALK MODEL IN WIMAX - WIRELESS MESH NETWORKS

K. Lingadevi 1, Mrs. M.P. Reena 2
1 PG Student, M.E. Communication Systems, Sri Venkateswara College of Engineering, Chennai, India.
2 Assistant Professor, ECE, Sri Venkateswara College of Engineering, Chennai, India.

Abstract
Efficient mobility management schemes based on the random walk model are proposed for wireless mesh networks (WMNs), with the objective of reducing the total communication energy. Two user-based mobility management schemes for WMNs, namely, the static anchor scheme and the dynamic anchor scheme, are analyzed using both Wi-Fi and Wimax technology. In addition, caching of a mobile user's location information can be used to reduce the signaling energy incurred by the proposed schemes. The random walk model is a random-based mobility model used in mobility management schemes for mobile communication systems. A mobility model is designed to describe the movement pattern of mobile users, and how their location, velocity, and acceleration change over time. In random-based mobility models, the mobile nodes move randomly and freely without restrictions: the destination, speed, and direction are all chosen randomly and independently of other nodes. The schemes are especially suited to mobile Internet applications characterized by large traffic asymmetry, for which the downlink packet arrival rate is much higher than the uplink packet arrival rate. The mobility management schemes are implemented using Network Simulator-2.

1 INTRODUCTION
WIRELESS Mesh Networks (WMNs) have gained rapidly growing interest in recent years, and are widely acknowledged as an innovative solution for next-generation wireless networks. Compared with traditional wireless and mobile networks, e.g., Wi-Fi-based wireless networks and mobile IP networks, WMNs have the advantages of low cost, easy deployment, self-organization and self-healing, and compatibility with existing wired and wireless networks through the gateway/bridge function of mesh routers. A WMN consists of mesh routers and mesh clients [1]. Mesh routers are similar to ordinary routers in wired IP networks, except that they are connected via (possibly multichannel, multiradio) wireless links. Mesh clients are wireless mobile devices, e.g., PDAs, smart phones, laptops, etc. A major

expected use of WMNs is as a wireless backbone for providing last-mile broadband Internet access [2] to mesh clients in a multihop way, through the gateway that is connected to the Internet. Because mesh clients may move within a WMN and change their points of attachment frequently, mobility management is a necessity for WMNs to function appropriately. Mobility management consists of location management and handoff management [3]. Location management keeps track of the location information of mesh clients, through location registration and location update operations. Handoff management maintains ongoing connections of mesh clients while they are moving around and changing their points of attachment. Mobility management has been studied intensively for cellular networks and mobile IP networks. A large variety of mobility management schemes and protocols have been


proposed for these types of networks over the past years. Comprehensive surveys of mobility management in cellular networks and mobile IP networks can be found in [3] and [4], respectively. Due to some significant differences in network architecture, however, mobility management schemes proposed for cellular networks and mobile IP networks are generally not appropriate for WMNs. The lack of centralized management facilities, e.g., HLR/VLR in cellular networks and HA/FA in mobile IP networks, makes a large portion of the schemes proposed for those types of networks not directly applicable to WMNs, as argued in [1]. Therefore, the development of new mobility management schemes, which take into consideration the unique characteristics of WMNs, is interesting and important. Additionally, mobility management schemes that operate on a per-user basis are highly desirable. A per-user-based mobility management scheme can apply specific optimal settings to individual mobile users such that the overall network traffic incurred by mobility management is minimized. The optimal settings of each mobile user should depend on the user's specific mobility and service patterns, and should be computationally easy to determine. In this paper, we develop two per-user-based mobility management schemes for WMNs, namely, the static anchor scheme and the dynamic anchor scheme. Both schemes are based on pointer forwarding, i.e., a chain of forwarding pointers is used to track the current location of a mesh client. The optimal threshold of the forwarding chain length is determined for each individual mesh client dynamically, based on the mesh client's specific mobility and service patterns.

2 RANDOM WALK MODEL


The random walk model is a random-based mobility model used in mobility management schemes for mobile communication systems. The mobility model describes the movement pattern of mobile users, and how their location, velocity, and acceleration change over time. In the random walk model, the mobile nodes move randomly and freely without restrictions. A random walk model can be used to study the movements of an MC, and it is used in wireless mesh networks here to reduce the overall network traffic incurred by the mobility management schemes, because mesh clients randomly choose their paths for routing data packets anywhere in the network; routing is done by the AODV protocol. In the performance evaluation of a protocol for such a network, the protocol should be tested under realistic conditions including, but not limited to, a sensible transmission range, limited buffer space for the storage of messages, representative data traffic models, and realistic movement of the mobile users. In this mobility model, a mobile node moves from its current location to a new location by randomly choosing a direction and a speed in which to travel. The new speed and direction are chosen from predefined ranges, [min-speed, max-speed] and [0, 2π], respectively. Each movement in the random walk mobility model occurs for either a constant time interval t or a constant traveled distance d, at the end of which a new direction and speed are calculated, as sketched in the example following Fig. 1.

Fig 1: Directions of movement of mesh client in random walk model
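The example below sketches a single movement of the random walk mobility model just described: a speed drawn from [min-speed, max-speed], a direction drawn from [0, 2π], and a constant travel interval. The numeric parameter values are illustrative only and are not specified by the paper.

```python
import math
import random

# Illustrative parameters; the paper does not fix specific values.
MIN_SPEED, MAX_SPEED = 0.5, 2.0   # m/s
STEP_TIME = 1.0                   # constant time interval t per movement

def random_walk_step(x, y):
    """One movement of the random walk mobility model.

    A new speed is drawn from [MIN_SPEED, MAX_SPEED] and a new direction from
    [0, 2*pi]; the node then travels for a constant time interval before the
    next (speed, direction) pair is drawn.
    """
    speed = random.uniform(MIN_SPEED, MAX_SPEED)
    direction = random.uniform(0.0, 2.0 * math.pi)
    x += speed * STEP_TIME * math.cos(direction)
    y += speed * STEP_TIME * math.sin(direction)
    return x, y

# Example: trace a mesh client for 10 movements starting at the origin.
pos = (0.0, 0.0)
trace = [pos]
for _ in range(10):
    pos = random_walk_step(*pos)
    trace.append(pos)
```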

3 SYSTEM MODEL


A wireless mesh network is a communications network made up of radio nodes organized in a mesh topology. Mesh networking is a type of networking in which each node must not only capture and disseminate its own data but also serve as a relay for other nodes; that is, it must collaborate to propagate data in the network. A mesh network can be designed using a flooding technique or a routing technique. When using a routing technique, a message propagates along a path by hopping from node to node until the destination is reached. To ensure the availability of all its paths, a routing network must allow for continuous connections and reconfiguration around broken or blocked paths, using self-healing algorithms. A mesh network whose nodes are all connected to each other is a fully connected network. A WMN consists of two types of nodes: mesh routers (MRs) and mesh clients (MCs). MRs are usually static and form the wireless mesh backbone of WMNs. Some MRs also serve as wireless access points (WAPs) for MCs. One or more MRs are connected to the Internet and are responsible for relaying Internet traffic to and from a WMN; such MRs are commonly referred to as gateways. In this paper, we assume that a single gateway exists in a WMN.

Fig 2: Architecture of a Wireless Mesh Network
In the proposed mobility management schemes, the central location database resides in the gateway. For each MC roaming around in a WMN, an entry exists in the location database for storing the location information of the MC, i.e., the address of its anchor MR (AMR). The AMR of an MC is the head of its forwarding chain. With the address of an MC's AMR, the MC can be reached by following the forwarding chain. Data packets sent to an MC will be routed to its current AMR first, which then forwards them to the MC by following the forwarding chain. Packet delivery in the proposed schemes simply relies on the routing protocol used. The idea behind pointer forwarding is to minimize the overall network signaling energy incurred by mobility management operations by reducing the number of expensive location update events. A location update event means sending the gateway a location update message informing it to update the location database. With pointer forwarding, a location handoff simply involves setting up a forwarding pointer between two neighboring MRs without having to trigger a location update event. The forwarding chain length of an MC significantly affects the network traffic incurred by mobility management and packet delivery with respect to that MC. The longer the forwarding chain, the lower the rate of location update events, and thus the smaller the signaling overhead. However, a long forwarding chain increases the packet delivery cost, because packets have to travel a long distance to reach the destination. Therefore, there exists a trade-off between the signaling cost incurred by mobility management and the service cost incurred by packet delivery. Consequently, there exists an optimal threshold of the forwarding chain length for each MC. In the proposed schemes, this optimal threshold, denoted by K, is determined for each individual MC dynamically, based on the MC's specific mobility and service patterns.



4 MOBILITY MANAGEMENT
Mobility management contains two components: (i) location management and (ii) handoff management.

4.1 LOCATION MANAGEMENT
Location management enables the system to track the locations of MCs between consecutive communications. It includes two major tasks. The first is location registration (or location update), where the MC periodically informs the system to update the relevant location databases with its up-to-date location information. The second is call delivery, where the system determines the current location of the MC, based on the information available in the system databases, when a communication for the MC is initiated. Both tasks are performed in the wireless mesh network.

4.2 HANDOFF MANAGEMENT
Handoff management is the process by which an MC keeps its connection active when it moves from one access point to another. Handoff (or handover) management enables the network to maintain a user's connection as the mobile terminal continues to move and change its access point to the network. The three-stage handoff process first involves initiation, where the user, a network agent, or changing network conditions identify the need for a handoff. The second stage is new connection generation, where the network must find new resources for the handoff connection and perform any additional routing operations. The final stage is data-flow control, where the delivery of the data from the old connection path to the new connection path is maintained according to agreed-upon service guarantees.

5 STATIC ANCHOR SCHEME


In the static anchor scheme, an MC's AMR remains unchanged as long as the length of the forwarding chain does not exceed the threshold K.

5.1 Location Handoff
When an MC moves across the boundary of the covering areas of two neighboring MRs, it de-associates from its old serving MR and re-associates with the new MR, thus incurring a location handoff. The MR it is newly associated with becomes its current serving MR. For each MC, if the length of its current forwarding chain is less than its specific threshold K, a new forwarding pointer is set up between the old MR and the new MR during a location handoff. On the other hand, if the length of the MC's current forwarding chain has already reached its specific threshold K, a location handoff triggers a location update. During a location update, the gateway is informed to update the location information of the MC in the location database by a location update message. The location update message is also sent to all the active Intranet correspondence nodes of the MC. After a location update, the forwarding chain is reset and the new MR becomes the AMR of the MC; the sketch below illustrates this handoff logic.
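The sketch below captures the location handoff rule just described: extend the forwarding chain while its length is below K, otherwise trigger a location update and reset the chain. The data structure and function names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record kept for one mesh client; names are illustrative.
@dataclass
class ClientState:
    anchor_mr: str                                   # AMR: head of the forwarding chain
    chain: List[str] = field(default_factory=list)   # forwarding pointers

def on_location_handoff(state: ClientState, old_mr: str, new_mr: str, K: int,
                        send_location_update) -> None:
    """Static anchor scheme: extend the pointer chain until it reaches K."""
    if len(state.chain) < K:
        # Cheap case: just add a forwarding pointer old_mr -> new_mr.
        state.chain.append(new_mr)
    else:
        # Chain is at the threshold: pay for a location update at the gateway,
        # reset the chain, and make the new serving MR the anchor.
        send_location_update(new_mr)
        state.anchor_mr = new_mr
        state.chain = []
```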

5.2 Service Delivery
5.2.1 Internet Session
Internet sessions initiated toward an MC always go through the gateway, i.e., they are always routed to the gateway first before they actually enter the WMN.



Fig. 3. The handling of location handoffs in the static anchor scheme.
Because the location database resides in the gateway, the gateway always knows the location information of an MC by performing queries in the location database. Therefore, routing an Internet session toward an MC is straightforward. Once the location information of an MC is known, i.e., the address of the MC's AMR has been queried, the gateway can route data packets to the AMR, which then forwards them to the MC by following the forwarding chain.
5.2.2 Intranet Session
Unlike Internet sessions, which always go through the gateway where the location database is located, an Intranet session initiated toward an MC within a WMN must first determine the location information of the destination MC through a location search procedure. Suppose a mesh client MC1 initiates an Intranet session toward another mesh client MC2. Upon receiving the new session request from MC1, the serving MR of MC1 (MR1) sends a location query for MC2's location information to the gateway, which performs the query in the location database and replies with the location information of MC2, i.e., the address of the AMR of MC2. After the location search procedure, data packets sent from MC1 to MC2 can be routed directly to the AMR of MC2, which then forwards them to MC2 by following the forwarding chain.

6 DYNAMIC ANCHOR SCHEME


In the dynamic anchor scheme, the current forwarding chain of an MC will be reset due to the arrival of new Internet or Intranet sessions. The idea behind this scheme is to reduce the packet delivery cost by keeping the AMR of an MC close to its current serving MR. The handling of location handoffs in the dynamic anchor scheme is the same as in the static anchor scheme.


Fig. 4. The handling of location handoffs in the dynamic anchor scheme.
However, the mechanism of service delivery in the dynamic anchor scheme is significantly different from that in the static anchor scheme.
6.1 Service Delivery
6.1.1 Internet Session
In the dynamic anchor scheme, when a new Internet session toward an MC arrives at the gateway, the gateway does not route the session to the AMR of the MC immediately. Instead, a location search procedure is executed to locate the MC's current serving MR, which may be different from its AMR.



Fig. 5 illustrates the location search procedure for newly arrived Internet sessions. Specifically, the gateway sends a location request message to the AMR of the MC, which forwards the location request to its current serving MR. Upon receiving the location request message, the MC's current serving MR sends a location update message to the gateway, announcing that it is the new AMR of the MC. When the gateway receives the location update message, it updates the location information of the MC in the location database, i.e., it marks the current serving MR of the MC as its new AMR. After the location search procedure, the forwarding chain is reset and subsequent data packets will be routed to the new AMR of the MC. The gain is that the routing path is shortened, thus reducing the packet delivery energy.
6.1.2 Intranet Session
When a new Intranet session is initiated toward an MC, a location search procedure similar to the one above is executed to locate the current serving MR of the destination MC.

Fig. 6 illustrates the location search procedure for newly arrived Intranet sessions. Let MC1 and MC2 denote the source mesh client and the destination mesh client, respectively. When a new Intranet session initiated toward MC2 by MC1 arrives at the current serving MR of MC1 (MR1), MR1 sends a location request message to the gateway, which queries the location database and routes the location request message to the AMR of MC2, which forwards the location request message to MC2's current serving MR (MR2). Upon receiving the location request message, MR2 replies to the gateway with a location update message, announcing that it is the new AMR of MC2. The location information of MC2 in the location database is updated by the gateway after it receives the location reply. The updated location information of MC2 is sent to MR1 in response to the location request, and the location search procedure is completed. After the location search procedure, subsequent data packets will be routed to the new AMR of MC2 directly.


packets will be routed to the new AMR of MC2 directly. Here, E denotes the total communication energy, and each node is characterized by its mobility rate and its session arrival rate. Network traffic is reduced by reducing the network delay.
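As a toy illustration of the location search and anchor reset just described (this is not the paper's implementation; all identifiers and data structures are invented for the sketch), the gateway's location database and the forwarding chain can be modelled as simple dictionaries:

```python
# Toy model of the dynamic anchor location search: the gateway looks up the
# MC's anchor mesh router (AMR), the request is forwarded along the chain to
# the current serving MR, which then becomes the new AMR and the chain is reset.
location_db = {"MC1": "MR_A"}                          # MC -> AMR known to the gateway
forwarding_chain = {"MC1": ["MR_A", "MR_B", "MR_C"]}   # AMR ... current serving MR

def location_search(mc):
    """Locate the current serving MR of `mc` and reset its anchor."""
    amr = location_db[mc]                 # 1. gateway queries its location database
    chain = forwarding_chain[mc]
    assert chain[0] == amr
    serving_mr = chain[-1]                # 2. request forwarded along the forwarding chain
    location_db[mc] = serving_mr          # 3. location update: serving MR becomes the new AMR
    forwarding_chain[mc] = [serving_mr]   # 4. forwarding chain is reset
    return serving_mr                     # subsequent packets are routed here directly

print(location_search("MC1"))             # -> MR_C
```

After the call, later packets for MC1 go straight to MR_C, which mirrors the shortened routing path described above.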

7 PERFORMANCE METRICS
We use the total communication energy incurred per time unit as the metrics for performance evaluation and analysis. The total communication energy includes the signaling energy of location handoff and update operations, the signaling energy of location search operations, and the packet delivery energy. For the static anchor scheme, the signaling energy of location search operations is incurred when a new Intranet session is initiated toward an MC. For the dynamic anchor scheme, the signaling cost of location search operations represents the energy for tracking the current serving MR of an MC and resetting the forwarding chain when new sessions are initiated toward an MC. In the following, we use Estatic and Edynamic to represent the total communication energy incurred per time unit by the static anchor scheme and dynamic anchor scheme, respectively. Elocation, Esearch, and Edelivery are used to represent the signaling energy of a location handoff operation, the signaling energy of a location search operation, and the energy to deliver a packet, respectively. Subscripts are associated with these cost terms. Specifically, subscript I and L denote Internet and Intranet sessions, respectively. Subscript s and d denote the static anchor scheme and dynamic anchor scheme, respectively. For the static anchor scheme, the total communication energy incurred per time unit is calculated as
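As a rough illustration only of how such per-time-unit cost terms combine, a hedged sketch follows; the specific weighting below by mobility and session arrival rates is an assumption of this sketch, not the paper's exact formula, and all parameter values are placeholders.

```python
# Hedged sketch of assembling a total-communication-energy metric from the
# cost terms named above (Elocation, Esearch, Edelivery). The combination of
# rates and weights is assumed for illustration.
def total_energy(mobility_rate, internet_rate, intranet_rate,
                 e_location, e_search, e_delivery, packets_per_session):
    handoff_energy = mobility_rate * e_location                    # location handoff/update signaling
    search_energy = (internet_rate + intranet_rate) * e_search     # locating the serving MR
    delivery_energy = (internet_rate + intranet_rate) * packets_per_session * e_delivery
    return handoff_energy + search_energy + delivery_energy

# Example: two parameter settings playing the role of the two schemes.
print(total_energy(0.5, 1.0, 0.5, e_location=2.0, e_search=1.0,
                   e_delivery=0.2, packets_per_session=50))
```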

PERFORMANCE ANALYSIS
Fig. 7 plots the threshold K as a function of the mobility rate for both schemes. It can be observed that, for both schemes, K decreases as the mobility rate decreases: with a lower mobility rate, a short forwarding chain is favorable for reducing the service delivery cost. It is also interesting to see that K in the static anchor scheme is always smaller than or equal to that in the dynamic anchor scheme, because the dynamic anchor scheme resets the forwarding chain of an MC upon each new session arrival.

Fig. 7 Threshold vs. mobility rate

Figs. 8 and 9 show the network delay and energy for Wi-Fi and WiMAX under the static anchor scheme. Both are reduced when WiMAX technology is used compared with Wi-Fi.


Fig. 8 Delay using Wi-Fi and WiMAX
Fig. 9 Energy using Wi-Fi and WiMAX

Figs. 10 and 11 show the network delay and energy for Wi-Fi and WiMAX under the dynamic anchor scheme. Both are reduced when WiMAX technology is used compared with Wi-Fi.

Fig. 10 Delay using Wi-Fi and WiMAX
Fig. 11 Energy using Wi-Fi and WiMAX

9 CONCLUSIONS
Two mobility management schemes, the static anchor scheme and the dynamic anchor scheme, are implemented in wireless mesh networks based on the random walk model using Network Simulator-2 (NS-2). In the static anchor scheme, the anchor mesh router remains unchanged while the forwarding chain stays within the threshold K, and the total communication energy is reduced. In the dynamic anchor scheme, the anchor mesh router changes with the movement of the mesh client, and the total communication energy is reduced further than in the static anchor scheme. The scheme performs well in typical network conditions. A comparative analysis of both schemes using Wi-Fi and WiMAX is tabulated.


In future work, the performance of the two mobility management schemes, the static anchor scheme and the dynamic anchor scheme, will be analyzed using WiMAX technology to further reduce the traffic and the total communication energy.

REFERENCES
[1] I.F. Akyildiz, X. Wang, and W. Wang, "Wireless Mesh Networks: A Survey," Computer Networks, vol. 47, no. 4, pp. 445-487, Mar. 2005.
[2] A. Raniwala and T.-c. Chiueh, "Architecture and Algorithms for an IEEE 802.11-Based Multi-Channel Wireless Mesh Network," Proc. IEEE INFOCOM, vol. 3, pp. 2223-2234, Mar. 2005.
[3] I. Akyildiz, J. McNair, J. Ho, H. Uzunalioglu, and W. Wang, "Mobility Management in Next-Generation Wireless Systems," Proc. IEEE, vol. 87, no. 8, pp. 1347-1384, Aug. 1999.
[4] I. Akyildiz, J. Xie, and S. Mohanty, "A Survey of Mobility Management in Next-Generation All-IP-Based Wireless Systems," IEEE Wireless Comm., vol. 11, no. 4, pp. 16-28, Aug. 2004.
[5] D. Huang, P. Lin, and C. Gan, "Design and Performance Study for Mobility Management Mechanism (WMM) Using Location Cache for Wireless Mesh Networks," IEEE Trans. Mobile Computing, vol. 7, no. 5, pp. 546-556, May 2008.
[6] R. Huang, C. Zhang, and Y. Fang, "A Mobility Management Scheme for Wireless Mesh Networks," Proc. 50th IEEE Global Telecomm. Conf., pp. 5092-5096, Nov. 2007.
[7] J. Robinson and E. Knightly, "A Performance Study of Deployment Factors in Wireless Mesh Networks," Proc. IEEE INFOCOM, pp. 2054-2062, May 2007.
[8] I.F. Akyildiz, Y.-B. Lin, W.-R. Lai, and R.-J. Chen, "A New Random Walk Model for PCS Networks," IEEE J. Selected Areas in Comm., vol. 18, no. 7, pp. 1254-1260, July 2000.


MOTION DETECTION USING RECOGNITION ALGORITHMS


P.Sivaprakash 1, Dr.C G.Ravichandran 2, 1 Research Scholar, Anna University of Technology, Madurai. 2 Principal, RVS College of Engg and Technology, Dindigul.

ABSTRACT
Motion detection can be defined as detecting a change in the speed or direction of an object or objects. It is used extensively in video surveillance applications to detect and track humans and human activities in real-time video sequences. This paper addresses the broad subject of motion detection and analysis for video surveillance of image sequences. Detecting multiple objects and monitoring their activities in indoor and outdoor environments remains a challenging task. We consider the spatio-temporal relationships among feature points, thereby enabling the detection and classification of simple and complex human activities in the presence of several real-time problems, namely illumination changes, moving backgrounds and shadows. A robust algorithm is proposed for enhancing the accuracy and reliability of motion detection and classification methods for real-time video surveillance. Its advantages are discussed and compared with related approaches for action recognition. Results are demonstrated on the widely used KTH human activity dataset against implemented state-of-the-art methods.

Keywords

Motion analysis, Spatio-temporal features, video surveillance, action recognition, and video monitoring.

INTRODUCTION
Motion analysis is an important task within the field of computer vision. Human motion analysis addresses general common tasks, such as: person detection and tracking, activity classification, behavior interpretation and also person identification. Recognition is the identification of objects in an image. This process would probably start with image processing techniques such as noise removal, followed by (low-level) feature extraction to locate lines, regions and alignment areas with certain textures. Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Human activity recognition has become an important area of research in

computer vision in recent years. Recently, local spatio-temporal features are extensively used in action recognition tasks, because they are more robust to noise, occlusion, and geometric variations than global (or large-scale) features. It has gained a lot of attention because of its important application domains like video indexing, surveillance, human computer interaction, sport video analysis, intelligent environments etc. They have been developed into different types of algorithms for human activity recognition [4]-[20]. They discussed recognition algorithms in detail and apply them to specific groups of activities, including single person interaction, multiple interactions, and person vehicle interactions. Brief over view of core technologies has been discussed [1]; techniques are often sensitive to poor resolution, frame rate, drastic illumination change in


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) surveillance systems. Probability based methods for activity recognition from trajectories has been addressed using statistical models, e.g., hidden Markov models (HMMs),continuous random filed(CRFs),skip-chain(SCCRFs) [2],[3] published a topical review of activity recognition research, this extensive review focuses on activity recognition techniques in which only on-body sensors are used and is also aimed at score functions of activity recognition. The EPs (Emerging patterns) based classifier model uses a set of multi-attribute tests for each class of activity [3]. Consider the blob features including mean, variance of each blob, luminance. They combined the non-zero pixels in the feature image into blobs using connected component analysis method. Based on the blob appearance the activities can be recognized [4].They proposed human detector framework based on HOG features in state of art field detection systems[16] and they defined a new way of computing HOG features based on square block of histograms which is four time faster than[18] implementation based on the integral of histograms. They exploited this increased speed to apply an approximate Gaussian mask to the features in order to improve the feature quality and the corresponding detection rate. Histogram of gradients features are used to detect whether an image encompasses human beings or not. SVM was used to train the classifier on the features [17].They are considered as the triaxial accelerometer signals, the accelerometer is used to collect human motion acceleration data for classifying the different type of human activities[8][12]. The low calculation cost feature extension method for 3D accelerometer signals is used in human activity recognition. Haar like feature is used to state of the art face detector, for high performance and calculation efficiency [8]. They introduced FIS (Fuzzy Inference system) system andit can distinguish the motion patterns of its ability of decision making. Three different features including peak to peak amplitude, standard deviation, and correlation between axes are extracted from each axis of the accelerator as inputs to the fuzzy system [12]. They proposed fuzzy rule based approach to recognize human activities. Consider the motion-based and shape based features to recognize an activity, many activities remain unidentified as temporal information is discarded. They motivated to design a robust method that uses temporal information [13]. The recognition algorithm is based on the characterizing behavior in terms of spatiotemporal features [19], [6], [20], [14], and [7]. They have been implemented a new spatiotemporal detector and a number of cuboids descriptors were analyzed. The cuboids is extracted which contains the spatio-temporally windowed pixel values and then capable of dealing with low resolution and noisy data. Cuboids prototypes by clustering a large number of cuboids are extracted from the training data set, cluster using k-means algorithm. A cluster prototype is a very simple yet powerful method for reducing variability of the data while maintaining its richness [19].They proposed frame differencing method only for simple lowlevel visual features and motion captured ,recognition can be achieved accurately using optical flow method. Here introduced a novel weighted-sequence distance (WSD) measure for comparing the similarity between two sequences. 
Support vector machine (SVM) is used for classifying the activity in a coded video sequence. This approach considers both global and local spatial temporal structures. It does not include multiple actions/subjects and appearance variations due to view angle changes [6]. They are considered as local space-time features that capture local events in video sequence, frequency and moving patterns. Recognizinghuman action directly from image measurements, the image measurements in terms of optic flow or spatio-temporal gradients, recognition can be achieved using local measurements in terms of spatiotemporal interest points; Support Vector Machines (SVMs) are state-of-the-art large margin classifiers that combined with motion descriptors in terms of local features (LF) and feature histograms (HistLF)[20].They implemented the probability based models to handle noisy feature points arisen from dynamic background and moving camera. This is


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (PLSA) model and Latent Dirichlet Allocation (LDA). The proposed algorithm can also localize multiple actions in complex motion sequences containing multiple actions. Both PLSA and LDA methods give higher recognition performance [14].Multi-stage approach is proposed to detect and recognize the human activities. This approach is to identify and extract salient space-time volumes that exhibit smooth periodic motion. They presented an algorithm that accelerates sparse coding by recursively constructing basis vectors, adjusting only a fraction of the weights at any given time. They are implemented to more challenging video collections that feature a small range of background, significant clutter and intermittent occlusion only [7].They modeled sub event (primitive) detectors of a spatiotemporal model for sequential event changes. Specialized Viterbi algorithm is designed to learn and interference the targets sequential events and handle the event overlap simultaneously, for repetitive sequential human activities [5]. visual features for action detection. Section3 deals with the comparative analysis of existing classifiers. Section4 concludes the paper.
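The frame-differencing approach cited in the related work above lends itself to a compact illustration. The following is only a hedged sketch (OpenCV is assumed to be available and the video path is a placeholder), not the adaptive algorithm proposed in this paper:

```python
# Minimal frame-differencing motion detector (illustrative sketch only).
import cv2

def detect_motion(video_path, diff_threshold=25, min_area=500):
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Absolute difference between consecutive frames highlights moving pixels.
        diff = cv2.absdiff(prev_gray, gray)
        _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
        # [-2] picks the contour list in both OpenCV 3.x and 4.x return conventions.
        contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
        # Keep only blobs large enough to correspond to a moving object.
        moving = [c for c in contours if cv2.contourArea(c) >= min_area]
        yield frame, moving
        prev_gray = gray
    cap.release()

if __name__ == "__main__":
    for _, blobs in detect_motion("surveillance.avi"):   # placeholder file name
        print("moving regions in this frame:", len(blobs))
```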

2. VISUAL FEATURES FOR ACTION DETECTION


Now a day, the growth of video-based human action detection technology has been reached its heights. The extraction of appropriate features is critical for action detection. Ideally, visual features are able to face the following challenges for robust action detection: i) Viewpoint variations of the camera. ii) Performing speed variations for different people. iii) Different anthropometry of the performers and their movement style variations. iv) Cluttered and moving backgrounds. In the earlier days, human actions were tracked and segmented from the videos to characterize actions and motion trajectories are popularly used to represent and recognize actions. Unfortunately, only limited success has been achieved because robust object tracking itself is a nontrivial task. Recently, interest point based video features show promising results in the action detection research. Such interest point-based video features do not require any foreground/background separation or human tracking. The resources for four types of interest-point based features are listed below: (a).Space-Time Interest Point (STIP) The features of space-time interest point (STIP) have been frequently used for action recognition. However, the detected interest points are usually quite sparse, and it is time consuming to extract STIP features for highresolution videos. (b).Scale-Invariant SpatioTemporal Interest Point (SISTIP) The next type of interest point features is called dense and scale-invariant spatiotemporal interest point (SISTIP), is compared to that of STIP features, the SSI-STIP features are scale-invariant (both spatially and temporally) and densely cover the video. The feature extraction is accelerated through the use

Input Video sequence

Visual Features

Human activity Classification

Human activity Detection

Figure 1 Block diagram of the Video Surveillance System They introduced the systematic model based approach to learn the nature of such temporal variations (time warps). This approach allows us to learn the space of time warps for each activity while simultaneously capturing other intra and inter class variations [11]. This paper is organized as follows: Section1 provides the introduction.Section2 explains


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) of approximate box-filter operations on an integral video structure. (c).Sparse Spatiotemporal Feature The sparse spatiotemporal features are usually denser than the STIP features. However, they do not retain the features at multiple scales. When compared to the local feature detection, the proposed feature descriptor is relatively simple. (d).Scale Invariant Feature Transformation (SIFT) The 3-D SIFT descriptor is similar to the scale invariant feature transformation (SIFT) descriptor. But the gradient direction for each pixel is a three-dimensional vector. It can work with any interest point detector. to the class of its nearest neighbor from a stored labeled reference set. The goal of designing a NN classifier is to maximize the classification accuracy while minimizing the sizes of both the reference and feature sets. It has been recognized that the editing of the reference set and feature selection must be simultaneously determined when designing the NN classifier with high classification power. Let us consider a set of n patterns of known classification {x1, x2. . . xn}, where it is assumed that each pattern belongs to one of the classes C1, C2, . . , Ci,. , CK. The NN classification rule that assigns a pattern x of unknown classification to the class of its nearest neighbor, where xi {x1, x2, xn} is defined to be the nearest neighbor of x if D (xi; x) = min {D (xl; x)}; l = 1, 2,.., n. (1) Here D is any distance measure definable over the pattern space. Since the aforementioned scheme employs the class label of only the nearest neighbor to x, this is known as the 1-NN rule. On-parametric classifiers have several very important advantages such as: i) can naturally handle a huge number of classes. (ii) Avoid over fitting of parameters, which is a central issue in learning based approaches. (iii) Require no learning/training phase. The nearest neighbor rule is quite simple, but very computationally intensive and covers only small region in data set.
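As a minimal illustration of the 1-NN rule in Eq. (1), the sketch below assigns an unknown pattern to the class of the closest stored reference pattern; the feature vectors, labels and the Euclidean distance choice are invented for the example.

```python
# Minimal 1-NN classifier following Eq. (1): assign x to the class of the
# reference pattern with the smallest distance D.
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nn_classify(x, reference_set, distance=euclidean):
    # reference_set is a list of (feature_vector, class_label) pairs.
    nearest = min(reference_set, key=lambda pair: distance(x, pair[0]))
    return nearest[1]

reference = [((0.9, 0.1), "walk"), ((0.2, 0.8), "run"), ((0.5, 0.5), "wave")]
print(nn_classify((0.85, 0.2), reference))   # -> "walk"
```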

Figure 2 Example of KTH dataset consists of six activities.

3. COMPARATIVE ANALYSIS
In this section the brief review with some of the existing classification approaches has been given. Here, we discuss about existing classification approaches such as: NN Classifier, K-nn Classifier, Rule Based Classifier, Naive Bayes Classifier, SVM classifier as following sub divisions. (a).NN Classifier: In [6], [19] the nearest neighbor (NN) classifier is commonly used due to its simplicity and effectiveness. In 1-nn rule, an input is assigned

Figure3 1-nn classifier model (i).K-nn Classifier: The k-nn classifier is a very simple nonparametric method for classification. Despite the simplicity of the algorithm, it performs very well and is an important standard method for classification. The k-nn classifier compute the


distance between two examples using some distance function d(x, y), where x and y are patterns composed of N features, such that x = {x1, ..., xN} and y = {y1, ..., yN}. (2) The traditional k-nn classifier has the following limitations: high computational cost, full dependence on the training set, and no weighting of the individual classes.

Figure 4 K-nn classifier model

(b). Rule Based Classifier: The rule based classifier is one of the simplest classification methods. This approach uses a set of pre-determined rules to generate the final result. The rules are a collection of if-then clauses of the form
Rule: (Condition) -> y
where Condition is a conjunction of attribute tests and y is the class label.

(c). Naive Bayes Classifier: The Naive Bayes classifier applies Bayes' theorem as its fundamental statistical principle when carrying out classifications. Bayes' theorem combines prior knowledge of the classes with new evidence gathered from the input data. The compact form of Bayes' theorem in terms of conditional probabilities is

P(C | F) = P(F | C) P(C) / P(F). (3)

The model is defined over a dependent class variable C with a small number of outcomes (classes), conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, basing such a model on probability tables is infeasible; the model is therefore reformulated to make it more tractable. Using Bayes' theorem, we write

P(C | F1, ..., Fn) = P(C) P(F1, ..., Fn | C) / P(F1, ..., Fn). (4)

The discussion so far derives the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori (MAP) decision rule. The corresponding classifier is the function classify defined as

classify(f1, ..., fn) = argmax_c P(C = c) * prod_i P(Fi = fi | C = c). (5)

The advantages of the Naive Bayes classifier are that it is easy to implement and gives good results in most cases. Its disadvantage is the assumption of class-conditional independence, which causes a loss of accuracy, since in practice dependencies exist among the variables.
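A toy sketch of the MAP decision rule in (5) is shown below; the categorical features, labels and the simple add-one smoothing are illustrative assumptions, not part of any surveyed system.

```python
# Toy naive Bayes classifier implementing the MAP rule of Eq. (5):
# choose the class c maximizing P(C=c) * prod_i P(Fi=fi | C=c).
from collections import Counter, defaultdict

def train(samples):
    # samples: list of (feature_tuple, label)
    priors = Counter(label for _, label in samples)
    likelihoods = defaultdict(Counter)  # (feature_index, label) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            likelihoods[(i, label)][value] += 1
    return priors, likelihoods, len(samples)

def classify(features, priors, likelihoods, total):
    best_label, best_score = None, -1.0
    for label, prior_count in priors.items():
        score = prior_count / total
        for i, value in enumerate(features):
            counts = likelihoods[(i, label)]
            # Add-one smoothing so unseen feature values do not zero out the score.
            score *= (counts[value] + 1) / (prior_count + len(counts) + 1)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

data = [(("fast", "periodic"), "run"), (("slow", "periodic"), "walk"),
        (("fast", "erratic"), "jog"), (("slow", "erratic"), "walk")]
model = train(data)
print(classify(("slow", "periodic"), *model))   # -> "walk"
```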


(d). SVM classifier: In [14], [20] the Support Vector Machine (SVM) is a classification technique that has been applied with great success to many challenging non-linear classification problems and to large data sets. SVMs are popular for classification tasks because of their strong theoretical foundation and the good classification accuracies demonstrated in a wide range of application domains. The SVM algorithm finds a hyperplane that optimally splits the training set; the optimal hyperplane is the one with the maximum margin of separation between the training points and the hyperplane. In a two-dimensional problem this amounts to finding the line that best separates points of the positive class from points of the negative class. Generally w is scaled by ||w||, and during training the algorithm needs to find the normal vector w that yields the largest margin for the hyperplane. The goal is to correctly classify all the data, which gives the constraints
[a] if yi = +1: w·xi + b >= 1
[b] if yi = -1: w·xi + b <= -1
[c] for all i: yi (w·xi + b) >= 1

Figure 5 SVM classifier model

The essence of SVM classification rests on four basic concepts: (i) the separating hyperplane, (ii) the maximum-margin hyperplane, (iii) the soft margin and (iv) the kernel function. The most observable drawback of the SVM algorithm, as described thus far, is that it only handles binary classification problems directly.
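For completeness, a hedged sketch of training such a maximum-margin classifier with scikit-learn is given below; the descriptors and labels are synthetic placeholders rather than real KTH features. In practice, implementations such as scikit-learn's SVC sidestep the binary-only limitation by combining several binary classifiers (one-vs-one).

```python
# Hedged illustration of training a linear SVM on pre-computed action
# descriptors (e.g., histograms of local spatio-temporal features).
from sklearn import svm

X_train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1],   # placeholder "walk" descriptors
           [0.1, 0.9, 0.2], [0.2, 0.8, 0.1]]   # placeholder "run" descriptors
y_train = ["walk", "walk", "run", "run"]

# A linear kernel corresponds to the maximum-margin hyperplane described above;
# a non-linear kernel (e.g., RBF) can be substituted via the kernel parameter.
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

print(clf.predict([[0.85, 0.15, 0.05]]))   # -> ['walk']
```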

Table 1 compares the reported recognition accuracy (%) of several approaches to activity recognition on the KTH dataset.

Method                  Box   Clap   Wave   Jog   Run   Walk   Mean
Niebles et al., 2008     98    86     93     53    88    82    83.3
Dean et al., 2009        81    80     86     69    89    81    81.1
Wang and Li, 2009        88    81     83     70    73    91    81.0
Dollar et al., 2005      80    82     84     63    73    89    78.5
Schuldt et al., 2004     98    60     74     60    55    84    71.8

Table 1 Recognition accuracy of related work on the KTH dataset.

Figure6 Represents the several approaches for the human activity recognition methods.


4. CONCLUSION
This paper, gives the detailed review that has been done in terms of spatio temporal case. Here, discussed a motion detection approach to recognize human activities in the video sequences. Motion detection is extensively used in video surveillance application to detect and recognize the human and human activities. This paper presented various types of visual features and classifiers were discussed. They are considering spatio temporal feature points used for recognition simple and complex human activities. In Spatio temporal cases give the good results and to avoid in real time problems are namely illumination changes, moving background and shadow detection. Here, discussed and compared several classification methods for activity recognition in spatio and temporal case. This paper is related to KTH human activity dataset and used state-of-the art methods. Furthermore, proposes an adaptive based algorithm for computationally effective to detect and recognize the simple and complex human activities in real time video sequence. Blob Features Proceedings of the 2009 IEEE international conference on Multimedia and Expo,September 2009 .Url?sa=t&rct=j&q=human%20activity%20re cognition%20based%20on%20the%20blob%2 0features%E2%80%9D&source=web&cd=1& ved=0CCIQFjAA&url=http%3A%2F%2Fnlpr web.ia.ac.cn%2F2009papers%2Fgjhy%2Fgh4. pdf [5] M. S. Ryooand , and J. K. Aggarwal , Recognition of Repetitive Sequential Human Activity Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan,October2009.Url?sa=t&rct=j&q=recogn ition%20of%20repetitive%20sequential%20% 20human%20activity%E2%80%9D%20&sour ce=web&cd=1&ved=0CCcQFjAA&url=http% 3A%2F%2Fwww.research.ibm.com%2Fpeopl evision%2FSeqTR.pdf [6] Z. Wang and B. Li, Human Activity Encoding and Recognition Using Low-Level Visual Features, in Proceedings of the 21th International Joint Conference on Artificial Intelligence,2009. DOI: 10.1.1.150.3122 [7] Thomas dean and Rich Washington,Recursivesparse,SpatioTemporal Codingin Proceedings of the 11th International Symposium on Multimedia,2009. DOI:10.1109/ISM.2009.28 [8] YuyaHanai, Jun Nishimura AndTadahiro Kuroda Haar Like Filtering for Human Activity Recognition Using 3d Accelerometer Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop, 2009. DSP/SPE 2009. IEEE 13th, 2009. DOI:10.1109/DSP.2009.4786008 [9] Junsong Yuan and ZichengLiutechware: Video-Based Human Action Detection Resources, IEEE signal processing magazine September 2010. [10] Jong T.Lee Real Time Parking Detection In Outdoor Environments Using 1-D

REFFERENCE
[1] Joshua Candamo, Matthew, and Dmitry A Survey on Human Behavior-Recognition AlgorithmsIEEE Transactions on Intelligent Transportation Systems, vol. 11, no.1, march 2010. DOI: 10.1109/TITS.2009.2030963 [2] Enugu Kim, SumiHelal, and Diane Cook, Human Activity Recognition and Pattern Discovery IEEE Pervasive Computing, vol. 9, no. 1, pp. 48-53,January 2010. DOI:10.1109/MPRV.2010.7 [3] T. Guetal epSICAR: An Emerging Patterns Based Approach to Sequential, Interleaved And Concurrent Activity RecognitionProc. IEEE 7th Ann. Int'l Conf. Pervasive Computing and Comm. (PerCom 09), IEEECs Press, 2009 pp. 19. [4] JieYang,Jian Cheng, and Hanqing Lu Human Activity Recognition Based On the


Transformation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, July 2009. DOI: 10.1109/TCSVT.2009.2020249
[11] Ashok Veeraraghavan, "Rate Invariant Recognition of Humans and Their Activities," IEEE Transactions on Image Processing, vol. 18, no. 6, June 2009. DOI: 10.1109/TIP.2009.2017143
[12] Mohammad Helmi and S.M.T. Almodarresi, "Human Activity Recognition Using a Fuzzy Inference System," IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2009), 20-24 Aug. 2009. DOI: 10.1109/FUZZY.2009.5277329
[13] Jyh-Yeong Chang, "Fuzzy Rule Inference Based Human Activity Recognition," IEEE International Symposium on Intelligent Control, part of the 2009 IEEE Multiconference on Systems and Control, Saint Petersburg, Russia, July 8-10, 2009.
[14] J. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299-318, 2008.
[15] Muhammad Usman Ghani Khan and Atif Saeed, "Human Detection in Videos," Journal of Theoretical and Applied Information Technology, 2005-2009, JATIT.
[16] Marco Pedersoli and Jordi Gonzalez, "Boosting Histograms of Gradients for Human Detection," Department of Informatica, Computer Vision Center, Universitat Autonoma de Barcelona, 08193 Bellaterra, Spain.
[17] Ishrishabh, "Human Detection in RGB Images," Department of Information and Computer Science, University of California, Irvine.

[18] Q. Zhu, M. C. Yeh, K. T. Cheng, and S. Avidan, "Fast Human Detection Using a Cascade of Histograms of Oriented Gradients," in CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1491-1498, 2006. DOI: 10.1109/CVPR.2006.119
[19] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior Recognition via Sparse Spatio-Temporal Features," in Second Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2005, p. 65. DOI: 10.1109/VSPETS.2005.1570899
[20] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing Human Actions: A Local SVM Approach," in Proceedings of the International Conference on Pattern Recognition, IEEE Computer Society, 2004. DOI: 10.1109/ICPR.2004.747
[21] Wikipedia Online. http://en.wikipedia.org/wiki.


Improving Security and Efficiency in Attribute Based Data Sharing


V. Nirmalrani1, S. Muthu Lakshmi 2, V. Senthil Kumar 3 Department of Information Technology, Sathyabama University, Chennai Department of Computer Science & Engineering, SRM University, Chennai 3 Department of Computer Science and Engineering, PRIST University, Chennai, India.
2 1

Abstract
In attribute-based systems the Key Generation Center (KGC) can decrypt any message addressed to specific users by generating their private keys. This is not suitable for data sharing scenarios in which the data owner would like to make their private data accessible only to designated users. This weakness is known as the key escrow problem, escrow here meaning that a third party holds material able to recover the data. To overcome this problem we combine an escrow-free key issuing protocol with Attribute-Based Encryption (ABE). Attribute-based encryption is a promising cryptographic approach to fine-grained data access control: it provides a way of defining access policies based on different attributes of the requester, the environment and the data object. Without such a protocol, the KGC can decrypt every cipher text addressed to specific users by generating their attribute keys, which is a potential threat to data confidentiality and privacy in data sharing systems.

Keywords
Attribute based Encryption, Access Control, Cipher text Policy, Revocation, Data Sharing

INTRODUCTION
Network and computing technology enables many people to easily share their data with others, using online external storages. People can share their lives with friends by uploading their private photos or messages into the online social networks; or upload highly sensitive Personal Health Records (PHRs) into online data servers such as Microsoft Health Vault, Google Health for ease of sharing with their primary doctors or for cost saving. The Security Management of PHRs is shown in Fig. 1. As people enjoy the advantages of these new technologies and services, their concerns about data security and access control also arise. Improper use of the data by the storage server or unauthorized access by outside users could be potential threats to their data. People would like to make their sensitive or private data only accessible to the authorized people with credentials they specified. Attribute-Based Encryption (ABE) is a promising cryptographic approach that achieves a fine-grained data access control. It provides a way of defining access policies based on different attributes of the

requester, environment and the data object. Especially, Cipher text-Policy Attribute-Based Encryption (CP-ABE) enables an encryptor to define the attribute set over a universe of attributes that a decryptor needs to possess in order to decrypt the cipher text and enforce it on the contents. Thus, each user with a different set of attributes is allowed to decrypt different pieces of data as per the security policy. This effectively eliminates the need to rely on the data storage server for preventing unauthorized data access, which is the traditional access control approach such as the reference monitor Nevertheless, applying CP-ABE in the data sharing system has several challenges. In CP-ABE, the Key Generation Centre (KGC) generates private keys of users by applying the KGCs master secret keys to users associated set of attributes. Thus, the major benefit of this approach is to largely reduce the need for processing and storing public key certificates under traditional Public Key Infrastructure (PKI). However, the advantage of the CP-ABE comes with a major drawback which is known as a key escrow problem. The KGC can decrypt every cipher text addressed to specific users by generating their attribute keys. This could be a potential threat to the data confidentiality or privacy in the data sharing systems.
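To make the policy-satisfaction idea concrete without any cryptography, a toy sketch is shown below; the attribute names and the policy encoding are invented for the example and do not reflect an actual CP-ABE construction.

```python
# Toy illustration of CP-ABE-style policy satisfaction (no cryptography here):
# the data owner attaches an attribute policy to the data, and a user can
# "decrypt" only if their attribute set satisfies it.
def satisfies(policy, attributes):
    """policy: nested ('and' | 'or', [sub-policies or attribute strings])."""
    if isinstance(policy, str):
        return policy in attributes
    op, parts = policy
    results = (satisfies(p, attributes) for p in parts)
    return all(results) if op == "and" else any(results)

policy = ("and", ["doctor", ("or", ["cardiology", "emergency"])])

print(satisfies(policy, {"doctor", "cardiology"}))   # True  -> may decrypt
print(satisfies(policy, {"nurse", "cardiology"}))    # False -> cannot decrypt
```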


DRDO Sponsored International Conference on Intelligence Computing (ICONIC12) Another challenge is the key revocation. Since some users may change their associate attributes at some time, or some private keys might be compromised, key revocation or update for each attribute is necessary in order to make systems secure. This issue is even more difficult especially in ABE, since each attribute is conceivably shared by multiple users (henceforth, we refer to such a set of users as an attribute group). must be stored securely on the receiving devices is small. Keeping the size of private key storage as low as possible is important as cryptographic keys will often be stored in tamper-resistant memory, which is more costly. This can be especially critical in small devices such as sensor nodes, where maintaining low device cost is particularly crucial [3]. Identity-based encryption (IBE) is an exciting alternative to public-key encryption, as IBE eliminates the need for a Public Key Infrastructure (PKI). The senders using an IBE do not need to look up the public keys and the corresponding certificates of the receivers, the identities (e.g. emails or IP addresses) of the latter are sufficient to encrypt. Any setting, PKI- or identity-based, must provide a means to revoke users from the system. The most practical solution requires the senders to also use time periods when encrypting, and all the receivers (regardless of whether their keys have been compromised or not) to update their private keys regularly by contacting the trusted authority [4]. Cipher text-Policy Attribute Based Encryption (CPABE) is a promising cryptographic primitive for finegrained access control of shared data. In CP-ABE, each user is associated with a set of attributes and data are encrypted with access structures on attributes. A user is able to decrypt a cipher text if and only if his attributes satisfy the cipher text access structure. Beside this basic property, practical applications usually have other requirements [5]. In cipher text policy attribute-based encryption (CPABE), every secret key is associated with a set of attributes, and every cipher text is associated with an access structure on attributes. Decryption is enabled if and only if the user's attribute set satisfies the cipher text access structure. This provides fine-grained access control on shared data in many practical settings, e.g., secure database and IP multicast. The communication model is one-to-one, in the sense that any message encrypted using a particular public key can be decrypted only with the corresponding secret key. The same holds for identity-based encryption (IBE), where user public keys can be arbitrary bit strings such as email addresses [6].

Fig. 1 Security Management of PHRs This implies that revocation of any attribute or any single user in an attribute group would affect all users in the group. It may result in bottleneck during rekeying procedure or security degradation due to the windows of vulnerability.
E. Related Work

Cipher text-Policy Attribute-Based Encryption (CPABE), a user secret key is associated with a set of attributes, and the cipher text is associated with an access policy over attributes. The user can decrypt the cipher text if and only if the attribute set of his secret key satisfies the access policy specified in the cipher text. In several distributed systems a user should only be able to access data if a user posses a certain set of credentials or attributes. Currently, the only method for enforcing such policies is to employ a trusted server to store the data and mediate access control [2]. In [3], they created public key revocation encryption systems with small cryptographic private and public keys. Their systems have two important features relating respectively to public and private key size. First, public keys in our two systems are short and enable a user to create a cipher text that revokes an unbounded number of users. This is in contrast to other systems where the public parameters bound the number of users in the system and must be updated to allow more users. Second, the cryptographic key material that

EXISTING SYSTEMS AND PROPOSED SOLUTION


The key escrow problem could be solved by escrowfree key issuing protocol, which is constructed using the secure two-party computation between the key


generation centre and the data storing centre; fine-grained user revocation per attribute can be achieved by proxy encryption, which takes advantage of the selective attribute group key distribution on top of the ABE. The performance and security analyses indicate that the proposed scheme is efficient for securely managing the data distributed in the data sharing system.
F. Existing System

As shown in Fig. 2, the architecture of data sharing system consists of the following entities:
G. Data Owner

Most of the existing ABE schemes are constructed on the architecture where a single trusted authority, or KGC has the power to generate the whole private keys of users with its master secret information Thus, the key escrow problem is inherent such that the KGC can decrypt every cipher text addressed to users in the system by generating their secret keys at any time.

It is a client who owns data, and wishes to upload it into the external data storing center for ease of sharing or for cost saving. A data owner is responsible for defining (attribute based) access policy, and enforcing it on its own data by encrypting the data under the policy before distributing it. Data Owner to get key from key generator Encrypt the file. Encryption is the conversion of data into a form, called a cipher text that cannot be easily understood by unauthorized people.
H. Data Storing Centre

The key generation center could decrypt any messages addressed to specific users by generating their private keys. This is not suitable for data sharing scenarios where the data owner would like to make their private data only accessible to designated users.

Proposed Solution
In this paper, we propose a novel CP-ABE scheme for a secure data sharing system. The key issuing protocol generates and issues user secret keys by performing a secure two-party computation (2PC) protocol between the KGC and the data storing centre with their own master secrets. The 2PC protocol deters them from obtaining any master secret information of each other such that none of them could generate the whole set of user keys alone. The data confidentiality and privacy can be cryptographically enforced against any curious KGC or data storing centre in the proposed scheme.
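The following toy sketch illustrates only the goal of the 2PC key issuing protocol, namely that neither authority alone holds the master secret; it uses a simple additive split over a prime modulus and is not the paper's actual pairing-based protocol.

```python
# Toy illustration (not the paper's protocol): the master secret is split
# between the KGC and the data storing centre so that neither party alone
# holds it, mirroring the goal of the 2PC key issuing protocol.
import secrets

P = 2**127 - 1                      # a public prime modulus (illustrative)

def split_master_secret():
    kgc_share = secrets.randbelow(P)
    storing_centre_share = secrets.randbelow(P)
    master_secret = (kgc_share + storing_centre_share) % P
    return master_secret, kgc_share, storing_centre_share

master, kgc, dsc = split_master_secret()
# Either share on its own is a uniformly random value and reveals nothing
# about the master secret; only the two parties together determine it.
assert (kgc + dsc) % P == master
```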

It is an entity that provides a data sharing service. It is in charge of controlling the accesses from outside users to the storing data and providing corresponding contents services. The data storing center is another key authority that generates personalized user key with the KGC, and issues and revokes attribute group keys to valid users per each attribute, which are used to enforce a fine-grained user access control. Data storing center store the data. Data Storage Centres provides offsite record and tape storage, retrieval, delivery and destruction services.
I. User

This is an entity who wants to access the data. If a user possesses a set of attributes satisfying the access policy of the encrypted data defined by the data owner, and is not revoked in any of the attribute groups, then he will be able to decrypt the cipher text and obtain the data.
J. Key Generation Centre

The key escrow problem could be solved by escrow-free key issuing protocol, which is constructed using the secure two-party computation between the key generation center and the data storing center. Fine-grained user revocation per each attribute could be done by proxy encryption which takes advantage of the selective attribute group key distribution on top of the ABE.

It is a key authority that generates public and secret parameters for CP-ABE. It is incharge of issuing, revoking, and updating attribute keys for users. It grants differential access rights to individual users based on their attributes. Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted.

ATTRIBUTE BASED DATA SHARING SYSTEM


FUNCTIONAL AND NON-FUNCTIONAL REQUIREMENTS

K. Functional Requirements

Fig. 2. Architecture of a data sharing system. The node structure of the Attribute based data sharing system is shown in Fig. 3. The nodes involved are admin and clients which stands as UI for the system. The nodes are Key Generation Centre (KGC) is a key authority that generates public and secret parameters for CP-ABE. Data storing center is an entity that provides a data sharing service. The data storing center is another key authority that generates personalized user key with the KGC, and issues and revokes attribute group keys to valid users per each attribute, which are used to enforce a fine-grained user access control. It is a client who owns data, and wishes to upload it into the external data storing center for ease of sharing or for cost saving. A data owner is responsible for defining (attribute based) access policy, and enforcing it on its own data by encrypting the data under the policy before distributing it. User is an entity who wants to access the data.

The key issuing protocol generates and issues user secret keys by performing a secure two-party computation (2PC) protocol between the KGC and the data storing center with their own master secrets. The 2PC protocol deters them from obtaining any master secret information of each other such that none of them could generate the whole set of user keys alone. The data confidentiality and privacy can be cryptographically enforced against any curious KGC or data storing center in the proposed scheme.
L. Non Functional Requirements

Efficiency: the attribute-based data sharing system encrypts the content efficiently, hence addressing the performance degradation problem of the fully distributed approach.
XI. METHODOLOGY

The key issuing protocol generates and issues user secret keys by performing a secure two-party computation (2PC) protocol between the KGC and the data storing centre with their own master secrets. Most of the existing ABE schemes are constructed on the architecture where a single trusted authority, or KGC has the power to generate the whole private keys of users with its master secret information Thus, the key escrow problem is inherent such that the KGC can decrypt every cipher text addressed to users in the system by generating their secret keys at any time.
M. Cipher text Policy Attribute Based Encryption with

Fig. 3. Node Structure of a data sharing system.

User Revocation We define the CP-ABE with user revocation capability scheme. The scheme consists of the following six algorithms: Setup: The setup algorithm is a randomized algorithm that takes no input other than the implicit security parameter. It outputs the public key PK and a master key MK. AttrKeyGen: The attribute key generation algorithm takes as input the master key MK, a set of attributes and a set of user indices. It outputs a set of private attribute keys for each user in U that identifies with the attributes set.


KEKGen: The key encrypting key (KEK) generation algorithm takes as input a set of user indices and outputs KEKs for each user in U, which will be used to encrypt attribute group keys.
Encrypt: The encryption algorithm is a randomized algorithm that takes as input the public parameter PK, a message M, and an access structure AA over the universe of attributes. It outputs a cipher text such that only a user who possesses a set of attributes that satisfies the access structure will be able to decrypt the message.
ReEncrypt: The re-encryption algorithm is a randomized algorithm that takes as input a cipher text including an access structure and a set of attribute groups. If the attribute groups appear in AA, it re-encrypts the cipher text for those attributes; otherwise it returns an error. Specifically, it outputs a re-encrypted cipher text such that only a user who possesses a set of attributes that satisfies the access structure, and who at the same time has a valid membership in each of the corresponding attribute groups, will be able to decrypt the message.
Decrypt: The decryption algorithm takes as input the cipher text, which contains an access structure AA, a private key SK, and a set of attribute group keys for a set of attributes.
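A skeletal interface corresponding to the six algorithms above might look as follows; the method signatures are illustrative assumptions and the cryptographic bodies are deliberately left unimplemented.

```python
# Skeletal interface mirroring the six algorithms listed above. Argument names
# and types are invented for illustration; this is not a real CP-ABE construction.
class CPABEWithRevocation:
    def setup(self):
        """Return (public_key, master_key)."""
        raise NotImplementedError

    def attr_key_gen(self, master_key, attributes, user_ids):
        """Return private attribute keys for each user identified with the attribute set."""
        raise NotImplementedError

    def kek_gen(self, user_ids):
        """Return key-encrypting keys (KEKs) used to encrypt attribute group keys."""
        raise NotImplementedError

    def encrypt(self, public_key, message, access_structure):
        """Return a ciphertext decryptable only under the access structure."""
        raise NotImplementedError

    def re_encrypt(self, ciphertext, attribute_groups):
        """Return a re-encrypted ciphertext enforcing valid attribute-group membership."""
        raise NotImplementedError

    def decrypt(self, ciphertext, private_key, attribute_group_keys):
        """Return the plaintext if the keys satisfy the ciphertext's access structure."""
        raise NotImplementedError
```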
XII.

IMPLEMENTATION OF ALGORITHM

N. C# and .NET Implementation

The implementation of Attribute Based Data Sharing System consists of the following components Data Owner: o Login o Key Generation Center (KGC) o Data owner (set Access Policy, Encrypt File) o Send Data Storing Center Data Storing Centre o Store Data User o Authentication (Registration /Login) o User Access o View Available Files o User Get File o Decrypt File Data Owner: Login: If the user enters a valid username/password combination they will be granted to access data. If the user enter invalid username and password that user will

be considered as unauthorized user and denied access to that user. Key Generation Centre (KGC): It is a key authority that generates public and secret parameters for CPABE. It is in charge of issuing, revoking, and updating attribute keys for users. It grants differential access rights to individual users based on their attributes. Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted. Data owner (set Access Policy, Encrypt File): It is a client who owns data, and wishes to upload it into the external data storing centre for ease of sharing or for cost saving. A data owner is responsible for defining (attribute based) access policy, and enforcing it on its own data by encrypting the data under the policy before distributing it. Data Owner to get key from key generator Encrypt the file. Encryption is the conversion of data into a form, called a cipher text that cannot be easily understood by unauthorized people. This operation is shown in Fig. 4. Send Data Storing Centre: Data storing centre store the data of data owner in the encrypted form. Data Storing Centre It is an entity that provides a data sharing service. It is in charge of controlling the accesses from outside users to the storing data and providing corresponding contents services. The data storing centre is another key authority that generates personalized user key with the KGC, and issues and revokes attribute group keys to valid users per each attribute, which are used to enforce a fine-grained user access control. Data storing centre store the data. Data Storage Centres provides offsite record and tape storage, retrieval, delivery and destruction services.
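The encrypt-store-decrypt flow described above can be illustrated with a simple symmetric-key stand-in; the actual system uses CP-ABE attribute keys and is implemented in C#/.NET, so the Python sketch below with the `cryptography` package is only an analogy.

```python
# Hedged stand-in for the encrypt-store-decrypt flow, using a symmetric key
# in place of CP-ABE attribute keys. Data values are placeholders.
from cryptography.fernet import Fernet

# Key Generation Centre role: issue a key to the data owner.
key = Fernet.generate_key()

# Data owner role: encrypt the file before sending it to the data storing centre.
plaintext = b"sensitive personal health record"
ciphertext = Fernet(key).encrypt(plaintext)

# Data storing centre role: keep only the ciphertext.
stored = ciphertext

# Authorized user role: obtain the key and decrypt the stored file.
recovered = Fernet(key).decrypt(stored)
assert recovered == plaintext
```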

Fig. 4 Data Owner (Set Access Policy, Encrypt File)


User:
Authentication (Registration/Login): A new user who wants to access the data storing centre must first register by entering their details. The login form module then presents a form with username and password fields. If the user enters a valid username/password combination they are granted access to the data; if an invalid username or password is entered, the user is treated as unauthorized and access is denied.
User Access: In this module the user can check their attributes and access policy.
View Available Files: The data storing centre stores a number of files, and these files are displayed to an authorized user according to the user's access policy.
User Get File: The user is the entity who wants to access the data. If a user possesses a set of attributes satisfying the access policy of the encrypted data, and is not revoked in any of the valid attribute groups, then the user can decrypt the cipher text and obtain the data; the user selects a particular file and obtains its key from the Key Generation Centre.
Decrypt File: Decryption is the reverse process of encryption. Frequently, the same cipher is used for both encryption and decryption. While encryption creates a cipher text from a plaintext, decryption recovers the plaintext from the cipher text. The user uses the key for the selected file to decrypt and save it.
O. Screen Shots

Fig. 6 Data Owner Login Form Data Owner Login Screen is shown in Fig. 6. If the user enters a valid username/password combination they will be granted to access data. If the user enter invalid username and password that user will be considered as unauthorized user and denied access to that user.

Fig. 7 Key Generation Center Key generation process is shown in Fig. 7, generating keys for cryptography. It provides public key and private key. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted.

Fig. 5 Data Owner

Fig. 8 Data Owner File Entry


Data Storing Centre Process is shown in Fig. 9. It is in charge of controlling the accesses from outside users to the storing data and providing corresponding contents services. Data storing centre store the data. Data Storage Centres provides offsite record and tape storage, retrieval, delivery and destruction services.

Fig. 12 shows how the files kept by the data storing centre are displayed to an authorized user according to the user's access policy. If a user possesses a set of attributes satisfying the access policy of the encrypted data, and is not revoked in any of the valid attribute groups, the user can decrypt the cipher text and obtain the data; the user selects a particular file and obtains its key from the Key Generation Centre.

Fig. 9 Data Storing Center

Fig. 12 File Download

Fig. 10 User

Fig. 13 Decrypt File Decryption of a file is shown in Fig. 13. Decryption is the reverse process to Encryption. Frequently, the same Cipher is used for both Encryption and Decryption. While Encryption creates a Cipher text from a Plaintext, Decryption creates a Plaintext from a Cipher text. User uses that particular file key decrypt and save that file.

CONCLUSION AND FUTURE ENHANCEMENT


Fig. 11 User Login
P. Conclusion


The proposed approach achieves more secure and fine-grained data access control in the data sharing system. We demonstrated that the proposed scheme is efficient and scalable for securely managing user data, and that it preserves data privacy and confidentiality against any system managers as well as adversarial outsiders without the corresponding credentials.

FUTURE ENHANCEMENT
In the future, it would be interesting to consider attribute-based encryption systems that apply more advanced cryptosystems for data sharing. Future work includes encrypting multimedia content, solving the performance degradation of the fully distributed approach, handling key expiry times, and using multiple data storing centres and proxy servers to update user secret keys without disclosing user attribute information.

REFERENCES
Ling Cheung and Calvin Newport, "Provably Secure Ciphertext Policy ABE," Proceedings of the 14th ACM Conference on Computer and Communications Security, ISBN: 978-1-59593-703-2, pp. 456-465, 2007.
Junbeom Hur and Dong Kun Noh, "Attribute-Based Access Control with Efficient Revocation in Data Outsourcing Systems," IEEE Transactions on Parallel and Distributed Systems, pp. 1214-1221, 2011.
Luan Ibraimi, Milan Petkovic, Svetla Nikova, Pieter Hartel and Willem Jonker, "Mediated Ciphertext-Policy Attribute-Based Encryption and Its Application," Information Security Applications, Lecture Notes in Computer Science, DOI: 10.1007/978-3-642-10838-9_23, pp. 309-323, 2009.
Allison Lewko, Amit Sahai and Brent Waters, "Revocation Systems with Very Small Private Keys," IEEE Symposium on Security and Privacy (SP), ISBN: 978-1-4244-6895-9, pp. 273-285, May 2010.
Alexandra Boldyreva, Vipul Goyal and Virendra Kumar, "Identity-Based Encryption with Efficient Revocation," Proceedings of the 15th ACM Conference on Computer and Communications Security, ISBN: 978-1-59593-810-7, pp. 417-426, 2008.
Shucheng Yu, Cong Wang, Kui Ren and Wenjing Lou, "Attribute Based Data Sharing with Attribute Revocation," Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, ISBN: 978-1-60558-936-7, pp. 261-270, 2010.


AN INTEGRATED METHOD OF KANO MODEL AND QFD FOR DESIGNING IMPRESSIVE QUALITIES OF HOTEL MANAGEMENT SERVICE
SENTHIL KUMAR M1, Dr. R. BASKARAN2
1 PG Student, Department of Industrial Engineering, Anna University, Chennai-600025, India
2 Professor, Department of Industrial Engineering, Anna University, Chennai-600025, India

Abstract
The hotel service industry is one of the fast-growing service sectors in India. Customer satisfaction and customer-oriented service are becoming important issues and play significant roles in establishing sustainable strategies. Because a systematically effective method of identifying customers' needs and voice has been lacking, in this paper we present an integrated methodology of the Kano model and Quality Function Deployment (QFD). Firstly, the Kano model is applied to identify customer needs and to calculate the customer satisfaction coefficient, which helps the manager to prioritize the service qualities that can increase customer satisfaction. Then, by following the same procedures as for product design, the four phases of QFD are used to translate the voice of the customer into regular service planning. With this methodology, managers can straightforwardly identify and prioritize what customers need. The steps described here are practical procedures for improving and upgrading customer service. Furthermore, the results can help managers develop sustainable strategies.

I. INTRODUCTION
Customer-oriented service is the core value of the hotel management service business. Hospitality providers not only provide care but also need to pay attention to the issue of increasing customer satisfaction. In India, many hotels have implemented quality improvement initiatives such as TQM (Total Quality Management), QCC (Quality Control Circle), 5S and BPR (Business Process Reengineering). All these efforts are made to improve their efficiency and service quality. Their ultimate goal is to increase customer satisfaction and to maintain reasonable profits.

Due to the lack of a systematic method to improve customer satisfaction, the current practice of eliminating complaints and dissatisfaction is a passive one. Hotel management service is a special form of service sector: it involves highly intensive contact with customers and differs from other service businesses. Reference 1 suggests that the traditional service evaluation method, SERVQUAL, is not appropriate for hospitality service and proposes three quality dimensions: structure, process, and outcome. In this study we adopted this framework to design the questionnaire. We provide an integration of the Kano model and Quality Function Deployment (QFD) to design impressive qualities for hospitality service.

The Kano model is applied to identify customer needs and to calculate the customer satisfaction coefficient, which helps the manager to trade off which quality is most important for increasing satisfaction. Then, the four phases of Quality Function Deployment are used to translate the voice of the customer into regular service planning. With this methodology, managers can identify and prioritize what customers need. There are three main objectives of this study: (1) to adopt the framework of Reference 1 to construct the Kano questionnaire, prioritize the hotel service quality criteria and calculate the customer satisfaction coefficient; (2) to integrate it with the QFD methodology; and (3) to follow the QFD method to translate the voice of the customer into the voice of regular production planning.

II. METHODOLOGY
The step-by-step procedure for integrating Kano and QFD to design impressive qualities for hotel management service is described as follows.

A. Kano Model
Many researchers have applied the useful diagram of Kano's model for identifying customers' needs and how a given service feature or attribute affects customer satisfaction [2,3,4,5,6,7]. The Kano method has been shown to be effective in trade-off situations in the product/service development process. Basically, the premise of Kano's model is that the product/service criteria which have the greatest influence on customer satisfaction can be distinguished [8]. Thus, marketing people, design engineers, manufacturing and quality control staff should work together from the time a product is first imagined [9]. The advantages of classifying customer requirements by means of the Kano method are quite clear:
1. Product requirements are better understood: the product criteria which have the greatest impact on customer satisfaction can be identified.
2. Classifying the service requirements into must-be, one-dimensional and attractive dimensions can be used to focus on priorities for service development. It is, for example, not very useful to invest in improving must-be requirements which are already at a satisfactory level, but better to improve one-dimensional or attractive requirements, as they have a greater impact on the perceived service quality and consequently on the customer's level of satisfaction.
3. The Kano method provides valuable help in trade-off situations in the product development stage. If two service requirements cannot be met simultaneously due to technical or financial reasons, the criterion which has the greatest impact on customer satisfaction can be identified.
4. Must-be, one-dimensional and attractive requirements differ in the utility expectations of different customer segments, so specific solutions can be tailored for different customer segments.
5. Discovering and fulfilling attractive requirements creates a wide spectrum of possibilities for differentiation.
6. Kano's model determines the importance of individual product features for customer satisfaction; hence it creates the optimal prerequisite for process-oriented product development activities.

B. Six Quality Categories of the Kano Model
Reference 2 developed a useful diagram for characterizing customers' requirements. According to the Kano model, a customer's quality attributes can be effectively categorized into six categories, described below:
Attractive quality attributes: the presence of this quality attribute can create great satisfaction in the customer; however, the absence of the same quality attribute does not create dissatisfaction.
Must-be quality attributes: the customer will not be more satisfied when the quality attribute is fulfilled; however, if the product or service does not meet the customer's need, the customer will become greatly dissatisfied.
One-dimensional quality attributes: the customer satisfaction level is directly proportional to the fulfilment of the quality attribute.
Indifferent quality attributes: the quality attribute has no effect on customer satisfaction, whether it is present or not.
Reverse quality attributes: the customer will be satisfied when the quality attribute is absent.
Questionable quality attributes: owing to misunderstanding or misinterpretation of the answers on the survey, or to wrongly filled-out questionnaires, contradictions in the customer's responses may occur.
By applying the Kano philosophy, customer requirements cannot be assessed solely by numerical evaluation; on the contrary, a qualitative measure from the psychological perspective is needed to analyse customer needs. Both Reference 8 and Reference 10 have proposed an integration of the Kano model and QFD for meeting customer requirements. Presently, the usefulness of Kano's model has barely been introduced into the hotel service business. Reference 11 applied Kano's model to explore how patient satisfaction behaves and briefly analysed the characteristics and advantages of methods used to assess patient satisfaction. They found that a physician who provides patients with a written treatment plan and good communication increases patient satisfaction; on the other hand, the absence of the same quality attribute leads to dissatisfaction. This situation is similar to Kano's one-dimensional quality attributes. Therefore, it is worth studying how to apply the Kano model to hotel service. In the following, we explain how hotel service requirements are identified and classified by means of the questionnaire; the questionnaire results are then evaluated, interpreted and used as the basis for hotel service development.

C. Assess and Evaluate Customer Requirements
The methodology to assess and evaluate customers' requirements is adopted from Reference 8.
Step 1: Identification of hotel service requirements. The first step in constructing the Kano questionnaire is to explore the hotel service requirements of the customers and their families. The commonly used interview methods, such as focus group interviews and individual interviews, do not suffice to investigate the potential requirements. The interviews are based on customers' prior experiences with service quality and their feelings about satisfaction. The main focus of this research is to explore the attractive requirements that are not explicitly expressed by the customers; in other words, the hidden needs of the customers and their families can be ascertained. Hence, by implementing the attractive requirements, it is possible to increase the level of attractive or impressive quality.
Step 2: Construction of Kano questionnaires. Attractive, one-dimensional, must-be, indifferent and reverse quality requirements can be classified by the Kano model. For each quality criterion, a pair of questions in functional and dysfunctional form is formulated, to which the interviewee can answer in one of five different ways [2]. One example of a pair of questions is illustrated in Figure 1. By combining the two answers in the Kano evaluation table (see Table 1), the quality criteria can be classified into six categories: Attractive (A), One-dimensional (O), Must-be (M), Indifferent (I), Reverse (R) and Questionable (Q). Categories A, O, M and I are very intuitive. Category R indicates that the feature is not only not wanted by the customer; the customer even expects the reverse. Category Q stands for a questionable result; the reason for a questionable score might be misunderstanding of the questions or crossing the wrong answers.
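To make Step 2 concrete, the snippet below classifies one functional/dysfunctional answer pair using a common textbook form of the Kano evaluation table. The exact table used in the study is its own Table 1, so the mapping coded here should be read as an assumption for illustration rather than as the paper's table.

```python
# Classify one questionnaire answer pair with a widely used form of the Kano
# evaluation table (shown for illustration only; the study uses its Table 1).
ANSWERS = ["like", "must-be", "neutral", "live-with", "dislike"]

def kano_category(functional: str, dysfunctional: str) -> str:
    f, d = ANSWERS.index(functional), ANSWERS.index(dysfunctional)
    if f == 0 and d == 0:
        return "Q"                      # questionable
    if f == 4 and d == 4:
        return "Q"
    if f == 0:
        return "O" if d == 4 else "A"   # one-dimensional / attractive
    if d == 4:
        return "M"                      # must-be
    if f == 4:
        return "R"                      # reverse
    return "R" if d == 0 else "I"       # reverse / indifferent

# "In-room comforts": the customer likes having it and dislikes not having it.
print(kano_category("like", "dislike"))   # O
```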

Step 3: Calculation of customer satisfaction coefficient. The customer satisfaction coefficient indicates whether satisfaction can be increased by meeting a requirement, or whether fulfilling this requirement merely prevents the customer from being dissatisfied [12]. The two formulas for calculating the extent of satisfaction and dissatisfaction are:

Extent of satisfaction: (A + O) / (A + O + M + I)   (1)
Extent of dissatisfaction: -(M + O) / (A + O + M + I)   (2)

Step 4: Integration with QFD. Quality Function Deployment was developed by Yoji Akao in Japan in 1966. By 1972 the power of the approach had been well demonstrated at the Mitsubishi Heavy Industries Kobe Shipyard [13], and in 1978 the first book on the subject was published in Japanese and later translated into English [14]. In Akao's words, QFD "is a method for developing a design quality aimed at satisfying the consumer and then translating the consumer's demand into design targets and major quality assurance points to be used throughout the production phase ... [QFD] is a way to assure the design quality while the product is still in the design stage." As a very important side benefit he points out that, when appropriately applied, QFD has demonstrated a reduction of development time by one half to one third [15]. Quality Function Deployment (QFD) is one of the very useful quality-system tools commonly applied to fulfil customer needs and improve customer satisfaction in many industries [13, 9, 15, 16, 7, 17, 18]. The traditional QFD is composed of four phases, depicted in Figure 1: product planning (also known as the house of quality (HOQ)), parts deployment, process planning, and production planning. In the HOQ, as shown in Figure 2, six major steps are required to complete the matrix: (1) customer needs (WHATs), (2) planning matrix, (3) technical measures (HOWs), (4) relationship matrix between WHATs and HOWs, (5) technical correlation matrix, and (6) technical matrix [17].
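Equations (1) and (2) are straightforward to compute once the A, O, M and I response counts for a requirement are known; the snippet below does exactly that. The counts used in the example call are hypothetical and serve only to show the calculation.

```python
# Customer satisfaction coefficients of Equations (1) and (2):
#   extent of satisfaction     CS+ = (A + O) / (A + O + M + I)
#   extent of dissatisfaction  CS- = -(M + O) / (A + O + M + I)

def satisfaction_coefficients(a: int, o: int, m: int, i: int):
    total = a + o + m + i
    return (a + o) / total, -(m + o) / total

# Hypothetical response counts for one service requirement.
cs_plus, cs_minus = satisfaction_coefficients(a=4, o=4, m=18, i=4)
print(round(cs_plus, 3), round(cs_minus, 3))   # 0.267 -0.733
```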

III. RESULTS
By following the steps described in Section II, we briefly present a simplified version of the results of our study to illustrate the feasibility and effectiveness of the integrated methodology. Step 1: Identification of hotel service requirements. Following the framework of Reference 1, we have three quality dimensions: structure, process, and outcome. Some examples are described as follows: 1. Cleanliness and hygiene. 2. Hotel ambiance. 3. In-room comforts. 4. Online facilities. 5. Room service. 6. Housekeeping service. 7. Staff with good communication skills. 8. Accuracy in billing. 9. Availability of staff to provide service. 10. Friendliness of staff. 11. Visually appealing facilities. 12. Neat appearance of employees. 13. Attractive lobby. 14. Quality of food. 15. Complimentary items.

Step 2: Construction of Kano questionnaires for hotel management service. For example, we set Characteristic X: in-room comforts.
Step 3: Calculation of customer satisfaction coefficient. Using Equations (1) and (2) to calculate the customer satisfaction coefficients, the results are shown in Table 2. According to the data, accuracy in billing, with a positive customer satisfaction coefficient of 0.26, can only slightly increase satisfaction; on the other hand, its negative customer satisfaction coefficient of -0.867 leads to more than proportional dissatisfaction. Thus the results can facilitate the manager in distinguishing which qualities need more effort and resources to improve the service.
Step 4: Integration with QFD. Finally, the QFD for designing impressive hotel management service is constructed.

IV. CONCLUSION
This study provides an integration of the Kano model and QFD to satisfy customer requirements. The Kano model is applied to identify customer needs and to calculate the customer satisfaction coefficient, which helps the manager to trade off which quality is most important for increasing satisfaction. Then, the four phases of Quality Function Deployment are used to translate the voice of the customer into regular service planning. With this methodology, managers can identify and prioritize what customers need. The steps described here are practical procedures for improving and upgrading hotel management service quality and, in turn, influencing customer satisfaction. Furthermore, the integrated approach applied in this research could be used by hotels that are trying to build strategies for improving service quality. In this paper we have systematically shown how Kano's model of customer satisfaction can be integrated into quality function deployment. Such a methodology can help the manager establish a sustainable competitive advantage.

REFERENCES
[1] G.M. Zifko-Baliga and R.F. Krampf, Managing perceptions of hospital quality, Marketing Health Service, Vol. 17, No. 2, pp. 28-35, 2001.
[2] N. Kano, N. Seraku, F. Takahashi and S. Tsuji, Attractive quality and must-be quality, Hinshitsu (Quality, the Journal of the Japanese Society for Quality Control), Vol. 14, pp. 39-48, 2003.


[3] N. Kano, Quality in Year 2000: Downsizing through Reengineering and Upsizing through Attractive Quality Creation, ASQC Annual Quality Congress, Las Vegas, 1996.
[4] N. Kano, Upsizing the Organization by Attractive Quality Creation, Total Quality Management: Proceedings of the First World Conference, pp. 60-72, 1998.
[5] N. Kano, Life Cycle of Quality and Attractive Quality Creation, Proceedings of Quality Excellence in the New Millennium - The 14th Asia Quality Symposium 2000 Taipei, pp. 7-11, 2000.
[6] C. Berger, et al., Kano's method for understanding customer-defined quality, Center for Quality Management Journal, Vol. 2, pp. 3-35, 1999.
[7] G. Cohen, Age and health status in a patient satisfaction survey, Social Science Medicine, Vol. 42, pp. 1085-1093, 1996.
[8] K. Matzler and H. H. Hinterhuber, How to Make Product Development Projects More Successful by Integrating Kano's Model of Customer Satisfaction into Quality Function Deployment, Technovation, Vol. 18, No. 1, pp. 25-38, 1998.
[9] J. R. Hauser and D. Clausing, "The House of Quality," The Harvard Business Review, May-June, No. 3, pp. 63-73, 1988.
[10] X. X. Shen, K. C. Tan and M. Xie, An Integrated Approach to Innovative Product Development Using Kano Model and QFD, European Journal of Innovation Management, Vol. 3, No. 2, pp. 91-99, 2000.
[11] A. C. Jane and S. M. Dominguez, Citizens' role in health services: satisfaction behavior: Kano's model, part 2, Quality Management in Health Care, 12, pp. 72-80, 2003.
[12] M. C. Lee and J. F. Newcomb, Applying the Kano methodology to meet customer requirements: NASA's microgravity science program, Quality Management Journal, No. 4, pp. 95-106, 1997.
[13] L. P. Sullivan, "Quality Function Deployment", Quality Progress, June, pp. 39-50, 1986.
[14] S. Mizuno and Y. Akao, eds., QFD: The Customer-Driven Approach to Quality Planning and Development, Asian Productivity Organization, Tokyo, Japan, available from Quality Resources, One Water Street, White Plains NY, 1994.
[15] Y. Akao, ed., Quality Function Deployment, Productivity Press, Cambridge MA, Becker Associates Inc, 1990.
[16] D. Clausing, Total Quality Development: A Step-by-Step Guide to World Class Concurrent Engineering, ASME Press, New York, 2006.
[17] L. K. Chan and M. L. Wu, Prioritizing the Technical Measures in Quality Function Deployment, Quality Engineering, Vol. 10, No. 3, pp. 467-479, 1998.
[18] T. C. Kuo and H. H. Wu, Green Products Development by Applying Grey Relational Analysis and Green Quality Function Deployment, International Journal of Fuzzy Systems, Vol. 5, No. 4, pp. 229-238, 2003.

RESULTS
Kano classification of the hotel service requirements: for each of the fifteen requirements (cleanliness and hygiene, hotel ambiance, in-room comforts, online facilities, room service, housekeeping service, staff with good communication skills, accuracy in billing, availability of staff to provide service, friendliness of staff, visually appealing facilities, neat appearance of employees, attractive lobby, quality of food, complimentary items), the table reports the percentages of Attractive (A), One-dimensional (O), Must-be (M) and Indifferent (I) responses and the resulting Kano category.

REQUIREMENT | EXTENT OF SATISFACTION | EXTENT OF DISSATISFACTION
Cleanliness and hygiene | 0.933 | -0.0667
Hotel ambiance | 0.8 | -0.5
In-room comforts | 0.433 | -0.1
Online facilities | 0.433 | -0.9
Room service | 0.4667 | -0.867
Housekeeping service | 0.4667 | -0.1
Staff with good communication skills | 0.5 | -0.067
Accuracy in billing | 0.26 | -0.867
Availability of staff | 0.26 | -0.734
Friendliness of staff | 0.53 | -0.364
Visually appealing facilities | 0.26 | -0.134
Neat appearance of employees | 0.467 | -0.734
Attractive lobby | 0.467 | -0.1
Quality of food | 0.36 | -0.867
Complimentary items | 0.367 | -0.1


An RBF based learning approach to predict the status of SCM repository


Divya Bharathi S1, Subha S2, Ashok M3, Umadevi A4
1,2 Lecturer, Department of IT, Rajalakshmi Institute of Technology, Chennai-124
3 Senior Lecturer, Department of IT, Rajalakshmi Institute of Technology, Chennai-124
4 Assistant Professor, MBA, Veltech University, Chennai-62

Abstract
Change management in software engineering is a tough task to handle in the development environment. Software configuration management plays a vital role in managing the effects of change on its repository. The challenge in managing an SCM repository is predicting the effects when changes occur in customer requirements; SCM repositories are poor at facing the effects of change management. Here we introduce a computational intelligence technique to improve the management skill of the SCM repository.
Keywords - Change, SCM, Repository, Prediction, Skill.
I. Introduction
Software Configuration Management
All SCM [5] systems provide the following essential features:


Concurrency Management
Versioning
Synchronization

SCM systems work on a simple idea: the definitive copies of your files are kept in a central repository. People check out copies of files from the repository, work on those copies, and then check them back in when they are finished. SCM systems manage and track revisions by multiple people against a single master set.
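The check-out/commit/version-tracking idea described above can be illustrated with a minimal in-memory repository. This is a teaching sketch, not a real SCM tool; the class and method names are invented for the example.

```python
# Minimal sketch of the central-repository idea: definitive copies live in the
# repository, engineers check out a copy, and every commit is kept so earlier
# versions can be recreated. Not a real SCM system such as SVN.
class TinyRepository:
    def __init__(self):
        self.history = {}                      # path -> list of revisions

    def checkout(self, path, revision=-1):
        """Return a working copy of the requested revision (default: latest)."""
        return self.history[path][revision]

    def commit(self, path, content, author, message):
        self.history.setdefault(path, []).append(
            {"content": content, "author": author, "message": message}
        )
        return len(self.history[path]) - 1     # new revision number

repo = TinyRepository()
repo.commit("bar.cpp", "int main() {}", "client1", "initial version")
rev = repo.commit("bar.cpp", "int main() { return 0; }", "client2", "add return")
print(rev, repo.checkout("bar.cpp", 0)["message"])   # 1 initial version
```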

Concurrency Management
Concurrency refers to the simultaneous editing of a file by more than one person. With a large repository, we want people to be able to do this, but it can lead to some problems. Consider a simple example in which we allow engineers to modify the same file simultaneously in a central repository of source code. Client1 and Client2 both need to make changes to a file at the same time:
1. Client1 opens bar.cpp.
2. Client2 opens bar.cpp.
3. Client1 changes the file and saves it.
4. Client2 changes the file and saves it, overwriting Client1's changes.

Obviously, we don't want this to happen. Even if we controlled the situation by having the two engineers work on separate copies instead of directly on a master set as in the illustration below, the copies must somehow be reconciled. Most SCM systems deal with this problem by allowing multiple engineers to check a file out ("sync" or "update") and make changes as needed. The SCM system then runs algorithms to

merge the changes as files are checked back in ("submit" or "commit") to the repository.
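One way to picture the merge step is to compare each working copy against the common base revision and flag a conflict only when both copies touched the same line region. The sketch below uses Python's difflib for this; real systems such as SVN perform a more elaborate three-way merge, so this is only an approximation of the idea.

```python
# Detect whether two independently edited copies conflict, by comparing the
# line regions each copy changed relative to the common base revision.
import difflib

def changed_regions(base_lines, new_lines):
    """Line ranges of the base file that this copy modified."""
    sm = difflib.SequenceMatcher(None, base_lines, new_lines)
    return [(i1, i2) for tag, i1, i2, _, _ in sm.get_opcodes() if tag != "equal"]

def has_conflict(base, copy_a, copy_b):
    ra, rb = changed_regions(base, copy_a), changed_regions(base, copy_b)
    return any(a1 < b2 and b1 < a2 for a1, a2 in ra for b1, b2 in rb)

base  = ["int x = 1;", "int y = 2;", "int z = 3;"]
copy1 = ["int x = 10;", "int y = 2;", "int z = 3;"]   # client1 edits line 1
copy2 = ["int x = 1;", "int y = 2;", "int z = 30;"]   # client2 edits line 3
print(has_conflict(base, copy1, copy2))               # False: changes can be merged
```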

Subversion
Subversion (SVN) is an open-source version control system. It has all of the features described above. SVN adopts a simple methodology when conflicts occur. A conflict arises when two or more engineers make different changes to the same area of the code base and then both submit their changes. SVN only alerts the engineers that there is a conflict; it is up to the engineers to resolve it. The first step is to install SVN on your system.
IEEE (IEEE/ANSI Standard 1042-1987) gives a standard definition of configuration management, which includes identification, control, status accounting, and audit and review. Bersoff (1984) described the configuration management elements as follows:
Identification: identifying each unique definition of system baseline components.
Control: controlling stages of the system life cycle.
Auditing: auditing provides the mechanism for determining the degree to which the current state of the software system mirrors the software system pictured in the baseline and requirements documentation.
Status accounting: status accounting is the administrative tracking and reporting of all software items formally identified and controlled.

Figure 1: SCM System

Versioning
Versioning refers to keeping track of file revisions which makes it possible to recreate or roll back to a previous version of the file. This is done either by making an archive copy of every file when it is checked into the repository, or by saving every change made to a file. At any time, we can use the archives or change information to create a previous version. Versioning systems can also create log reports of who checked in changes, when they were checked in, and what the changes were.

Synchronization
With some SCM systems, individual files are checked in and out of the repository. More powerful systems allow you to check out more than one file at a time. Engineers check out their own complete copy of the repository and work on files as needed. They then commit their changes back to the master repository periodically, and update their own personal copies to stay up-to-date with changes other people have made. This process is called syncing or updating.

(Figure 2 depicts the structure of an SCM system. The labels appearing in the diagram include: construction, repository building, snapshots, optimization, change impact analysis, regeneration, auditing, history, logging, traceability, accounting, statistics, status reports, process, lifecycle support, task management, communication, documentation, versions, configurations, versions of configurations, baselines, project contexts, structure, system model, interfaces, relationships, selection, consistency, team, components, merging, families of components, and workspaces.)

Check Out: To request a working copy from the repository. A working copy equals the state of the project when it was checked out. Commit: To send changes from your working copy into the central repository. Also known as check-in or submit. Update: To bring others' changes from the repository into your working copy, or to indicate if your working copy has any uncommitted changes. This is the same as a sync, as described above. So, update/sync brings your working copy up-to-date with the repository copy. Conflict: The situation when two engineers try to commit changes to the same area of a file. SVN indicates conflicts, but the engineers must resolve them. Log message: A comment you attach to a revision when you commit it, which describes your changes. The log provides a summary of what's been going on in a project.

II.Literature survey
Tosun [1] noted the effects of change management in the tool-formulation section of "Evaluating Software Configuration Management Tools for Opticon Sensors Europe B.V.". Jie Ma [2] presented "Configuration Management Improvement: Stream-overview generator". Ivica Crnkovic, Peter Funk and Magnus Larsson [4] addressed "Processing Requirements by Software Configuration Management" (IEEE Computer Society). Siti Mastura [3] reported "The Implementation of Software Configuration Management in MSC Organizations".


Figure 2: Structure of SCM
Some SVN Terminology

Revision: A change in a file or set of files. A revision is one "snapshot" in a constantly changing project. Repository: The master copy where SVN stores a project's full revision history. Each project has one repository. Working Copy: The copy in which an engineer makes changes to a project. There can be many working copies of a given project, each owned by an individual engineer.

III.Proposed System

Figure 3: CI Model - SCM Repository

A. Algorithm
Step 1: Read the factor for design change.
Step 2: Read the factor for function change.
Step 3: Repeat steps (1) and (2) for different projects to create statistics.
Step 4: Initialize the weight matrix and learning parameter.
Step 5: Form a statistical profile from Step 3.
Step 6: Calculate
DCF = (number of times the design changed) / (number of times the requirements changed)
FCF = (number of times a function changed) / (number of times the requirements changed)
for n projects.
Step 7: Calculate f(DCF, FCF), where f is a sigmoid function.
Step 8: Multiply the result of Step 4 with the result of Step 7.
Step 9: f(x1) = exp(x1), f(x2) = exp(x2).
Step 10: Iterate Steps 4 to 9 for 100 times.
Step 11: Generate the change management factor for the SCM repository.
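Since the paper only outlines the algorithm, the sketch below fills in the unspecified details (initial weight, learning parameter, the role of the exponential transform and the weight update) with assumptions, and uses hypothetical project statistics, purely to show how Steps 6-11 fit together.

```python
# Sketch of Steps 6-11 above. The initial weight, learning parameter and the
# weight-update rule are not specified in the paper, so the choices below are
# assumptions; the project statistics are hypothetical.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

stats = {"design_changes": 4, "function_changes": 6, "requirement_changes": 10}

weight, eta = 0.5, 0.1                                           # Step 4 (assumed values)
dcf = stats["design_changes"] / stats["requirement_changes"]     # Step 6: design change factor
fcf = stats["function_changes"] / stats["requirement_changes"]   # Step 6: function change factor

for _ in range(100):                                             # Step 10: iterate Steps 4-9
    activation = sigmoid(dcf + fcf)                              # Step 7: f(DCF, FCF)
    output = weight * activation                                 # Step 8: weight x activation
    target = math.exp(-output)                                   # Step 9: exponential transform (assumed role)
    weight += eta * (target - output)                            # assumed weight update

print(f"Step 11 - change management factor: {weight * sigmoid(dcf + fcf):.3f}")
```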


B. Experimental Analysis
We designed our own SCM simulation in which the back-propagation algorithm is implemented in the C language. Five different application projects are considered for the experiments: Library Management System (LMS), Tourism Management System (TMS), Clinic Management System, Railway Reservation System and Office Automation System.
C. Results
Figure 4: Design Change
Figure 5: Function Change

Figure 6: Learning Rate
Fig. 4, Fig. 5 and Fig. 6 show the results of the intelligence. In analysing change management, streamlining the task is tough, but predicting the effects of changes in the SDLC is effective with the proposed strategy.

IV.Conclusion and Future Work


Change management and change control are the two major research issues in software configuration management. The stages of the SDLC play a vital role in reflecting the effects of changes on the SCM repository. Predicting changes in the requirements phase, so as to avoid hassles in software configuration management, will be our future work.

References
[1] Fatma Tosun, Evaluating Software Configuration Management Tools for Opticon Sensors Europe B.V., University of Amsterdam, 2000.
[2] Jie Ma, Configuration Management Improvement: Stream-overview generator, October 2010, ISSN: 1651-4769, Report no. 2010:077.
[3] Siti Mastura Bt. Sheikh Abu Bakar, The Implementation of Software Configuration Management in MSC Organizations, 2011.
[4] Ivica Crnkovic, Peter Funk and Magnus Larsson, Processing Requirements by Software Configuration Management, IEEE Computer Society, 2010.
[5] Pressman, Software Engineering, 5th edition.


Joint Flow Routing and Relay Node Assignment in Cooperative Multi-Hop Networks
Kaleeswari M1, Selvakumari S2
1 PG Scholar, Francis Xavier Engineering College, Tirunelveli
2 PG Scholar, Francis Xavier Engineering College, Tirunelveli

Abstract
It has been shown that cooperative communications (CC) has the potential to significantly increase the capacity of wireless networks. However, most of the existing results are limited to single-hop wireless networks. To explore the behavior of CC in multi-hop wireless networks, we study a joint optimization problem of relay node assignment and flow routing for a group of sessions. We develop a mathematical model and propose a solution procedure based on the branch-and-bound framework augmented with cutting planes (BB-CP). We design several novel components to speed up the computation time of BB-CP. Via numerical results, we show the potential rate gain that can be achieved by incorporating CC in multi-hop networks.
Index Terms: Cooperative communications, flow routing, relay assignment, multi-hop, wireless network.

I. INTRODUCTION
COOPERATIVE communications (CC) is a novel physical layer mechanism in which each node is equipped with only a single antenna and spatial diversity is achieved by exploiting the antennas on other nodes in the network. Although there has been active research on CC at the physical layer and for single-hop communications, results on CC in multi-hop wireless networks remain very limited. In this paper, we explore CC in multi-hop wireless networks by investigating a joint problem of relay node assignment and multi-hop flow routing. The objective of this problem is to maximize the minimum rate among a group of sessions, where each session may need to traverse multiple hops from its source to its destination. The key problems we address include (1) the assignment of relay nodes (either for the purpose of CC or as multi-hop relays) to each user session, and (2) the coupling problem of multi-hop flow routing and relay node assignment. To solve the problem, we develop a mathematical characterization for cooperative relay node assignment and multi-hop flow routing. For the nonlinear constraints in the problem formulation, we show how to convert them into linear constraints by exploiting some problem-specific properties. We propose a solution procedure based on a branch-and-bound framework augmented with cutting planes (BB-CP). Our proposed solution includes three novel components that make it highly efficient. First, we develop an efficient polynomial-time local search algorithm to generate feasible flow routes that exploit CC along individual hops.


Second, by exploiting our problem structure, we devise a clever strategy for generating cutting planes that significantly decreases the number of branches in our branch-and-bound tree. Third, we present an innovative approach to perform branching operations that exploits problem-specific properties to choose superior branches and reduce the overall computational time. Our solution procedure provides (1 - ε)-optimal solutions, with ε being the desired approximation error bound. The remainder of this paper is organized as follows. Section II presents related work. Section III describes our reference model for CC. In Section IV, we develop a mathematical model and problem formulation for joint cooperative relay node assignment and multi-hop routing. In Section V, we present our solution to the optimization problem. Section VI presents numerical results and Section VII concludes this paper.


II. RELATED WORK
Research on CC at the physical layer has been very active in recent years. These findings at the physical layer have found applications in ad hoc networks, for both single-hop and multi-hop networks. In single-hop networks, the focus has been mainly on relay node assignment. For multi-hop networks, Khandani et al. studied a minimum-energy routing problem (for a single message) by exploiting both the wireless broadcast advantage and CC (called the wireless cooperative advantage in that paper). They developed a dynamic-programming-based solution along with two heuristic algorithms. Yeh and Berry aimed to generalize the well-known maximum differential backlog policy in the context of CC. They formulated a challenging nonlinear program that characterized the network stability region, but only provided solutions for a few simple cases. In [15], Scaglione et al. proposed two architectures for multi-hop cooperative wireless networks. Under these architectures, nodes in the network can form multiple cooperative clusters, and they showed that the network connectivity could be improved by using such cooperative clusters. The authors also proposed heuristics that decoupled the routing problem from relay node assignment (so that only one problem was addressed at a time). In contrast, we consider joint flow routing and relay node assignment in this paper, which is necessary to explore optimality but is also much harder to solve.
Fig. 1. A three-node reference model for CC.

III. REFERENCE MODELS


The essence of CC is to exploit (1) the wireless broadcast advantage and (2) the relaying capability of neighboring nodes so as to achieve a higher data rate, lower transmission error, or other transmission objectives. Figure 1 shows a three-node reference model for CC, where node s is the source node, node d is the destination node, and node r is a relay node. In this paper, we employ orthogonal channels to resolve contention in the multi-hop wireless network. Under this model, each node uses separate channels for transmission and reception and can thus transmit and receive data on different channels at the same time without self-interference.


Such an operation can be achieved by using a single antenna that has enough antenna bandwidth to accommodate separate channels for transmission and reception. In the following, we present the achievable rate between s and d under CC. We consider both the amplify-and-forward (AF) and decode-and-forward (DF) coding schemes, as well as direct transmission.
CC with Amplify-and-Forward (AF). Under this mode, relay node r receives, amplifies, and forwards the signal from source node s (all in analog form) to destination node d [9], and the destination node d combines the different signals received from s and r. Let h_sd, h_sr and h_rd capture the effects of path loss, shadowing and fading within the respective channels between nodes s and d, s and r, and r and d. Also, denote by z_d and z_r the zero-mean background noise at nodes d and r, with variances σ_d^2 and σ_r^2, respectively. For simplicity, we assume the background noise at a node has the same stochastic property on different channels. Denote by P_s and P_r the transmission powers at nodes s and r, respectively. Following the same approach as that for deriving the rate under the AF mode in [9], it can be shown that the achievable rate between s and d (with r as a relay) is C_AF(s, r, d) = W · I_AF(s, r, d), where

I_AF(s, r, d) = log2(1 + SNR_sd + SNR_sr · SNR_rd / (SNR_sr + SNR_rd + 1)),
SNR_sd = P_s |h_sd|^2 / σ_d^2,  SNR_sr = P_s |h_sr|^2 / σ_r^2,  SNR_rd = P_r |h_rd|^2 / σ_d^2,

and W is the channel bandwidth.
CC with Decode-and-Forward (DF). Under this mode, relay node r first decodes and estimates the received signal from source node s, and then transmits the estimated data to destination node d; the destination node d combines the different signals received from s and r. The achievable rate under the DF mode can be developed by following the same approach as that in [9], which is C_DF(s, r, d) = W · I_DF(s, r, d), where

I_DF(s, r, d) = min{ log2(1 + SNR_sr), log2(1 + SNR_sd + SNR_rd) }.

Direct Transmission (without CC). When CC is not used, the achievable rate from source node s to destination node d is simply C_D(s, d) = W log2(1 + SNR_sd).
A couple of comments are in order. First, note that I_AF(·) and I_DF(·) are increasing functions of P_s and P_r, respectively. This suggests that, in order to achieve the maximum rate under either AF or DF, both the source node and the relay node should transmit at their maximum power P. Thus, we set P_s = P_r = P. Second, based on the rate expressions, one can see that although AF and DF are different physical layer mechanisms, the achievable rates for both have the same mathematical form, i.e., both are functions of SNR_sd, SNR_sr and SNR_rd. Therefore, any solution procedure designed for AF can be readily extended to DF. As a result, it is sufficient to focus on developing a solution procedure for one of them, and we choose AF in this paper.
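The achievable-rate expressions above translate directly into code. In the snippet below, the bandwidth, transmit power, channel gains and noise variances are arbitrary placeholder values chosen only to exercise the formulas.

```python
# Achievable rates for direct transmission, AF and DF, as given in Section III.
import math

def snr(p, h, sigma2):
    """SNR = P * |h|^2 / sigma^2 for one link."""
    return p * abs(h) ** 2 / sigma2

def rate_direct(w, snr_sd):
    return w * math.log2(1 + snr_sd)

def rate_af(w, snr_sd, snr_sr, snr_rd):
    return w * math.log2(1 + snr_sd + (snr_sr * snr_rd) / (snr_sr + snr_rd + 1))

def rate_df(w, snr_sd, snr_sr, snr_rd):
    return w * min(math.log2(1 + snr_sr), math.log2(1 + snr_sd + snr_rd))

W, P = 22e6, 1.0                       # placeholder bandwidth (Hz) and transmit power
s_sd = snr(P, 0.02, 1e-4)              # placeholder channel gains / noise variances
s_sr = snr(P, 0.08, 1e-4)
s_rd = snr(P, 0.06, 1e-4)
print(rate_direct(W, s_sd), rate_af(W, s_sd, s_sr, s_rd), rate_df(W, s_sd, s_sr, s_rd))
```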


IV. CC IN MULTI-HOP NETWORKS
A. Network Setting
We consider a group of sessions in a multi-hop wireless network. The data flow for each session may traverse multiple hops from its source to its destination. As discussed in Section III, we employ orthogonal channels in the network, which allow different nodes to transmit simultaneously without interfering with each other. We distinguish relay nodes in the network into two types, based on their functionality. We call a relay node used for the purpose of CC (i.e., node r in Fig. 1) a Cooperative Relay (CR) and a relay node used for multi-hop relaying in the traditional sense a Multi-hop Relay (MR). Note that a CR operates at the physical layer while an MR operates at the network layer. Physical limitations of a wireless node may prohibit it from transmitting (or receiving) different data on multiple channels at the same time. As a result, we assume that a relay node may serve either as a CR or as an MR, but not both at the same time. This also limits an MR to receiving data from only one node and transmitting data to only one other node at any given time. Similarly, a CR node can serve at most one transmitter-receiver pair. For the same reason, a source node (or destination node) cannot serve as a CR. Zhao et al. showed that, for a single hop, the diversity gain obtained by exploiting multiple relay nodes is only marginally higher than the diversity gain that can be obtained by selecting the best relay. As a result, we consider at most one relay node for CC between each sender and receiver in this paper.

B. Mathematical Modeling
In this section, we present a mathematical model for our joint flow routing and relay node assignment problem. Denote N as the set of nodes in the network, with |N| = N. In the set N there are three subsets of nodes: the set of source nodes Ns = {s1, s2, ..., sNs}, with Ns = |Ns|; the set of destination nodes Nd = {d1, d2, ..., dNd}, with Nd = |Nd| = Ns; and the set of remaining nodes that are available to serve as CR or MR nodes, Nr = {r1, r2, ..., rNr}, with Nr = |Nr|. For clarity, we assume that all the source and destination nodes are distinct. Then we have N = Ns + Nd + Nr = 2Ns + Nr.
Role of Relay Nodes. Due to the existence of CRs, it is necessary to introduce integer variables to characterize

whether or not an available relay node will be used as a CR. A binary variable A^w_uv is defined for this purpose: A^w_uv = 1 if node w is used as a CR on hop (u, v), and 0 otherwise. We also introduce another binary variable B_uv to specify whether or not the link from u to v is active in the routing solution: B_uv = 1 if v is the next-hop node of node u, and 0 otherwise.
Each MR w ∈ Nr can receive data from only one previous hop, i.e., Σ_{t∈N, t≠w} B_tw ≤ 1, and can transmit data to only one next hop, i.e., Σ_{t∈N, t≠w} B_wt ≤ 1. Furthermore, a CR w ∈ Nr can serve only one hop: Σ_{u∈N, u≠w} Σ_{v∈N, v≠u, v≠w} A^w_uv ≤ 1. Also, we know that a relay can serve as only a CR or an MR, which can be enforced as follows:

Σ_{u∈N, u≠w} Σ_{v∈N, v≠u, v≠w} A^w_uv + Σ_{t∈N, t≠w} B_tw ≤ 1   (w ∈ Nr),   (1)
Σ_{u∈N, u≠w} Σ_{v∈N, v≠u, v≠w} A^w_uv + Σ_{t∈N, t≠w} B_wt ≤ 1   (w ∈ Nr).   (2)

In both (1) and (2), if the first term is 1 (i.e., node w is used as a CR), then the second term must be 0 (i.e., w cannot be used as an MR). Similarly, if the second term is 1 (i.e., node w is used as an MR), then the first term must be 0 (i.e., w cannot be used as a CR). For a relay node w ∈ Nr that is being used as an MR, since w is not the destination node of any communication session, the traffic entering node w must also exit:

Σ_{u∈N, u≠w} B_uw = Σ_{v∈N, v≠w} B_wv   (w ∈ Nr).   (3)

Note that (3) also holds when w is not an MR; in this case, all B variables in (3) are zero. It can be shown that once we have (3), it is sufficient to include either (1) or (2), but not both. As a result, we only include (1) in the problem formulation.


Furthermore, we may assign a relay node as a CR to hop (u, v) only if the hop is active (i.e., if B_uv = 1); otherwise, no relay node should be assigned as a CR to hop (u, v). This constraint can be characterized as follows:

B_uv - Σ_{w∈Nr, w≠u, w≠v} A^w_uv ≥ 0   (u ∈ N, v ∈ N, v ≠ u).   (4)

From the above constraint, we can see that when the value of some B_uv is 1, Σ_{w∈Nr, w≠u, w≠v} A^w_uv can be 1 or 0. This means that hop (u, v) is free to use either direct transmission (Σ_{w∈Nr, w≠u, w≠v} A^w_uv = 0) or CC (Σ_{w∈Nr, w≠u, w≠v} A^w_uv = 1).
Flow Routing. As explained earlier, due to transceiver limitations, a node can only transmit on one channel at any given time. As a result, we limit the transmission and reception of data at the network layer to only one transmitter and one receiver. This can be mathematically characterized by the following constraints:

Σ_{v∈N, v≠si} B_{si v} = 1   (si ∈ Ns),   (5)
Σ_{v∈N, v≠u} B_uv ≤ 1   (u ∉ Ns),   (6)
Σ_{u∈N, u≠v} B_uv ≤ 1   (v ∉ Nd),   (7)
Σ_{v∈N, v≠di} B_{v di} = 1   (di ∈ Nd),   (8)

where (5) says that a source node must transmit data to some other node and (8) says that a destination node must receive data from some node. We note that there are some redundant constraints in (6) and (7). Constraint (6) can be partitioned into the following two sets of constraints:

Σ_{v∈N, v≠w} B_wv ≤ 1   (w ∈ Nr),   (9)
Σ_{v∈N, v≠di, v≠si} B_{di v} ≤ 1   (di ∈ Nd).   (10)

Similarly, constraint (7) can be partitioned into the following two sets of constraints:

Σ_{u∈N, u≠w} B_uw ≤ 1   (w ∈ Nr),   (11)
Σ_{u∈N, u≠si, u≠di} B_{u si} ≤ 1   (si ∈ Ns).   (12)

Note that due to (3), (9) is equivalent to (11). Thus, instead of using (6), we will use (10) in the final problem formulation. Denote f_uv(si) as the flow rate on link (u, v) that is attributed to session (si, di). The flow balance at an intermediate node w along the path between si and di can be formulated as follows:

Σ_{u∈N, u≠w, u≠di} f_uw(si) = Σ_{v∈N, v≠w, v≠si} f_wv(si)   (si ∈ Ns, w ∈ N, w ≠ di, w ≠ si).   (13)

Using (13), it is easy to show that Σ_{w∈N, w≠si} f_{si w}(si) = Σ_{w∈N, w≠di} f_{w di}(si), which states that all data generated by a source node si must be sent to its destination node di.
Rate Constraints. To ensure the feasibility of the routing solution, we must consider the capacity constraint on each hop in the network. That is, the aggregate flow rate traversing link (u, v) must not exceed the capacity of this link, i.e.,

Σ_{si∈Ns, si≠v} f_uv(si) ≤ (1 - Σ_{w∈Nr, w≠u, w≠v} A^w_uv) C_D(u, v) B_uv + Σ_{w∈Nr, w≠u, w≠v} A^w_uv C_AF(u, w, v) B_uv   (u ∈ N, v ∈ N, v ≠ u).   (14)

Note that on the right-hand side (RHS) of (14) there can be at most one non-zero term, depending on whether direct transmission or CC is employed. If direct transmission is employed, then the first term on the RHS of (14) is non-zero and the second term is 0; the converse is true when CC is employed.
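The integer constraints introduced so far can be sanity-checked on a small hand-built instance. The sketch below encodes a toy assignment of the A and B variables and verifies constraints (1), (3) and (4); the topology and values are hypothetical, and the capacity constraint (14) is omitted for brevity.

```python
# Toy check of constraints (1), (3) and (4) for a hand-built assignment.
# A[w][(u, v)] = 1 if relay w serves hop (u, v) as a CR; B[(u, v)] = 1 if link
# (u, v) is used for routing. The instance is hypothetical.
def check_relay_roles(A, B, relays, nodes):
    ok = True
    for w in relays:
        cr_use = sum(A.get(w, {}).get((u, v), 0)
                     for u in nodes for v in nodes
                     if u != v and u != w and v != w)
        mr_in = sum(B.get((t, w), 0) for t in nodes if t != w)
        mr_out = sum(B.get((w, t), 0) for t in nodes if t != w)
        ok = ok and cr_use + mr_in <= 1        # (1): a relay is a CR or an MR, not both
        ok = ok and mr_in == mr_out            # (3): traffic entering an MR must exit it
    for (u, v), b in B.items():
        assigned = sum(A.get(w, {}).get((u, v), 0) for w in relays)
        ok = ok and b - assigned >= 0          # (4): a CR is assigned only to active links
    return ok

nodes = ["s1", "r1", "r2", "d1"]
B = {("s1", "r1"): 1, ("r1", "d1"): 1}         # route s1 -> r1 -> d1 (r1 acts as an MR)
A = {"r2": {("r1", "d1"): 1}}                  # r2 assists hop (r1, d1) as a CR
print(check_relay_roles(A, B, relays=["r1", "r2"], nodes=nodes))   # True
```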

C. Problem Formulation
We consider a set of Ns sessions in the network, denoted by Ns. The goal is to maximize the minimum flow rate among all active sessions via an optimal multi-hop flow routing and cooperative relay assignment. More formally, for a given session (si, di), denote the end-to-end flow rate (or throughput) as R_si, where R_si = Σ_{v∈N, v≠si} f_{si v}(si). Denote R_min as the minimum flow rate among all sessions, i.e.,

R_min ≤ Σ_{v∈N, v≠si} f_{si v}(si)   (si ∈ Ns).   (15)

Then our objective is to maximize R_min. As part of our reformulation effort, we would like to convert the nonlinear constraint (14) into a linear constraint. The constraint in (14) contains the product of the two variables A^w_uv and B_uv and is thus in nonlinear form.


We can reformulate it into a linear constraint by exploiting the following property of A^w_uv and B_uv.
Property 1. For any u ∈ N, v ∈ N, v ≠ u, and w ∈ Nr with w ≠ u, w ≠ v, we have B_uv · A^w_uv = A^w_uv.
Proof: The property is proved by considering both cases of B_uv. (i) When B_uv = 1, the equality holds trivially. (ii) When B_uv = 0, link (u, v) is not active. As a result, no CR should be assigned to (u, v), i.e., A^w_uv = 0 for w ∈ Nr, w ≠ u, w ≠ v, by (4). Hence, the equality again holds.
By using Property 1, we can rewrite (14) as follows:

Σ_{si∈Ns, si≠v} f_uv(si) ≤ (B_uv - Σ_{w∈Nr, w≠u, w≠v} A^w_uv) C_D(u, v) + Σ_{w∈Nr, w≠u, w≠v} A^w_uv C_AF(u, w, v)   (u ∈ N, v ∈ N, v ≠ u),   (16)

which is now a linear constraint. All of our constraints are now linear, and we have the following problem formulation:

Maximize R_min
subject to (1), (3), (4), (5), (8), (10), (12), (13), (15), (16);
R_min, f_uv(si) ≥ 0   (si ∈ Ns, u ∈ N, v ∈ N, u ≠ v, u ≠ di, v ≠ si);
A^w_uv, B_uv ∈ {0, 1}   (w ∈ Nr, u ∈ N, v ∈ N, u ≠ v ≠ w),

where R_min, f_uv(si), A^w_uv and B_uv are the optimization variables. It is not hard to see that this formulation is in the form of a mixed-integer linear program (MILP), which is NP-hard in general [3], [21].

V. PROPOSED SOLUTION PROCEDURE
For the MILP problem formulation, we propose a solution procedure based on the so-called branch-and-bound framework augmented with cutting planes (BB-CP) [11]. BB-CP is an enhancement of BB that uses a CP method to efficiently handle integer variables [1], [11]. Under this framework, we propose several novel problem-specific components. We show that our solution procedure yields a (1 - ε)-optimal solution to the MILP problem, where the value of ε, 0 < ε < 1, reflects the desired accuracy. In Section V-A, we give a brief overview of the BB-CP framework [11]. Then, in Sections V-B to V-D, we describe several novel components of the solution procedure.
A. Overview of the Algorithm

The BB-CP solution procedure consists of a set of iterative steps. During the first iterative step, an upper bound on the objective value is obtained by solving a relaxed version of the MILP problem. This relaxed problem is in the form of an LP and thus can be solved in polynomial time. However, due to the relaxation, the values of A^w_uv and B_uv in the solution may become fractional, and the relaxed solution will thus not be feasible for the original MILP problem. Therefore, a local search algorithm, which we call Feasible Solution Construction (FSC), is proposed to obtain a feasible solution from the relaxed solution. The feasible solution obtained from FSC provides a lower bound on the objective value. If the gap between the upper and lower bounds is greater than ε (the desired gap), cutting planes are added to the problem. A cutting plane is a linear constraint that reduces the feasible region of the relaxed problem (but not of the original MILP), thereby improving the values of the upper and lower bounds. After adding each new cutting plane, the relaxed LP is solved again. This relaxed solution will yield an improved upper bound (possibly with fractional A^w_uv and B_uv values). Each new upper-bounding solution can then be used to find a new feasible (possibly improving) lower-bounding solution via our local search FSC algorithm. The process of adding cutting planes to the relaxed problem continues until the improvement in the upper and lower bounds becomes marginal, i.e., within a certain percentage threshold. After cutting planes can no longer improve the bounds, the problem is partitioned into two subproblems. The relaxed versions of the two subproblems are then solved and FSC is used to obtain the upper and lower bounds for each subproblem. This step finishes the iteration.


After each iteration, if the gap between the largest upper bound (among all the subproblems) and the largest lower bound (among all the subproblems) is more than ε, another iterative step (similar to the first step) is performed on the subproblem having the largest upper bound. Note that after every iteration, the chosen subproblem is partitioned into two subproblems, increasing the total number of subproblems that we have. For some subproblem, if the upper and lower bounds coincide, then this subproblem is completely solved, and it is not chosen for branching in future iterations. Also, since our goal is to obtain a (1 - ε)-optimal solution, if (1 - ε) times the upper bound of a certain subproblem is less than or equal to the largest lower bound among all the subproblems, then this subproblem can be removed from the problem list, as this will not affect the (1 - ε)-optimality of the final solution. This can be explained by considering the following two cases:
Case 1: The global optimal solution is not in the subproblem that was removed. In this case, the removal of the subproblem will not cause the removal of the optimal solution.
Case 2: The global optimal solution is in the subproblem that was removed. In this case, the current lower bound is already (1 - ε)-optimal; thus, removal of the subproblem will not prevent us from finding a (1 - ε)-optimal solution (as we already have one at hand, i.e., the lower-bounding solution).
The iterations of BB-CP continue until the largest upper bound (among all the current subproblems) and the largest lower bound among all the subproblems (i.e., the best feasible solution value) are within ε of each other. At this point, the best feasible solution is (1 - ε)-optimal. As one can see, the key challenge in implementing a BB-CP framework is in the details, i.e., how each component is designed. We propose the following novel components:

1) An efficient polynomial-time local search algorithm, called the Feasible Solution Construction (FSC) algorithm. FSC generates feasible flow routes that exploit CC along individual hops.
2) By exploiting the problem structure, we establish a judicious strategy to generate cutting planes that significantly decreases the number of branches in our BB tree.
3) An effective approach to performing branching operations, which exploits problem-specific properties to select superior branches and hence reduce the overall computational time.
Although the worst-case complexity of our solution remains exponential (due to the MILP), the actual run-time is in fact reasonable. This reasonable run-time is mainly attributed to our proposed new components in the branch-and-bound framework.
B. FSC Algorithm
After solving the relaxed MILP, the solution may have fractional values for some of the integer variables A^w_uv or B_uv, which is clearly infeasible. The proposed FSC is a local search algorithm that constructs a feasible solution based on a given infeasible solution by determining feasible routings, CR assignments, and flow rates for all sessions in the network. Our proposed FSC algorithm is a polynomial-time algorithm that consists of three phases: Path Determination, CR Assignment, and Flow Re-calculation. In the following, we give the details of these three phases.
Phase 1: Path Determination. The goal of this first phase is to find a feasible and potentially high-capacity path for each session. In this phase, FSC starts by assuming that no prior paths exist for any session in the network. Among the sessions whose paths are yet to be determined, the algorithm performs path determination for a session (chosen at random) iteratively. When determining the next-hop node, FSC takes the following approach. Suppose that we are searching for the next-hop node of a node ri. In the relaxed solution, it is possible that ri may have multiple next-hop nodes.


Among these candidate next-hop nodes, we select the node rj to which ri is transmitting the largest amount of data (in the relaxed solution) for source node si. This widest-pipe approach, although heuristic, has the potential of finding a high-capacity path. Note that once a node is included in a path, it will not be considered for inclusion in other paths during subsequent iterations.
Fig. 2. A simple path between a source and its destination where the intermediate nodes are all in Nr.
Case 1 (Simple Path). We consider the simple case first, where, after the widest-pipe approach, the intermediate nodes between the source and destination nodes of a session are all from the set Nr. An example is shown in Fig. 2, where the final path between s0 and d0 goes through r1, r2 and r3, with r1, r2 and r3 all being in the set Nr. In this case, Phase 1 for the selected session is considered complete, and the algorithm moves on to Phase 2 for the selected session.
Case 2 (Overlapping Path). In this case, based on the widest-pipe approach, we have encountered an intermediate node from the set Ns or Nd, i.e., a source or destination node. So the path under consideration may overlap with the path of another session. This is the most complex situation that we need to deal with in the path determination phase. Depending on the type of this intermediate node (source or destination), different mechanisms need to be devised.
Sub-case 2.1: The encountered intermediate node is the source node of another session. In this case, the encountered intermediate node (say sj) is included in the path as the next-hop node. At the same time, sj is recorded in a special list (denoted L) to keep track of such source nodes of other sessions that have not yet found their own paths to their corresponding destination nodes but are included in the path during the current iteration.

Sub-case 2.2: The encountered intermediate node is a destination node. This includes a number of scenarios, illustrated in Fig. 3.

Fig. 3. Examples illustrating different scenarios in Sub-case 2.2 during Phase 1 of FSC: (a) Case 2.2(A), the source of the encountered destination node is in L; (b) Case 2.2(B), the encountered node is the destination node of the current path under construction; (c) Case 2.2(C), the source node for d2 is not on the path under construction.

A. This node is the destination node of a source node in L: In this case, this node will be included in the current path under construction. Its corresponding source node will be removed from L, since the path for that source node is complete. As an example, in Fig. 3(a), the path under construction is for source node s0. Currently, node d1 is considered as the next-hop node along the path and the corresponding source node s1 is in list L. In this case, d1 is added to the path and s1 is removed from list L.
B. This node is the destination node of the current path under construction: In this case, this node is included in the path and the path construction for the intended source node is complete. If list L is empty, the current iteration finishes and the algorithm moves on to the next iteration for the remaining nodes. But list L may not be empty at this point, meaning that some source nodes of other overlapping paths still do not have a complete path to their corresponding destination nodes. As an example, in Fig. 3(b), the source node of the path under construction is s0.


Source nodes s1 and s2 are included in the path and thus are in L. When the destination node d0 (for s0) is included in the path, the path construction for s0 is complete, but the paths for s1 and s2 remain incomplete. If L is not empty, the iteration continues by taking a source node from L as the current intended source node and continuing path construction for this node. In our algorithm, if there are multiple source nodes in L, we pick the source node that has the largest share of outgoing flow at the currently encountered node. Once chosen, this source node is removed from L and the path construction continues. In the case that the currently encountered node is not carrying any flow for any of the source nodes in L, all source nodes in L will be removed from the current path as well as from list L. This is done by removing a source node from the current path (which was in list L) and directly connecting its preceding and succeeding nodes in the path. For example, in Fig. 3(b), the resulting path after removing s1 and s2 is s0-r2-r4-d0. At this point, list L is empty and the current iteration finishes. The algorithm will move on to the next iteration to examine the remaining nodes.
C. This node is a destination node whose source node is not on the current path under construction: In this case, this node must not be included in the path, and the node receiving the next-largest flow will be considered. This scenario is illustrated in Fig. 3(c): when d2 is considered as the next node of r1, d2 will not be included in the path because s2 is not on the current path. As a result, another node, r2, will be considered and included as the next node for r1. This ensures that a different path can be explored for s2 in a future iteration. Upon completion of an iteration of path determination, list L must be empty. The algorithm will then move on to the next iteration of path determination for the remaining source nodes whose paths are not yet determined.

Note that all the source nodes, destination nodes, and relay nodes that are already included in a path are removed from further consideration, due to the physical layer constraint we discussed earlier. The iteration continues until the paths for all the source nodes are determined.
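To make the Phase 1 procedure concrete, the following Python sketch shows the widest-pipe next-hop rule under a simplified, assumed data layout (a dictionary relaxed_flow mapping each directed link to the session's flow in the relaxed MILP solution). It is not the authors' code, and the overlapping-path handling of Case 2 is deliberately omitted.

def widest_pipe_path(source, destination, relaxed_flow, used_nodes):
    # relaxed_flow: {(u, v): flow of this session on link (u, v) in the
    # relaxed solution}; used_nodes: nodes already claimed by other paths.
    # The destination is assumed not to have been claimed yet.
    path, current = [source], source
    while current != destination:
        # candidate next hops: outgoing links carrying positive relaxed
        # flow that lead to a node not yet used by any path
        candidates = [(v, f) for (u, v), f in relaxed_flow.items()
                      if u == current and v not in used_nodes and f > 0]
        if not candidates:
            return None  # no feasible continuation for this session
        # "widest pipe": follow the link with the largest relaxed flow
        next_hop = max(candidates, key=lambda c: c[1])[0]
        path.append(next_hop)
        used_nodes.add(next_hop)
        current = next_hop
    return path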

V. CONCLUSIONS
In this paper, we explored CC in multi-hop wireless networks by studying a joint relay node assignment and multi-hop flow routing problem. This optimization problem is inherently difficult due to its mixed-integer nature and very large solution space. We developed an efficient solution procedure based on a branch-and-bound framework augmented with a cutting plane algorithm that has several novel components to speed up computations. Our results demonstrated the significant rate gains that can be achieved by incorporating CC in a multi-hop wireless network.

ACKNOWLEDGEMENTS
The work of Y.T. Hou, H.D. Sherali, and S.F. Midkiff was supported in part by NSF under grant CNS-1064953. The work of S. Kompella was supported in part by the ONR.





Performance evaluation of two clustering schemes in the discovery of services in MANET environment
Dr. E. ILAVARASAN 1, S. PRASATH SIVASUBRAMANIAN 2
1 PG Student, Pondicherry Engineering College, Puducherry
2 PG Student, Avvaiyar Govt. College for Women, Karaikal

Abstract

Ad hoc networks are basically an infrastructure-less environment operating in an energy-constrained setup. SOA is an architectural style that supports integration of business processes as linked services, which may be accessed whenever needed over a network. Services are basically an abstraction of functionality, which can convey its full operations to the end users when it is called. The process of discovering services in an infrastructure-less environment like ad hoc networks constantly poses a challenge for researchers. Since an ad hoc network enforces ubiquity in the computing environment, service discovery is critical. Moreover, the dynamic nature of the ad hoc network environment necessitates a customized, situation-specific service. This paper addresses the above issues using the clustering of nodes. The discovery of services is performed with two well known clustering schemes, MobDhop and the (α, t) Cluster Framework, and the efficiency of the service discovery process is evaluated.

1. INTRODUCTION
The present scenario of network computing is making a constant shift from the wired to the wireless environment. This shift not only drives innovation but also creates a lot of research challenges. Moreover, when the wireless environment is an infrastructure-less setup like an ad hoc network, the complexity of these challenges doubles. MANETs are a type of network where there is a constant challenge in handling the dynamic mobility of nodes and the energy-constrained operating environment. Due to the limited energy sources, researchers strive to find solutions for handling this issue. This paper addresses this issue with the help of service oriented computing. When services are implemented in MANETs, due to the loosely coupled nature of services, the nodes need not carry all the code with them, but rather discover it as a service as and when needed. While attempting to make the service discovery process efficient in a MANET environment, node mobility poses a great challenge; hence it is handled by the clustering schemes that are most commonly used in ad hoc networks. Though there are various clustering schemes used in the MANET environment, this paper addresses two well known clustering schemes, i) MobDhop and ii) the (α, t) Cluster Framework [1], and the discovery of services through these clustering schemes is evaluated with various parameters.

2. SERVICE DISCOVERY PROCESS


Discovery is a process by which a user finds a service provider. The service discovery which plays a primary role in service orientation carries its responsibility by formulating the request, matching requests to services with similar descriptions and ultimately communicating the same to the service provider. The effectiveness of discovery of service is determined by the matching efficiency of the request with advertisements based on semantics thereby increasing the opportunities for the user to


select an appropriate service [2]. The information necessary for using a specific service is gathered in the service discovery process. Service discovery comprises at least one of the following items: locating the service provider; acquiring additional service or provider information; retrieving the provider's access interface (proxy, stub, etc.) [3]. In active mode, a client initiates a request-response procedure by broadcasting a request for a certain service, and appropriate providers respond at least with the location data of their service. Passive mode, in contrast, releases clients from inquiry and obliges providers to announce their services. Both mediated modes work with central information brokers; for proper operation, providers have to register their service data with a broker. The critical part of the discovery mechanism is the algorithm that matches requests to advertisements. Matching can be done in two forms: exact matching and inexact matching [4]. Exact matching can return the first match and terminate. Inexact matching needs not only to verify the relationship between the advertised value and the acceptable range, but also to compare matches with all other available matches; the clients must therefore be more flexible in requesting and using services. When a service handle is returned, it is paired with a service identifier. This can be used by a client to access the service in the future, provided it is within communication range.
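As an illustration of the two matching modes just described, the sketch below assumes a deliberately simple advertisement representation (a name, a value and an acceptable range); both this layout and the scoring rule are assumptions made only for the example.

def exact_match(request, advertisements):
    # Return the first advertisement whose name equals the requested one
    # and terminate, as described for exact matching.
    for ad in advertisements:
        if ad["name"] == request["name"]:
            return ad
    return None

def inexact_match(request, advertisements):
    # Compare every candidate against the acceptable range and keep the
    # best-scoring one instead of stopping at the first hit.
    lo, hi = request["acceptable_range"]
    best, best_score = None, -1.0
    for ad in advertisements:
        if ad["name"] != request["name"]:
            continue
        if lo <= ad["value"] <= hi:
            # illustrative scoring rule: values near the low end of the
            # acceptable range are preferred
            score = 1.0 - (ad["value"] - lo) / max(hi - lo, 1e-9)
            if score > best_score:
                best, best_score = ad, score
    return best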

3. SURVEY OF DIFFERENT CLUSTERING SCHEMES IN MANETS

Clustering is a process of grouping one or more ad hoc nodes into separate groups with cluster heads, gateway nodes and ordinary nodes [5]. The clustering process is shown in Figure 1. The cluster head takes the responsibility of intra-cluster communication and coordinates with the gateway nodes for inter-cluster communication.

Fig 1: Clustering in MANETS

3.1. Cluster Formation
A set S of nodes can form a cluster if, for every pair of nodes i, j ∈ S with i ≠ j, there is always a path Pij from node i to node j such that every node k on Pij also belongs to S [6].

3.2. Various Clustering Schemes
Meenu Chawla et al. presented various clustering solutions for MANETs, and we have addressed some of the clustering schemes based on their observations:

a. Lowest ID Clustering
In this scheme every node carries a node ID and periodically broadcasts it to neighboring nodes [7]. Each node then checks the ID values it receives from its neighbors, and the node with the lowest ID becomes the head. Since no other parameter is used for the selection of the cluster head, the performance of this algorithm is said to be random.

b. Highest Degree Clustering
This method [8, 9] takes the degree of a node, which is the number of its one-hop neighbors. Each node periodically broadcasts its degree value. A node with the highest value of degree in its neighborhood is selected as the cluster head, and its neighbors join it as cluster members.

c. Distributed and Mobility-Adapted Clustering
Based on the mobility of a node, weights are assigned to determine which node will be the head [10]. When two cluster heads happen to be neighbors, the cluster head selection is resolved in favor of the higher node weight.
Due to the mobile nature of nodes in a MANET environment, none of these clustering schemes is ideal for implementing the service discovery process. Hence we rely on two further clustering schemes, namely i) MobDhop and ii) the (α, t) Cluster Framework.

d. MobDhop Clustering Algorithm
In any clustering mechanism, efficiency is determined by how well the mobility of the nodes can be tracked. This is done using the MobDhop algorithm [11]. This algorithm computes the following parameters: the estimated distance between nodes, the relative mobility between nodes, the variation of estimated distance over time, the local stability, and the estimated mean distance. With these parameters a node can determine the closeness of its neighbors, since it is provided with a measure of the strength of the signal it has received. Unlike other algorithms, this algorithm calculates a relative mobility instead of the physical distance between two nodes. Relative mobility is the difference in the estimated distance between two nodes at two successive time moments; it indicates whether the two nodes are moving away from or coming closer to each other. MobDhop takes a three-stage execution process:
Discovery stage: This stage is used to form the two-hop clusters. Here the nodes exchange primitives for cluster formation, and once a cluster is formed, the nodes try to acquire complete knowledge of their neighboring nodes. The local stability value is computed and is transmitted to all nodes in the cluster. The node with the lowest stability value becomes the group head, and a node that can hear messages from a node belonging to another cluster is a gateway node.

Merging stage: This stage enables two clusters to merge into a single cluster based on the following conditions: first, the estimated distance between the two merging nodes should be less than or equal to the minimum of the group stability values of the two clusters; second, the mean distance should be less than or equal to the higher of the estimated distances of the two clusters.
Cluster maintenance: Whenever a node moves out of a cluster or a new node enters a cluster, there is a topology change, which is handled in this stage. When a cluster head moves out, a new cluster head is elected as before, or the cluster left without a head is merged with the nearest cluster based on the merging criterion.

e. (α, t) Cluster Framework Algorithm
The primary objective of this clustering algorithm is to handle both low mobility and high mobility of the nodes in a MANET environment [11]. When mobility is low it opts for efficient routing, and when mobility is high it opts for optimal routing. The probability of link and path availability, derived from a random-walk-based mobility model, is taken as the primary criterion for this clustering scheme [12]. The (α, t) approach attempts to provide an effective topology that adapts to node mobility. Path availability is a random process determined by the mobility of the nodes that lie along a given path. In the (α, t) approach, paths are evaluated by two system parameters, α and t: α establishes a lower bound on the probability that a given cluster path will remain available for a time t. Thus α controls cluster stability, while the role of t is to manage cluster size for a given level of stability [1]. Whenever there is a change in the topology, the nodes re-evaluate their (α, t) values, which is most appropriate when the service discovery process is to be carried out in a highly challenging, dynamic mobile environment.


This algorithm addresses five different topological changes: node activation, link activation, link failure, node deactivation and timer expiration. Since node stability can virtually be simulated for the dynamic environment, this strategy is considered an ideal setup for the discovery of services and is taken up for evaluation in this paper.
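The following sketch illustrates the relative-mobility idea behind MobDhop under assumed models: the signal-strength-to-distance estimator and the use of variance as the local stability measure are illustrative choices, not the algorithm's exact formulas.

def estimated_distance(rx_power, tx_power, path_loss_exponent=2.0):
    # Rough distance estimate from received signal strength using a
    # free-space-style model (an assumption; the text does not fix one).
    return (tx_power / max(rx_power, 1e-12)) ** (1.0 / path_loss_exponent)

def relative_mobility(dist_now, dist_previous):
    # Difference of the estimated distance at two successive time moments:
    # positive means the nodes are moving apart, negative means closer.
    return dist_now - dist_previous

def local_stability(relative_mobilities):
    # One plausible stability measure: the variance of relative mobility
    # over the neighbourhood; MobDhop elects the node with the lowest
    # local stability value as the cluster head.
    n = len(relative_mobilities)
    if n == 0:
        return 0.0
    mean = sum(relative_mobilities) / n
    return sum((m - mean) ** 2 for m in relative_mobilities) / n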

4. SYSTEM DESIGN
When a cluster is formed, the cluster head needs to handle the service requests it receives from its peer nodes. The setup takes the help of a Discovery Manager (DM) implemented as middleware. The Discovery Manager in turn uses the Route Manager (RM) for propagating the service requests. To handle the requests, the DM uses a simple table that carries the details of the service description, location, minimum hop count, protocol used, etc. [13]. Our architecture follows a variant in which, whenever the peer nodes raise a request for services, instead of the DM handling the request, the cluster head itself is given the option of discovering the service from the table it possesses, provided the service is located within the cluster. If the service is not found, the request is forwarded to the neighboring clusters. This provides an additional advantage: a further middleware stack is avoided, since the cluster head takes the responsibility of discovering the service. Whenever a node moves outside the range of a particular cluster it becomes inaccessible, which poses a challenge for every service discovery process. Since the clustering of nodes is done in a dynamic way, this challenge can be easily handled for an efficient discovery of services.
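A minimal sketch of the cluster-head lookup just described is given below. The table fields follow the text (service description, location, hop count, protocol), while the cluster object layout and the recursive forwarding to neighbouring cluster heads are assumptions made for illustration.

def discover_service(request, cluster, visited=None):
    # cluster: an assumed object with .cluster_id, .service_table (list of
    # table entries) and .neighbours (adjacent cluster-head objects).
    visited = visited if visited is not None else set()
    visited.add(cluster.cluster_id)
    for entry in cluster.service_table:
        if entry["description"] == request:
            return entry  # service found within this cluster
    for neighbour in cluster.neighbours:  # otherwise forward the request
        if neighbour.cluster_id not in visited:
            result = discover_service(request, neighbour, visited)
            if result is not None:
                return result
    return None  # service not discovered in any reachable cluster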

5. SIMULATIONS AND RESULT EVALUATION

The evaluation of these two clustering schemes in the service discovery process of MANETs is done by simulating and evaluating the results through the NS2 [14] simulator. The simulation measures the success rate of service discovery and the end-to-end delay of the discovery of services. The following table depicts the parameters used for simulation:

No. of nodes          300
Area                  1000 x 1000
MAC                   802.11
Simulation time       200 sec
Traffic               CBR
Packet size           512
Transmission range    300 m

The results of the simulation are shown in the following graphs:
Fig 2: Discovery ratio of services

The above graph shows the discovery success rate of the two clustering schemes. It is seen that when the number of services increases, the (α, t) cluster framework algorithm performs better than MobDhop. This is because of the node stability factor that is preserved in the (α, t) cluster framework algorithm, which is not the case for MobDhop. The graph in Fig. 3 shows the end-to-end delay of the discovery of services in the MANET environment using the two clustering schemes. Again, the end-to-end delay in MobDhop shows an increasing trend, whereas for the (α, t) cluster framework algorithm there is no considerable increase in the delay, unlike in the previous case.


Fig 3: End-to-end delay in discovery

6. CONCLUSION
The service discovery process in MANETs has been evaluated through two well known clustering schemes, MobDhop and the (α, t) cluster framework algorithm. It is found that, while discovering a service, the success rate of discovery is higher with the (α, t) cluster framework algorithm than with the MobDhop algorithm. Moreover, the end-to-end delay is also better with the (α, t) cluster framework algorithm than with the MobDhop clustering algorithm.

Acknowledgements

The authors acknowledge all persons and resources that have directly and indirectly influenced the outcome of this contribution.

REFERENCES
1. Meenu Chawla, Jyoti Singhai and J. L. Rana, Clustering in Mobile Ad hoc Networks: A Review, International Journal of Computer Science and Information Security, vol. 8, 2010.
2. Goland, Y., Cai, T., Leach, P. & Gu, Y. (1998, April). Simple Service Discovery Protocol. Retrieved 15, 2004.
3. Radu Handorean and Gruia-Catalin Roman, Service Provision in Ad Hoc Networks, Proceedings of the 5th International Conference on Coordination Models.
4. Golden Richard III, Service and Device Discovery, ISBN 0-071-37959-2, 2002.
5. Dali Wei and H. Anthony Chan, An Efficient Clustering Algorithm for Topology Maintenance and Energy Saving in MANETs, South Africa

Telecommunication Networks and Applications Conference, 2006.
6. K. Manousakis, J. S. Baras, Improving the speed of dynamic cluster formation in MANET via simulated annealing, Proceedings of the Army Science Congress, Orlando, Florida, 2005.
7. M. Gerla and J. T.-C. Tsai, Multicluster, mobile, multimedia radio network, ACM/Baltzer Wireless Networks Journal, vol. 1, no. 3, Oct. 1995, pp. 255-265.
8. A. Ephremides, J. E. Wieselthier, D. J. Baker, A design concept for reliable mobile radio networks with frequency hopping signaling, Proc. IEEE 75, 1987, pp. 56-73.
9. A. Parekh, Selecting routers in ad hoc wireless networks, Proceedings of the SBT/IEEE International Telecommunications Symposium, 1994.
10. S. Basagni, Distributed clustering for ad hoc networks, Proc. ISPAN'99 Int. Symp. on Parallel Architectures, Algorithms, and Networks, 1999, pp. 310-315.
11. B. McDonald and T. F. Znati, A mobility-based framework for adaptive clustering in wireless ad hoc networks, IEEE JSAC, Vol. 17, No. 8, August 1999.
12. A. B. McDonald, T. F. Znati, Design and simulation of a distributed dynamic clustering algorithm for multimode routing in wireless ad hoc networks, SIMULATION, Vol. 78, 2002, pp. 408-422.
13. S. Karunakaran, P. Thangaraj, A Cluster-Based Service Discovery Protocol for Mobile Ad-hoc Networks, American Journal of Scientific Research, ISSN 1450-223X, Issue 11 (2010).
14. NS2 Manual


Survey of TCP Adaptation in Mobile Ad-Hoc Networks


Christina J. 1, Kanchana A. 2, Revathy V. 3, Imthyaz Sheriff 4
Department of Computer Science and Engineering, Easwari Engineering College

Abstract
This paper presents an overview of how TCP can be adapted to mobile ad hoc networks. TCP was originally designed to provide reliable end-to-end delivery. To adapt TCP to mobile ad hoc networks, it is necessary to classify and identify the losses. Network congestion is not the only reason for loss in ad hoc networks: TCP can be affected by the mobility of the nodes, route failures, wireless channel contention, and unfairness. This paper compares and examines the main approaches which help in adapting TCP to the mobile ad hoc network environment.
Keywords: Mobile networks, Ad hoc networks, TCP survey, packet loss, route failure

Introduction
Ad hoc networks are complex distributed systems that consist of wireless mobile or static nodes that can freely and dynamically self-organize. In this way they form arbitrary and temporary ad hoc network topologies, allowing devices to seamlessly interconnect in areas with no pre-existing infrastructure. The Transmission Control Protocol (TCP) was designed to provide reliable end-to-end delivery of data over unreliable networks. In practice, most TCP deployments have been carefully designed in the context of wired networks. Ignoring the properties of wireless mobile ad hoc networks can lead to TCP implementations with poor performance. In order to adapt TCP to the ad hoc environment, improvements have been proposed in the literature to help TCP differentiate between the different types of losses.

Problems of TCP in Ad hoc Networks: TCP is a connection-oriented transport layer protocol that provides reliable, in-order delivery of data to the TCP receiver. Using TCP without any modification in mobile ad hoc networks results in a serious drop in connection throughput. There are several reasons for such a drastic drop in TCP throughput, and the sections below examine these reasons in brief.

Effect of a High BER: Bit errors cause packets to get corrupted, which results in lost TCP data segments or acknowledgments. When acknowledgments do not arrive at the TCP sender within a short amount of time [the retransmit timeout (RTO)], the sender retransmits the segment, exponentially backs off its retransmit timer for the next retransmission, reduces its congestion control window threshold, and closes its congestion window to one segment. Repeated errors will ensure that the congestion window at the sender remains small, resulting in low throughput [22]. It is important to note that error correction may be used to combat a high BER, but it will waste valuable wireless bandwidth when correction is not necessary.
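The fragment below is an illustrative, textbook-style view of the timeout behavior described above (threshold halved, congestion window closed to one segment, retransmit timer backed off exponentially); it is not a specific TCP implementation.

def after_timeout(cwnd, ssthresh, rto, mss=1):
    # Each retransmission timeout halves the slow-start threshold, closes
    # the congestion window to one segment and doubles the retransmit
    # timer, as described in the text.
    ssthresh = max(cwnd // 2, 2 * mss)
    cwnd = mss
    rto = 2 * rto
    return cwnd, ssthresh, rto

Applying this repeatedly for consecutive losses keeps the congestion window pinned at one segment while the timer grows geometrically, which is the low-throughput regime referred to above.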


Effect of Route Recomputations: When an old route is no longer available, the network layer at the sender attempts to find a new route to the destination [in dynamic source routing (DSR) [21] this is done via route discovery messages, while in destination-sequenced distance-vector routing (DSDV) [23] table exchanges are triggered that eventually result in a new route being found]. It is possible that discovering a new route may take significantly longer than the RTO at the sender. As a result, the TCP sender times out, retransmits a packet, and invokes congestion control. Thus, when a new route is discovered, the throughput will continue to be small for some time because TCP at the sender grows its congestion window using the slow start and congestion avoidance algorithms. This is clearly undesirable behavior because the TCP connection will be very inefficient. If we imagine a network in which route computations are done frequently (due to high node mobility), the TCP connection will never get an opportunity to transmit at the maximum negotiated rate (i.e., the congestion window will always be significantly smaller than the advertised window size from the receiver).

Effect of Network Partitions: It is likely that the mobile ad hoc network may periodically get partitioned for several seconds at a time. If the sender and the receiver of a TCP connection lie in different partitions, all the sender's packets get dropped by the network, resulting in the sender invoking congestion control. If the partition lasts for a significant amount of time (say, several times longer than the RTO), the situation gets even worse because of serial timeouts. A serial timeout is a condition wherein multiple consecutive retransmissions of the same segment are transmitted to the receiver while it is disconnected from the sender. All these retransmissions are, thus, lost. The following are the major problems which reduce the performance of TCP:

- TCP is unable to distinguish between losses due to route failures and losses due to network congestion.
- TCP suffers from frequent route failures.
- Contention on the wireless channel.
- TCP unfairness.
As a result, TCP performance in such networks suffers from significant throughput degradation and very high interactive delays.

Related work
TCP performance over MANETs: Monks et al. [2] investigated the impact of mobility on TCP throughput in MANETs. In their simulation scenarios, nodes move according to the random way-point model. By analyzing the simulation traces of patterns of low throughput, they found that the TCP sender's routing protocol is unable to quickly recognize and purge stale routes from its cache, which results in repeated routing failures and TCP retransmission timeouts. For patterns of high throughput they found that, most of the time, the TCP sender and receiver are close to each other. By examining the mobility patterns, the authors observe that as the sender and receiver move closer to each other, the dynamic source routing protocol (DSR) [21] can maintain a valid route. This is done by shortening the existing route before a routing failure occurs. However, as the sender and receiver move away from each other, DSR waits until a failure occurs to lengthen a route. To prevent TCP from invoking congestion control, which deteriorates TCP throughput in the case of losses induced by mobility, Monks et al. suggest the usage of the explicit link failure notification (ELFN) technique. TCP treats losses induced by route failures as signs of network congestion. There is a set of factors that contribute to the degradation of TCP throughput in the presence of mobility: the MAC failure detection and route computation latencies. The MAC failure detection latency is defined as the amount of time spent before the MAC concludes a link failure.


They found that, in the case of the IEEE 802.11 protocol, when the load is light this latency is small and independent of the speed of the nodes. However, in the case of high load, the value of this latency is magnified and becomes a function of the nodes' speed. The route computation latency is defined as the time taken to recompute the route after a link failure. The authors also identify another problem, called MAC packet arrival, that is related to routing protocols. In fact, when a link failure is detected, the link failure is reported to the routing agent of the packet that triggered the detection. If other sources are using the same link in the path to their destinations, the node that detects the link failure has to wait till it receives a packet from these sources before they are informed of the link failure. This also contributes to the delay after which a source realizes that a path is broken.

Proposals for TCP Adaptation

A. Proposals to reduce route failures
The following proposals address the problem of frequent route failures in MANETs.
Split TCP: TCP connections that span a large number of hops suffer from frequent route failures due to mobility. To improve the throughput of these connections and to resolve the unfairness problem, the Split TCP [24] scheme was introduced to split long TCP connections into shorter localized segments. The interfacing node between two localized segments is called a proxy. The routing agent decides whether its node has the role of proxy according to the inter-proxy distance parameter. The proxy intercepts TCP packets, buffers them, and acknowledges their receipt to the source (or previous proxy) by sending a local acknowledgment (LACK). Also, a proxy is responsible for delivering the packets, at an appropriate rate, to the next local segment. Upon the receipt of a LACK (from the next proxy or from the final destination), a proxy will purge the packet from its buffer. To ensure source-to-destination reliability, an ACK is sent by the destination to the source, as in standard TCP. In effect, this scheme splits the transport layer functionalities into end-to-end reliability and congestion control. This is done by using two transmission windows at the source: the congestion window and the end-to-end window. The congestion window is a sub-window of the end-to-end window. While the congestion window changes in accordance with the rate of arrival of LACKs from the next proxy, the end-to-end window changes in accordance with the rate of arrival of the end-to-end ACKs from the destination. At each proxy, there is a congestion window that governs the rate of sending between proxies.
Detection using signal strength based link management: Two mechanisms for alleviating the effects of mobility on TCP performance are proposed by Fabius Klemm [8]: the Proactive and the Reactive Link Management (LM) schemes. These schemes are implemented at the MAC layer. They also provide a modification of AODV [24] at the network layer that can exploit the presence of the link management schemes. Proactive LM tries to predict link breakage, whereas Reactive LM temporarily keeps a broken link alive with higher transmission power to salvage packets in transit. In this method the nodes proactively determine the signal strength, and if it is below the threshold they reroute the packets, thus avoiding the link failures.
MultiPath Routing: An alternative way of utilizing multipath routing was proposed by H. Lim et al. [9].


That is, to let TCP use only one path at a time and keep the other paths as backup routes, which is referred to as backup path multipath routing. Backup path multipath routing still maintains several paths from a source to a destination. However, it uses only one path at any time. When the current path breaks, it can quickly switch to an alternative path. The Ad hoc On-demand Multipath Distance Vector protocol (AODV) with backup routing (BR) and guaranteed bandwidth routing are compared.
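The sketch below restates the backup-path idea in a few lines: several paths are stored, only one is active at a time, and a path break triggers an immediate switch rather than a fresh route discovery. The class layout is an assumption made for illustration only.

class BackupPathRouter:
    def __init__(self, paths):
        self.paths = list(paths)   # ordered list of discovered paths
        self.active = 0            # only one path is used at any time

    def current_path(self):
        return self.paths[self.active] if self.active < len(self.paths) else None

    def on_path_break(self):
        # quickly switch to the next stored backup instead of rediscovering
        self.active += 1
        return self.current_path()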

B. Proposals to classify the packet loss
TCP-F: This is a feedback-based approach [10] to handle route failures in MANETs. A separate route failure notification (RFN) packet is sent to indicate the route failure. On receiving the RFN, the source goes into a snooze state.
ELFN-based technique: This interaction aims to inform the TCP agent about route failures when they occur. The authors use an ELFN [11] message, which is piggybacked onto the route failure message sent by the routing protocol to the sender.
ATCP: To detect packet losses due to channel errors, ATCP [14] monitors the received ACKs. When ATCP sees that three duplicate ACKs have been received, it does not forward the third duplicate ACK but puts TCP in the persist state and quickly retransmits the lost packet from TCP's buffer [12].
TCP-BuS: TCP with Buffering capability and Sequence information [13] uses network feedback to detect route failure events and to take convenient action in response to these events. The novel scheme in this proposal is the introduction of buffering capability in mobile nodes.

The table below summarizes how these four proposals handle route failures:

                      TCP-F             ELFN              ATCP                     TCP-BuS
Route failure         RFN pkt freezes   ELFN pkt freezes  ICMP destination         ERDN pkt freezes
detection             TCP sender state  TCP sender state  unreachable freezes      TCP sender state
                                                          TCP sender state
Route                 RRN pkt resumes   Probing           Probing                  ERSN pkt resumes
reconstruction                          mechanism         mechanism
Packet reordering     Not handled       Not handled       Handled                  Not handled
Congestion window     Old CW and RTO    Old CW and RTO    Reset for each new       Old CW and RTO
and RTO                                                   route after RR
Evaluation            Emulation, no     Simulation        Experimental, no         Simulation
                      routing protocol                    routing protocol
                      considered                          considered
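The sketch below captures the freeze-and-resume pattern that the table summarizes for TCP-F, ELFN and TCP-BuS: on a route-failure notification the sender state is snapshotted and frozen, and on a route re-establishment notification (or a successful probe) it is restored. The class is illustrative and does not correspond to any one of the cited implementations.

class FrozenTcpSender:
    def __init__(self, cwnd, rto):
        self.cwnd, self.rto = cwnd, rto
        self.frozen = False
        self._saved = None

    def on_route_failure(self):
        # RFN / ELFN / ERDN received: snapshot window and timer, stop sending
        self._saved = (self.cwnd, self.rto)
        self.frozen = True

    def on_route_restored(self):
        # RRN / ERSN received, or a probe succeeded: resume with the old state
        if self.frozen:
            self.cwnd, self.rto = self._saved
            self.frozen = False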

C. Proposals to improve TCP fairness
1) Enhanced RED: TCP unfairness in mobile ad hoc wireless networks has been reported during the past several years. This unfairness results from the nature of the shared wireless medium and location dependency. A node and its interfering nodes form a neighborhood, and the aggregate of local queues at these nodes represents the distributed queue for this neighborhood. However, this queue is not a FIFO queue. Flows sharing the queue have different, dynamically changing priorities determined by the topology and traffic patterns. Thus, they get different feedback in terms of packet loss rate and packet delay when congestion occurs.


In wired networks, the Random Early Detection (RED) scheme was found to improve TCP fairness. The authors show that the RED scheme does not work when running on individual queues in wireless nodes, and they propose a Neighborhood RED (NRED) [15] scheme, which extends the RED concept to the distributed neighborhood queue. Simulation studies confirm that the NRED scheme can reduce TCP unfairness substantially in mobile ad hoc networks. Moreover, the NRED scheme acts at the network level, without MAC protocol modifications, which considerably simplifies its deployment.
2) Adaptive Pause: In wireless mobile ad hoc networks, fair allocation of bandwidth among different TCP flows is one of the critical problems that affect the performance of the entire system. Adaptive_Pause [16] is a simple and distributed fairness mechanism that needs only little communication and processing overhead. Each node monitors the occupation of the channel due to its own emissions and dynamically determines whether it should pause for a time interval in order to avoid channel capture. Compared to other passive schemes, in which a node limits its transmission when it receives a congestion indication from its neighbors, the proposed scheme is more effective and requires less overhead. Simulation results validated the analytical results and gave the optimal parameter setting. Both analytic and simulation results show that the Adaptive_Pause scheme can improve TCP fairness.
3) Cross-layer approach to enhance TCP fairness: A cross-layer approach to enhance TCP fairness by Camarda et al. [17] exploits the advertised window field (adw) of TCP segments to limit the transmission rate of the TCP sender. The very popular 802.11 standard enables wireless ad hoc networking, using the distributed coordination function (DCF) for multiple access to the shared radio channel.

Unfortunately, the interaction between TCP dynamics, driven by the additive increase multiplicative decrease (AIMD) paradigm, and DCF channel access rules, which are based on the carrier sense multiple access with collision avoidance (CSMA/CA) algorithm, leads to inefficient spatial channel usage. As a consequence, 802.11 ad hoc networks provide an unfair service to TCP flows. In order to achieve a reasonable trade-off between throughput and fairness, the cross-layer approach dynamically limits the number of in-flight segments in a TCP connection by taking into account measurements of the frame collision probability, collected at the MAC layer along the path.

D. Proposals to reduce wireless channel contention
1) Rate control based on channel utilization and contention: In mobile ad hoc networks, both contention and congestion can severely affect the performance of TCP. Nana Li et al. [18] show that the over-injection of the conventional TCP window mechanism results in severe contention, and medium contention causes network congestion. Furthermore, two metrics, channel utilization (CU) and contention ratio (CR), are introduced to characterize the network status. Then, based on these two metrics, a new TCP transmission rate control mechanism based on channel utilization and contention ratio (TCPCC) was proposed. In this mechanism, each node collects information about the network busy status and determines the CU and CR accordingly. The CU and CR values fed back through ACKs are ultimately determined by the bottleneck node along the flow. The TCP sender controls its transmission rate based on the feedback information.
2) Dynamic delayed ACK: This approach [20] aims to reduce the contention on the wireless channel by decreasing the number of TCP ACKs transmitted by the sink. It is a modification of the delayed ACK option (RFC 1122), which has a fixed coefficient d = 2.


Here d represents the number of TCP packets that the TCP sink should receive before it acknowledges these packets. In this approach, the value of d is not fixed; it varies dynamically with the sequence number of the TCP packet. For this reason, the authors define three thresholds l1, l2, and l3 such that d = 1 for packets with sequence number N smaller than l1, d = 2 for l1 <= N < l2, d = 3 for l2 <= N < l3, and d = 4 for N >= l3 (a small sketch of this rule is given after the summary table at the end of this section). In their simulations, they study the packet loss rate, throughput, and session delay of TCP New Reno, in the case of short and persistent TCP sessions on a static multihop chain. They show that their proposal, with l1 = 2, l2 = 5, and l3 = 9, outperforms the standard TCP as well as the delayed ACK option with a fixed coefficient d = 2, 3, 4. They suggest that better performance could be obtained by making d a function of the sender's congestion window instead of a function of the sequence number.
3) COPAS: The COntention-based PAth Selection proposal [19] addresses the TCP performance drop due to contention on the wireless channel. It implements two techniques: the first one is disjoint forward and reverse routes, which consists of selecting disjoint routes for TCP data and TCP ACK packets. The second one is dynamic contention-balancing, which consists of dynamically updating the disjoint routes. Once the contention of a route exceeds a certain threshold, called the backoff threshold, a new and less contended route is selected to replace the highly contended route. In this proposal, the contention on the wireless channel is measured as a function of the number of times that a node has backed off during each interval of time. Also, any time a route is broken, in addition to initiating a route re-establishment procedure, COPAS redirects TCP packets using the second alternate route. Comparing COPAS and DSR, the authors found that COPAS outperforms DSR in terms of TCP throughput and routing overheads.

The improvement of TCP throughput is up to 90%. However, the use of COPAS, as reported by the authors, is limited to static networks or networks with low mobility, because as nodes move faster, using disjoint forward and reverse routes increases the probability of route failures experienced by TCP connections, and so this may induce more routing overheads and more packet losses. The modified AODV allows the forwarding of packets in transit on a route that is going down while simultaneously initiating a search for a new route.

The table below summarizes these contention-reduction proposals:

Solution                                     Type            Evaluation    Throughput improvement
Rate control based on channel                Link layer      Simulation    50%-60% improvement over standard TCP
utilization and contention
Dynamic delayed Ack                          TCP layer       Simulation    30%-40% improvement over standard TCP
COPAS                                        Network layer   Simulation    Up to 90%
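The small function below restates the dynamic delayed-ACK coefficient rule referred to earlier in this section, using the threshold values l1 = 2, l2 = 5 and l3 = 9 reported in the cited simulations; it is only a restatement of that rule, not an implementation of the full scheme.

def delayed_ack_coefficient(seq_num, l1=2, l2=5, l3=9):
    # d = number of TCP packets the sink waits for before sending an ACK,
    # growing with the packet sequence number as described above.
    if seq_num < l1:
        return 1   # acknowledge every packet at the start of the session
    if seq_num < l2:
        return 2
    if seq_num < l3:
        return 3
    return 4       # acknowledge every fourth packet once the flow is established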

V. Conclusion.
Four proposals are used to address route failures; the main cause of route failures is node mobility. Split TCP is based on splitting long TCP connections, in terms of hops, into short segments to decrease the number of routing failures. However, this generates more overhead. In preemptive routing and signal strength based link management, the problem is addressed by predicting link failures and initiating a route reconstruction before the current route breaks. The signal strength based link management proposal uses a more robust approach to predict failures than preemptive routing. Backup routing improves the TCP path availability by storing an alternative path.


This path is used when a routing failure is detected. Backup routing reports an improvement in TCP throughput of up to 30% with a reduction in routing overheads, but further evaluation is needed, especially of the route selection criteria. Among the proposals to classify the packet loss, the four proposals TCP-F, the ELFN-based technique, ATCP and TCP-BuS rely on an explicit notification from the network layer to detect route failures, but they differ in how route re-establishments are detected. Among the proposals to improve TCP fairness, Enhanced RED, Adaptive Pause and a cross-layer approach to enhance TCP fairness are proposed. In the non-work-conserving scheduling proposal, fairness is achieved by penalizing greedy nodes with a high output rate, by increasing their queuing delays at the link layer. However, in Enhanced RED it is up to TCP to regulate its transmission rate when it senses packet drops. Enhanced RED is better than non-work-conserving scheduling, as it does not cause degradation in total throughput. Among the proposals to reduce wireless channel contention, rate control based on channel utilization and contention, dynamic delayed ACK and COPAS are proposed. Dynamic delayed ACK is a simple approach that aims to reduce the contention on the wireless channel by decreasing the number of TCP ACKs transmitted by the sink. COPAS, however, attacks this problem by using disjoint forward and reverse routes, and dynamic updating of the disjoint routes based on contention level. COPAS reports an improvement in TCP throughput of up to 90%.

VI. REFERENCES
[1] Jian Liu and Suresh Singh, ATCP: TCP for Mobile Ad Hoc Networks, IEEE Journal on Selected Areas in Communications, July 2001.

[2] Gavin Holland and Nitin Vaidya, Analysis of TCP performance over mobile ad hoc networks, in Proc. MobiCom '99, the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking.
[3] E. Altman and T. Jimenez, Novel delayed ACK techniques for improving TCP performance in multihop wireless networks, in Proc. of Personal Wireless Communications, Venice, Italy, Sep. 2003, pp. 237-253.
[4] Z. Fu, P. Zerfos, H. Luo, S. Lu, L. Zhang, and M. Gerla, The impact of multihop wireless channel on TCP throughput and loss, in Proc. of IEEE INFOCOM, San Francisco, USA, Apr. 2003.
[5] S. Xu and T. Saadawi, Does the IEEE 802.11 MAC protocol work well in multihop wireless ad hoc networks?, IEEE Communications Magazine, vol. 39, no. 6, pp. 130-137, Jun. 2001.
[6] M. Gerla, K. Tang, and R. Bagrodia, TCP performance in wireless multi-hop networks, in Proc. of the IEEE WMCSA, New Orleans, LA, USA, 1999.
[7] Mohammad Amin Kheirandish Fard, Kamalrulnizam Abu Bakar, Sasan Karamizadeh, Roozbeh Hojabri Foladizadeh, Improve TCP Performance over Mobile Ad Hoc Network by Retransmission Timeout Adjustment, in Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on.
[8] Fabius Klemm, Zhenqiang Ye, Srikanth V. Krishnamurthy, Satish K. Tripathi, Improving TCP Performance in Ad Hoc Networks using Signal Strength based Link Management.
[9] H. Lim, K. Xu, and M. Gerla, TCP performance over multipath routing in mobile ad hoc networks, in Proc. of IEEE ICC, Anchorage, Alaska, May 2003.
[10] K. Chandran, S. Raghunathan, S. Venkatesan, and R. Prakash, A feedback based scheme for improving TCP performance in Ad-Hoc wireless networks, in Proc. of the International Conference on Distributed Computing Systems (ICDCS '98), Amsterdam, Netherlands, May 1998.


[11] J. Monks, P. Sinha, and V. Bharghavan, Limitations of TCP-ELFN for ad hoc networks, in Proc. of Mobile and Multimedia Communications, Tokyo, Japan, Oct. 2000.
[12] K. Ramakrishnan, S. Floyd, and D. Black, The addition of explicit congestion notification (ECN) to IP, RFC 3168, Category: Standards Track, Sep. 2001.
[13] D. Kim, C. Toh, and Y. Choi, TCP-BuS: Improving TCP performance in wireless ad hoc networks, Journal of Communications and Networks, vol. 3, no. 2, pp. 175-186, Jun. 2001.
[14] J. Liu and S. Singh, ATCP: TCP for mobile ad hoc networks, IEEE JSAC, vol. 19, no. 7, pp. 1300-1315, Jul. 2001.
[15] K. Xu, M. Gerla, L. Qi, and Y. Shu, Enhancing TCP fairness in ad hoc wireless networks using neighborhood RED, in Proc. of ACM MOBICOM, San Diego, CA, USA, Sep. 2003, pp. 16-28.
[16] Yantai Shu, M. Sanadidi, M. Gerla, A Method for Improving the TCP Fairness in Wireless Ad Hoc Networks, in Wireless Communications, Networking and Mobile Computing, 2008, WiCOM '08, 4th International Conference.
[17] Camarda, P., Grieco, L.A., Mastrocristino, T., Tesoriere, A Cross-layer Approach to Enhance TCP Fairness in Wireless Ad-hoc Networks, Wireless Communication Systems, 2005, 2nd International Symposium.
[18] Nana Li, Wenbo Zhu, Dan Sung, TCP transmission rate control mechanism based on channel utilization and contention ratio in ad hoc networks, IEEE Communications Letters.
[19] C. Cordeiro, S. Das, and D. Agrawal, COPAS: Dynamic contention-balancing to enhance the performance of TCP over multi-hop wireless networks, in Proc. of IC3N, Miami, USA, Oct. 2003, pp. 382-387.
[20] E. Altman and T. Jimenez, Novel delayed ACK techniques for improving TCP performance in multihop wireless networks, in Proc. of Personal Wireless Communications, Venice, Italy, Sep. 2003, pp. 237-253.
[21] D. Johnson, D. Maltz, and Y. Hu, The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks (DSR), Internet draft, Apr. 2003.
[22] V. Jacobson, Congestion avoidance and control, Proceedings of the ACM Symposium on Communications Architectures and Protocols, vol. 18, no. 4, pp. 314-329, Stanford, CA, USA, August 16-18, 1988.
[23] C. E. Perkins and P. Bhagwat, Highly dynamic Destination-Sequenced Distance-Vector routing (DSDV) for mobile computers, Proc. of the SIGCOMM '94 Conference on Communications Architectures, Protocols and Applications, pages 234-244, August 1994.
[24] S. Kopparty, S. Krishnamurthy, M. Faloutsos, and S. Tripathi, Split TCP for mobile ad hoc networks, in Proc. of IEEE GLOBECOM, Taipei, Taiwan, Nov. 2002.


A Video Aware FEC based Unequal Loss Protection System for Video Streaming over RTP
Sandeep R. 1, Vignesh K. C. 2, Viswanathan N. 3, Balakannan S. P. 4
1,2,3 Final B.Tech, Department of IT, Anand Institute of Higher Technology, Chennai, India
4 Assistant Professor, Department of IT, Anand Institute of Higher Technology, Chennai, India

Abstract:

Increased speeds of PCs and networks have made media communications possible on the Internet. Today, the need for desktop videoconferencing is experiencing robust growth in both business and consumer markets, and TCP/IP has conventionally been used for this purpose. TCP/IP is basically a connection-oriented protocol: once the sender sends packets to the receiver, the receiver periodically notifies the sender with acknowledgements. The Real-time Transport Protocol (RTP) defines a standardized packet format for delivering audio and video over IP networks. RTP was developed by the Audio-Video Transport working group of the Internet Engineering Task Force in order to create a standardized protocol for multimedia communication across packet networks. Media can be received from the server to which the camera is attached, live images are captured, and the framework allows receiving streams containing any number of tracks from a number of different hosts/transmitters on the network simultaneously. Existing systems use the TCP/IP protocols for communication and for the transfer of frames from one system to another. These systems follow the store and forward procedure for the transfer of frames. Normally these systems can transmit and re-transmit a copy of the frames from the source or server. They use socket programming for data transfer from the server to the requesting client. In these systems the IP address is obtained through the TCP/IP protocol, so the data are sent to that particular system, leading to several drawbacks. Thus, in this paper, we present a multiparty videoconferencing system based on the Real-time Transport Protocol over a LAN (Local Area Network) in order to overcome the existing drawbacks.
Keywords: TCP/IP, RTCP, RTP, FEC

I. Introduction:
The existing system is based on TCP/IP. TCP stands for Transmission Control Protocol. It is a protocol developed for the Internet to get data from one network device to another. TCP uses a retransmission strategy to ensure that data will not be lost during transmission. It is a set of rules used along with the Internet Protocol to send data in the form of message units between computers over the Internet. TCP takes care of keeping track of the individual units of data, called packets, that a message is divided into for efficient routing through the Internet. Existing systems use the TCP/IP protocols for communication and for transferring frames from one system to another, mainly because of the security TCP provides.


Generally, during a video transmission, packets are sent from the sender to the receiver, and acknowledgements are returned accordingly. Only if the acknowledgements are received by the sender will the next packets be sent to the receiver. Normally these systems can transmit and re-transmit a copy of the frames from the source or server. They use socket programming for data transfer from the server to the requesting client, so the data are sent to that particular system. In this paper, we present a multiparty videoconferencing system based on RTP over a LAN. RTP stands for Real-time Transport Protocol. RTP was developed to create a standardized protocol for multimedia communication across packet networks. Media can be received from the server to which the camera is attached and live images are captured. The framework allows receiving media streams containing any number of tracks from a number of different hosts/transmitters on the network simultaneously. RTP defines a standardized packet format for delivering audio and video over IP networks and is extensively used in communication and entertainment systems that involve streaming media. One of the major advantages of RTP is that it is a connectionless protocol, so there is no need to wait for acknowledgements before sending the next packet. Likewise, the RTCP protocol is used alongside RTP to monitor the communication, which means there is less time delay. RTP support is also available through the Java Media Framework.
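Since the proposed system builds directly on RTP, it may help to recall the 12-byte fixed header defined by RFC 3550. The sketch below packs such a header in Python; payload type 26 is the static RTP payload type for JPEG video, while the remaining field choices (no padding, no extension, no CSRC entries) are illustrative assumptions rather than details of our implementation.

import struct

def build_rtp_header(seq, timestamp, ssrc, payload_type=26, marker=0):
    # Pack the RTP fixed header: V=2, P, X, CC | M, PT | sequence number |
    # timestamp | SSRC, in network byte order (12 bytes in total).
    version, padding, extension, csrc_count = 2, 0, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = ((marker & 0x1) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

Each captured video frame transmitted by the sender would then be carried in one or more packets beginning with such a header.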

II. PROBLEM STATEMENT

A. System Model: A representative network architecture for RTP is shown below. The different network entities can be identified as follows:

User: the user enters a name and password on the login page for validation. If the user name and password match the database, the user receives a response together with the form page for transferring the media file.
Server: after successful validation the server sends the media file to the multiple clients. The media can be audio or video. The server obtains the IP address and port of each client system in order to add it as a target for media communication. Live video is captured using any webcam device, read by the JMF and passed to the authentication server; authentication is done by checking the IP address and port number against the database. The main server then transmits the live video to the list of receivers frame by frame, and the session is tracked sequentially. The RTCP monitor supervises the frames received at the receiver, maintaining sender reports and receiver reports, while the transmitter maintains the configuration information.
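As a rough illustration of the authentication step described above (checking a client's IP address and port against the database), the following JDBC sketch performs the lookup. The JDBC URL, credentials, table name ("clients") and column names are assumptions made for the example, not part of the described system.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Look up the client's IP address and port before adding it as an RTP target.
public class AuthCheckSketch {
    public static boolean isRegisteredClient(String ip, int port) throws Exception {
        String url = "jdbc:mysql://localhost:3306/conference"; // hypothetical database
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT 1 FROM clients WHERE ip_address = ? AND port = ?")) {
            ps.setString(1, ip);
            ps.setInt(2, port);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();   // a matching row means the client may join the session
            }
        }
    }
}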


III. EFFICIENT GROUP DISTRIBUTION RTP PROTOCOL

A. Authentication with the Server: Before starting to transmit or receive live media data, every system has to be authenticated by the authenticating server. If the user name and password match the database, the user receives a response together with the form page for transferring the media file.

B. RTP Transmit Module: This module uses the RTP protocol, since it is the most effective choice for live media transmission. The output is transmitted in JPEG format so that the transmission is easy and fast; the JMF API is used to read the source and convert it to JPEG/RTP.

C. Session Tracking: In the previous module the captured live media is transmitted to a specific receiver (unicasting). Now the live media needs to be transmitted to a list of receivers

(multicasting). It is also necessary to keep track of the session for each media track of the processor, so session tracking is essential. One session corresponds to opening a web browser and later closing it; several such sessions can exist and each can be tracked easily before communication.

D. Creating Player: The output transmitted by the sender system has to be captured and viewed on the Live panel. The receiver can accept media streams containing any number of tracks from several hosts/transmitters on the network simultaneously. On receipt of RTP data from different senders, the received data have to be processed separately, so different players are created to view the media files collected at different ports from the various senders.

E. RTP Monitor: The RTCP monitor traces RTCP packets such as sender reports, receiver reports and sent packets. In case of packet loss or path collision the tracer monitors and records the events so that the transaction can be completed successfully.

F. Storing Previous Configuration: When the transmission medium is configured, that is, when the IP address and port number of the transmitting and receiving systems are specified for a particular session, the configuration is stored for later use so that the IP address and port number need not be specified repeatedly.

G. GUI Module: The GUI module is designed to make the system user friendly. It can specify


the port through which the transmission is to occur, the system to which the media will be transmitted, and the ports through and to which the data are sent. Swing is a GUI toolkit for Java and part of the Java Foundation Classes (JFC). Swing provides graphical user interface (GUI) widgets such as text boxes, buttons, split panes and tables, and offers more sophisticated components than the earlier Abstract Window Toolkit (AWT). Because Swing components are written in pure Java, they behave the same on all platforms, unlike the AWT, which is tied to the underlying platform's windowing system. Swing supports a pluggable look and feel, not by using the native platform's facilities but by roughly emulating them, so any supported look and feel can be obtained on any platform. The disadvantage of such lightweight components is slower execution; the advantage is uniform behaviour on all platforms.
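The following is a small, self-contained Swing sketch of the kind of configuration window the GUI module describes: fields for the receiver's IP address and port and a button that would start transmission. The widget names, default values and the button's action are illustrative assumptions, not the system's actual interface.

import javax.swing.*;
import java.awt.GridLayout;

// Minimal transmit-configuration window: receiver IP, receiver port, transmit button.
public class TransmitConfigFrame {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("RTP Transmit Configuration");
            JTextField ipField = new JTextField("192.168.1.20");
            JTextField portField = new JTextField("5004");
            JButton transmit = new JButton("Transmit");
            transmit.addActionListener(e ->
                    JOptionPane.showMessageDialog(frame,
                            "Would transmit to " + ipField.getText() + ":" + portField.getText()));
            JPanel panel = new JPanel(new GridLayout(3, 2, 5, 5));
            panel.add(new JLabel("Receiver IP:"));   panel.add(ipField);
            panel.add(new JLabel("Receiver port:")); panel.add(portField);
            panel.add(new JLabel());                 panel.add(transmit);
            frame.setContentPane(panel);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.pack();
            frame.setVisible(true);
        });
    }
}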

IV. CONCLUSION

By checking various parameters we can conclude that RTP performs much better than TCP for this application. This is because of factors such as the loss of quick synchronization and the loss of the ability to handle short bursts under TCP. With RTP the quality of service increases and the throughput increases with it; packets are transmitted faster, with less delay and less packet loss than in the existing system. Our results indicate that when RTP is used, data can be transmitted over the network with practically no delay or other constraints: once the connection between server and client is established, transmission is easily achieved, and since RTP is used here in an intranet environment it is also safer. With sufficiently long buffering and adequate average throughput, TCP can deliver audio and video packets in near real time, but over a highly lossy network TCP makes audio or video communication impractical. The RTCP packets, in contrast, carry absolute time information. RTCP (Real-time Transport Control Protocol) is used to monitor the communication, so the time delay is small. RTP also works with the JMF, the Java Media Framework, a method for working with time-based data in Java that handles real-time as well as stored audio and video.

V. REFERENCES

[1] Chong Luo, "A Multiparty Videoconferencing System over an Application-Level Multicast Protocol," IEEE Transactions on Multimedia, Vol. 9, No. 8, 2007.
[2] Z. Zhahujun, H. Chengde, "A RTP-Based Architecture of Multimedia Communication and Networks," Tenth International Conference on Computer Communications and Networks, 2001, Proceedings, pp. 426-431, Oct. 2001.


Computer Network Database Attacks: Security from Threats and Hackers


R. Shankar 1, A. Sankaran 2
1 Professor/HOD, Department of Computer Science, Indira Institute of Engineering and Technology
2 PG Student, Department of Computer Science, Indira Institute of Engineering and Technology

Abstract

With the development of computer technology, computer networks play an important role in human society and in social development. They rely on a variety of database technologies, which in turn give hackers a means of attack and damage. This paper therefore analyses the security threats against computer network databases and the related prevention problems.

Keywords- Database, SQL, Database explosion, exception handling, security.


I.INTRODUCTION
With this huge computer network platform, the community becomes more and more information driven, and database technology plays a leading role in it. Meanwhile, network intrusion is worsening: the systems of governments and large companies all receive the attention of hackers, and these large systems are supported by databases. Database security has therefore become a concern for everyone. A single Intrusion Detection System (IDS) can be installed on a segment or gateway of a computer network to promptly analyse each packet and to separate normal activities from abnormal ones, and users may also install multiple Intrusion Detection Systems (IDSs) at different segments of the same network to safeguard their computer resources. Attacks that target the application level may differ from attacks that target the operating system or the Database Management System (DBMS) of the targeted network, hence there are different categories of IDSs, and researchers have shown that different IDSs use different formats for logging network intrusions. The patterns of packets that aim to illegally steal or hijack sensitive information from computer networks also differ from flooding attacks that aim to disrupt the availability of computer services. This paper therefore critically describes and analyses emerging security challenges of computer network databases. The review should be useful to researchers, vendors, security professionals and IT end users in general.

II. DATABASE ATTACK MEANS

For intrusions into computer network databases, database explosion and SQL injection are analysed below.

A. SQL injection attacks

Of the various forms of hacker attack, as many as 60% may involve SQL injection. IT security and control firms and the Internet Crime Complaint Center have reported that the number of SQL injection attacks is rising, especially against financial services and online retail websites, and SQL injection is ranked among the top network security threats. Defence against SQL injection attacks is therefore very important. SQL injection is an attack in which SQL code is inserted into or appended to the application's (user) input parameters, which are then passed to the back-end SQL server to be parsed and executed. The attack arrives through the normal port and on the surface is no different from ordinary access (firewalls are generally not aware of it); it exploits the fact that the application does not strictly check


the legitimacy or the characteristics of the user input data: SQL code submitted from the client is used to collect information from the database and then to obtain the site administrator's account and password. The steps of an SQL injection attack are: use special SQL statements to look for an injection vulnerability; use the vulnerability in repeated attempts to obtain background information on the database; analyse the database information, laying the groundwork for further attacks. Common SQL injection attacks are produced by constructing a dynamic string; several situations are described below.

Mishandling of the escape character

Escape characters have a special meaning in the database, for example the single quote ('), the space ( ), the double vertical bar (||), the comma (,), the dot (.) and the double quote ("). Take the single quote as an example: SQL databases treat the single quote as the separator between code and data. Therefore, by simply entering a single quote into a field of the URL or of the Web page (or application), one can identify whether the website is subject to SQL injection attacks. Below, user input is passed directly to a dynamically created SQL statement:

$result = mysql_query($SQL);
$rowcount = mysql_num_rows($result);
$row = 1;
while ($db_field = mysql_fetch_assoc($result)) {
    if ($row <= $rowcount) {
        print $db_field[$row] . "<BR>";
        $row++;
    }
}
Figure 1. Mishandling of the escape character
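To make the single-quote problem concrete, here is a small Java sketch of the same mistake: the user's raw input is concatenated into the SQL text, so a quote in the input rewrites the query. The table and column names are hypothetical and used only for illustration.

public class InjectionDemo {
    public static void main(String[] args) {
        String userInput = "' OR '1'='1";   // attacker-supplied value
        String query = "SELECT * FROM users WHERE name = '" + userInput + "'";
        // The string becomes:  SELECT * FROM users WHERE name = '' OR '1'='1'
        // The WHERE clause is now always true, so every row would be returned.
        System.out.println(query);
    }
}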

Type of improper handling

If a user-supplied field is not checked for validity or not forced to a specific type, an SQL injection can occur. For example, consider the statement: "SELECT * FROM data WHERE id = " + variable + ";". Here, variable is expected to carry a number matching the "id" field. If a field that should be numeric is used in an SQL statement and the programmer does not check the legitimacy of the user input, such an attack can occur.

Query set not handled properly

Sometimes dynamic SQL statements have to be used for complex application code, because the table or query field may not be known at the development stage. Below is a simple example; it passes user input directly to the dynamically created SQL statement:

$SQL = "SELECT $_GET[column1], $_GET[column2], $_GET[column3] FROM $_GET[table]";
$result = mysql_query($SQL);
$rowcount = mysql_num_rows($result);
$row = 1;
while ($db_field = mysql_fetch_assoc($result)) {
    if ($row <= $rowcount) {
        print $db_field[$row] . "<BR>";
        $row++;
    }
}

Figure 2. Query set not handled properly

Improper error handling

Improper error handling creates a security problem of its own. The most common mistake is displaying detailed internal error messages to the attacker; these details provide important clues about potential weak points of the system. For example, an attacker could use this information to work out how to modify or construct the injection so that it bypasses


the developer's intended query, and to learn how to manipulate the database. Here is an example: a user identifier is selected from a drop-down list and the script generates a dynamic SQL statement:

private void SelectedIndexChanged(object sender, System.EventArgs e)
{
    string SQL;
    SQL = "SELECT * FROM table ";
    SQL += "WHERE ID=" + UserList.SelectedItem.Value + ";";
    OleDbConnection con = new OleDbConnection(connectionString);
    OleDbCommand cmd = new OleDbCommand(SQL, con);
    try
    {
        con.Open();
        reader = cmd.ExecuteReader();
        reader.Read();
        lblResults.Text = "<b>" + reader["LastName"];
        lblResults.Text += ", " + reader["FirstName"] + "</b><br>";
        lblResults.Text += "ID: " + reader["ID"] + "<br>";
        reader.Close();
    }
    catch (Exception err)
    {
        lblResults.Text = "Error getting data. ";
        lblResults.Text += err.Message;
    }
    finally
    {
        con.Close();
    }
}

Figure 3. Improper error handling

Mishandling of multiple submission

A white list [2] is a technique that rejects every character that is not on the list. Because attacks can be presented in many different forms, maintaining such a list effectively is a very arduous task. Using a list of

unacceptable characters (a black list) to define what is potentially risky, on the other hand, is likely to miss some dangerous characters. Developers also tend to assume that users will follow the logical order of the designed flow: when a user reaches the second form, for example, they expect that the first form must already have been completed. In reality, requesting resources out of order with a direct URL makes it very easy to bypass the expected data flow.

B. Database explosion attacks

The purpose of looking for a vulnerability is usually the database itself: user names, passwords and so on, which can then be used to obtain further permissions. If the database can be obtained directly, so much the better for the attacker, so "database explosion" has become a much simpler means of intrusion and a favourite, effective tactic for invaders. Database explosion means obtaining the address of the database through some technical means or bug and illegally downloading the data to the local machine. Hackers are very happy to do this work: once a hacker obtains the site database, the site management accounts can be extracted and used to deface and manage the site, the private information of the website's users can be read, and even the highest authority on the server can be reached. Many database explosion methods circulate on the network; the main ones are the %5c database explosion and the conn.asp database explosion.

%5c database explosion

In a %5c database explosion, a Web page is opened, a "/" in the URL address is changed into "%5c" and the request is submitted to the server, which can then reveal the path of the database. Not every URL works: only pages whose behaviour involves calls to the database, for example "asp?id=". As long as the page calls the database, the %5c explosion can be carried out. The %5c database explosion is not a vulnerability of the page itself


but exploits a feature of IIS decoding. If the IIS security settings are not comprehensive and the developers did not consider how IIS handles the resulting error, it can be exploited by attackers. Why %5c? It is the hexadecimal code of "\", but "\" and "%5c" produce different results. The %5c database explosion turns a "/" in the URL into "%5c", and one then has to decide which "/" to change, which depends on how many directory levels the site has. Website directories are generally two or three levels deep, and sites with more levels are more likely to burst. For example, for http://hxhack.com/soft/view.asp?id=58, the second "/" is changed into "%5c", giving http://hxhack.com/soft%5cview.asp?id=58.

conn.asp database explosion

The conn.asp database explosion was the first kind of explosion to appear. Here conn.asp stands for the file through which the database is called, mostly by that name; even if some sites rename it, we still discuss it as conn.asp. After the "%5c" database explosion appeared, the conn.asp database explosion was rarely mentioned, but as server security is strengthened the role of the %5c explosion will shrink, while the conn.asp explosion, which can be constructed deliberately, becomes more significant. If the %5c explosion exploits an error in the absolute path to the library, the conn.asp explosion exploits an error in the relative path to the database. Generally speaking, as long as conn.asp is not located in the root directory while the file that calls it is in the root directory, problems will arise; to be precise, if the relative position of conn.asp and the file that calls it changes, an error occurs and the database path bursts out.

C. Unsafe configuration of the database

The measures above concern only the security of the code, but the security of the database itself cannot be ignored. There are many ways to reduce the access level of data that can be

stolen or manipulated, and thus the damage an SQL injection attack can cause. The mainstream databases today are Oracle, MySQL, SQL Server and so on, and they come with a lot of pre-installed default users: the SQL Server system administrator account "sa", MySQL's "root" account, Oracle's "SYS", "SYSTEM" and "DBSNMP" accounts and so on. The default passwords pre-set for these accounts are often well known. The different database servers also add their own access control models, with a variety of permissions assigned to user accounts for different operations, and by default they support functions that exceed real needs and can be abused by an attacker (xp_cmdshell, LOAD_FILE, OPENROWSET, ActiveX, etc.). When writing programs, developers generally connect to the database with a built-in privileged account rather than creating a specific account with only the permissions the program needs, so these powerful built-in accounts end up performing many independent operations and procedures. When an attacker exploits an SQL injection vulnerability, the injected code runs with the database privileges of the account used for the connection.

III. PREVENTING SQL INJECTION AND DATABASE EXPLOSION ATTACKS

At present, the following security precautions are available against SQL injection and database explosion attacks on computer network databases.

A. Validity check of user input

Review what the user submits, mainly by checking the parameters submitted through the URL on the client side for characters and strings commonly used in SQL injection, such as and, or, `, ", :, ;, exec, select, from, insert, delete, update, count, user, xp_cmdshell, add, net, drop, table, truncate, mid, %, and also by limiting the length of the user


input. An example of a function that checks for special characters follows:

public bool inputCheck(string str)
{
    string str1 = @"select|insert|delete|from|count\(|drop table|update|truncate|asc\(|mid\(|char\(|xp_cmdshell|exec master|net localgroup administrators|:|net user|'|or|and";
    string str2 = @"[-|;|,|\/|\(|\)|\[|\]|\}|\{|%|@|\*|!|']";
    bool result = !(Regex.IsMatch(str, str1, RegexOptions.IgnoreCase) || Regex.IsMatch(str, str2));
    return result;
}
Figure 4. User input validity check
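A commonly recommended complement to a character check like the function above is to bind user input as a query parameter instead of splicing it into the SQL text, so the database never interprets the input as code. The following Java sketch illustrates this with JDBC; the table and column names are hypothetical.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Parameterised query: quotes inside "name" are treated as data, not SQL.
public class SafeQuerySketch {
    public static boolean userExists(Connection con, String name) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT 1 FROM users WHERE name = ?")) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}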

B. Use of stored procedures

Designing the application to access the database exclusively through stored procedures is a design that can prevent SQL injection or mitigate its impact. A stored procedure is a program stored in the database; depending on the database, many different languages and variations can be used to write them. Stored procedures help to reduce the potential impact of SQL injection vulnerabilities, because in most databases access control can be configured at the database level for the procedure. This is important: it means that even if an exploitable SQL injection problem is found, proper permission configuration can ensure that the attacker cannot reach sensitive information in the database.

C. Encapsulating the information submitted by the client

This approach needs support from the RDBMS; currently only Oracle offers the technology.

D. Changing database file extensions to ASP or ASA

As a traditional preventive measure, many users prefer to change the suffix of the database file from MDB to ASP or ASA. Although in this

way database explosion can be prevented to some extent, with the continuous development of computer technology such traditional methods no longer meet the latest prevention requirements. Even after the database file's suffix has been changed to ASP or ASA, a hacker can determine its storage location by searching and can quickly download it with tools such as Thunder.

E. Adding "#" in front of the database name

Many database administrators add a # sign in front of the database name to prevent the database from being downloaded, because IE cannot download a file whose name contains a #. But besides regular access, the Web can also be accessed through IE in combination with encoding techniques: every character has a corresponding code in IE, and the encoding %23 can replace the # sign. Treated this way, even a database file whose name carries a # can be downloaded; for example, to download the file #data.mdb one simply enters %23data.mdb in the browser, so the # symbol cannot play its defensive role.

F. Encrypting the database user passwords

User passwords and other sensitive information should be encrypted, for example with MD5, i.e. ciphertext = MD5(cleartext). MD5 cannot be decrypted, so even if an attacker obtains the encrypted value stored in the database, the password appears as garbled text and he has no way of knowing the original password. It should be stressed that once a user loses or forgets the password it is difficult to retrieve.

G. Hiding the back-end entrance

Considering that back-end database management often uses a different mode of operation and maintenance, it may neither add hardened anti-SQL-injection code nor link to the appropriate admin pages. To avoid the back-end login page being found by scanning, programmers need to adjust the back-end management directory, which preferably should not


be set to easily guessed words such as manage or admin; it is usually set to a combination of non-word letters and numbers, the more complex the better. Likewise, the name of the login page should not be set to login.asp or other obvious names, so that it cannot be cracked.

H. Establishing a reasonable error return policy

SQL error messages often disclose details of the database design, so when an SQL error occurs while the application is running, the error message returned by the database should not be displayed to the user indiscriminately. A reasonable approach is to wrap the SQL error pages and, depending on whether the user has debug privileges, decide whether to show the full error message or only a prompt that an SQL runtime error occurred.

IV. INDUSTRIAL STANDARDS

The efficacy and validity of industrial best practices for safeguarding standalone computers, smart devices, information resources, computer networks and their peripherals are not frequently evaluated in the domain of computer network security and network forensics. The main reason is that industrial best practices are mostly propounded by globally recognised bodies of experts such as the American National Standards Institute (ANSI), the National Institute of Standards and Technology (NIST), the British Standards Institute (BSI), the Standards Organisation of Nigeria (SON), the Institute of Electrical and Electronics Engineers (IEEE), the International Standards Organisation (ISO) and the Information Systems Audit and Control Association (ISACA) [14, 19]. While industrial standards are constantly being updated and new disposable devices are being manufactured from time to time, we are unsure whether the new version of each standard is sufficient to provide the guidelines that network forensics experts and IS auditors would need to discharge their duties effectively.

A. Outsourcing of IT operations

Outsourcing of IT functions is becoming standard practice in the industry, while the rate of change in customer requirements, natural disasters and computer crimes across the globe are also on the increase [12-14]. IT auditors must continually evaluate third-party applications, the selection of vendors, the access profiles assigned to vendors and the Business Contingency Planning (BCP) procedures of their organisations to ensure strict compliance with best practices and to lessen downtime. There is growing complexity in how to conduct purposeful Business Impact Analysis (BIA) and penetration testing across multiple outsourced IT operations in order to anticipate the potential vulnerabilities of computers and cloud resources that malicious users and hackers could exploit. Numerous unidentified stealthy attacks and detection-evasion techniques may not be covered by the industrial best practices known to the BIA team; consequently, current best industrial practices and the results of most BIAs are highly subjective.

B. Redundant intrusions and redundant alerts

IDSs fundamentally generate numerous alerts while operating to detect potential attacks. Because of this, conceptual discussions of redundant intrusions and redundant alerts can generate controversies in some cases. Basically, redundant intrusions, such as those in Figure 1 below, are similar intrusions that recur over time. Statistically, some alerts are related, some are partially related and others may not


correlate altogether.

Figure 1. Alerts from redundant intrusions

In other words, it is difficult to determine how closely two alerts triggered by an IDS co-vary, and it is also difficult to separate alerts that form a perfect negative correlation, no correlation, or a perfect positive correlation. The human factor in analysing intrusion logs matters in the administration of IDSs, especially when attackers deliberately launch attacks that cause the NIDS to trigger alerts with the aim of exhausting the analysts' capacity. Some analysts may not notify their management or strategic managers of the need for additional hands to cooperatively analyse the flood of alerts, for fear of being fired; accordingly, many attacks go unnoticed despite a series of warnings from the IDS in use. Several factors can cause alert swamping and intrusion redundancy.
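One simple way to quantify how two alert streams co-vary, as discussed above, is the Pearson correlation over per-interval alert counts. The sketch below shows the calculation; the sample counts are invented purely for illustration and do not come from any real IDS log.

// Pearson correlation of alert counts per time interval from two IDS sensors.
public class AlertCorrelationSketch {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
        }
        double cov = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);   // +1, 0 and -1 mark the three extremes
    }

    public static void main(String[] args) {
        double[] alertsA = {4, 7, 2, 9, 5};    // alerts per interval from IDS A (assumed)
        double[] alertsB = {5, 6, 3, 8, 6};    // alerts per interval from IDS B (assumed)
        System.out.printf("correlation = %.3f%n", pearson(alertsA, alertsB));
    }
}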

Figure 2. Attack on a standalone computer system

Figures 2 to 4 illustrate that the configuration of a standalone computer system can be completely different from the configuration of computers in a set of computer networks. The basic fact is that an attack on a standalone computer system has the same source and destination address, unlike an attack that involves computer networks as in Figure 3. Essentially, Figure 3 shows one IDS (Q) that mediates between two computer networks A and B, and another IDS (M) installed on a computer in network B. Figure 4 illustrates the complexity of the IDS installation shown in Figure 3: it shows an attack that originates from a computer or device (S) towards another computer or device (T), with one IDS on the target system (T) and another IDS (NIDS) mediating between the source of the attack and its destination. Users often deploy multiple IDSs to detect as many network intrusions as possible; homogeneous, heterogeneous or semi-heterogeneous IDSs can be deployed for this purpose. Homogeneous IDSs are similar IDSs, such as the deployment of Snort in multiple segments of an organisation.

Figure 3. Attack across different computer networks

Heterogeneous IDSs comprise different categories of IDSs, for example the deployment of a NIDS (such as Snort, for monitoring computer networks) together with a host-based IDS (such as OSSEC, for monitoring the integrity of the internal components of computing systems in an organisation).


Figure 4 Attack within the same computer networks

V. CONCLUSION

In the attempt to rigorously safeguard computer systems and cloud resources from intruders, design flaws are often indirectly built into them. The security of the database is crucial for any system. In response to the attacks on databases described here, comprehensive prevention has to be considered during the development phase, both in the code and in the database configuration. This improves safety from the very beginning and reduces risk, and it also saves significant human, material and financial costs in the later security maintenance of the system.

REFERENCES

[1] WebCohort. WebCohort's application defense center reports results of vulnerability testing on Web applications [EB/OL]. March 2004. http://www.imperva.com/company/news/2004 feb 02.html.
[2] Xiaolei Huang. SQL injection attacks and defense. Tsinghua University Press, 2010.
[3] Biao Meng, Junjing Liu. SQL injection attacks classified defense model [J]. Information Technology & Standardization, 2008.
[4] Jian Zou. Layman's language and SQL SERVER 2005 [M]. People Post Press, 2008.
[5] YaoJiang Zhang. Focus hacking: attacks and protection policies. Posts Press, 2002.
[6] Lianggen Zhu, Zhenjia Lei. Database Security Technology [J]. Computer Applications, 2004.
[7] Liwu Deng. Design and Research on Database Protection System Aiming at SQL Attack. North China Electric Power University, 2010.


SEAMLESS NETWORK CONNECTIVITY IN VEHICULAR AD HOC NETWORK


Manimegalai D. 1, S. Cloudin 2
1,2 M.E. Computer Science and Engineering, Anna University of Technology, KCG College of Technology, Chennai.

Abstract
Vehicular Ad hoc Networks (VANETs) are attracting more and more attention from both academia and industry. The navigation systems available in cars are becoming increasingly sophisticated; they greatly improve the experience of drivers and passengers by enabling them to receive map and traffic updates, news feeds, advertisements, media files, and so on. Although much has been achieved in practice, one of the remaining issues in vehicular ad hoc networks is connectivity. Connectivity is a fundamental requirement in the planning, design and evaluation of a VANET, yet it is often unavailable because the network topology changes quickly. The scope of this work is to improve the connectivity dynamics in one-dimensional and two-dimensional network topologies without loss of information. The work uses more than one parameter to obtain accurate results: connectivity is measured through packet delivery ratio, throughput and signal strength over time. Since vehicular ad hoc networks operate in real-time environments, effective connectivity relies on the successful dissemination of time-critical information to all vehicles.

Keywords: connectivity, signal-to-noise ratio, transmission range.

I. INTRODUCTION
Vehicular Ad hoc Networks (VANETs) are a special type of Mobile Ad hoc Network (MANET) built up from vehicles. The technology uses moving cars as nodes to create a mobile network: every participating car becomes a wireless router or node, allowing cars within approximately 100 to 300 metres of each other to connect and, in turn, create a network with a wide range. As cars fall out of the signal range they drop out of the network, and other cars can join in, connecting vehicles to one another. The technology can be used, for instance, by police and fire vehicles to communicate with each other for safety purposes. The potential applications of VANETs include safety-related applications such as cooperative forward-collision warning, traffic-signal-violation warning and lane-change warning, as well as information and entertainment applications for back-seat passengers. All these applications rely on the successful dissemination of time-critical information to all vehicles on a road segment. Network connectivity is therefore a fundamental requirement in VANETs: all vehicles on a road segment should be able to communicate with each


other, either directly or via multiple hops through intermediate vehicles. Network connectivity is a fundamental performance measure of ad hoc and sensor networks: two nodes are connected if they can exchange information with each other, directly or indirectly. For VANETs connectivity is especially important as a measure of reliable dissemination of time-critical information to all vehicles, and it is directly related to the density of vehicles on the road and their speed distribution. Unlike a conventional ad hoc network, a VANET has to deal with very different network densities: a VANET in an urban area forms a highly dense network during rush-hour traffic and a sparse network in the late night hours. If the vehicle density is very high, the VANET is almost fully connected; when the vehicle density is very low, connectivity degrades and it may not be possible to transfer messages to other vehicles because of disconnections. This paper investigates the connectivity properties of a vehicular ad hoc network using parameters such as the signal-to-noise ratio (SNR), the transmission range and the velocity profile (speed, distance and time). In this work the transmission range of each vehicle is not a deterministic quantity but is generated randomly based on the transmission power and the receive-power threshold. The following metrics are used to determine connectivity:

1) Signal-to-noise ratio: the power ratio between a signal (meaningful information) and the background noise (unwanted signal).

2) Throughput: the average rate of successful message delivery over a communication channel.

3) Packet delivery ratio: the ratio of the total data packets successfully received to the total packets sent.

4) Packet loss ratio: packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination.

5) Connectivity: connectivity over the entire system, measured from the signal-to-noise ratio and the transmission range.
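For metric 1, the SNR is commonly reported in decibels as 10*log10(Psignal/Pnoise). The tiny sketch below shows that conversion; the power values are illustrative, not measurements from the simulations in this paper.

// SNR expressed in decibels from signal and noise power.
public class SnrSketch {
    static double snrDb(double signalPower, double noisePower) {
        return 10.0 * Math.log10(signalPower / noisePower);
    }

    public static void main(String[] args) {
        double signal = 2.0e-6;   // received signal power in watts (assumed)
        double noise  = 2.0e-7;   // background noise power in watts (assumed)
        System.out.printf("SNR = %.1f dB%n", snrDb(signal, noise)); // prints 10.0 dB
    }
}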

II RELATED WORK
The study of vehicular ad hoc networks in [1] introduced a new equivalent-speed parameter calculated on the basis of vehicle mobility, and analysed connectivity in the free-flow state on a highway segment. It was shown that the equivalent speed differs from the average vehicle speed and decreases as the standard deviation of the vehicle speed increases; using the equivalent speed, the average number of vehicles on the highway segment is obtained. The results of [1] show that increasing the average vehicle speed increases the equivalent speed, which decreases the average number of vehicles on a highway segment and consequently degrades connectivity, whereas increasing the standard deviation of the vehicle speed decreases the equivalent speed, increases the average number of vehicles on the segment and consequently improves connectivity. It was also shown that vehicles in a VANET can adaptively choose


their transmission range to ensure network connectivity on highway segments while minimising power consumption. The main limitation of [1] is that connectivity degrades under high mobility and the unpredictable arrival of vehicles. Second, the study of VANET connectivity in [2] examines the performance of connectivity for dense and sparse vehicle populations and identifies the bottleneck of low connectivity. Two models are used in [2], the Bollabas bond model and the percolation bond model, and a quantitative relationship is shown to exist among network connectivity, vehicle density and transmission range. The two models are chosen according to the transmission-range parameter:
1) If the transmission range is small, the percolation bond model is used to discover network connectivity.
2) If the transmission range is large, the Bollabas bond model is used to discover network connectivity.

The network connectivity is calculated for a given vehicle density and minimum transmission range. The advantages of this work are that (1) a small transmission range is enough to obtain good network connectivity and (2) it requires less power and saves energy. Its limitations are that (1) using a large transmission range wastes energy and brings more collisions and (2) vehicle arrivals are unpredictable.

Third, the study of stochastic connectivity in [4] deals with stochastic parameters for vehicular connectivity. The problem identified in [4] is that the nodes are assumed to be uniformly distributed with constant speed and that vehicles arrive according to a Poisson process with a single parameter; Poisson-based connectivity with a single variable is not appropriate to ensure strong connectivity, and the Poisson-based approach lacks a realistic traffic model.

In our paper, the proposed model investigates K-connectivity within the road topology based on the parameters used: if the connectivity is stable throughout the network, information can be exchanged. The model uses stochastic parameters such as the signal-to-noise ratio, the velocity profile and the transmission range, and stable connectivity is maintained over the entire system. Consider a network topology where vehicles are distributed from left to right; vehicles arrive first and keep moving without joining or leaving. This work mainly deals with two important layers: (1) the physical layer and (2) the MAC layer. The physical layer, the lower layer, transmits the bits; the MAC layer uses IEEE 802.11p for wireless channel communication. The higher layers serve the applications as the user interface of the system. Data are passed from the MAC layer to the physical layer through a selected channel with a high constant bit rate, and packets are passed from source to destination through the DSDV routing protocol so that the bandwidth is used


effectively. The nodes are in a mobile state, and the parameters used here are the signal-to-noise ratio, the transmission range, speed, time and distance.

III.SYSTEM MODEL
Figure 1 shows the system architecture. Above the physical layer sits the IEEE 802.11p MAC layer, which interacts with the Node State Set, Neighbour Detection and Traffic Validation modules and with the SNR election (Select SNR) and Transmission Range computation; the higher layers provide the applications and the user interface.

Figure 1. System Architecture

A. Election of the Signal-to-Noise Ratio

The SNR is defined as the ratio of signal power to noise power, SNR = Psignal / Pnoise, where P is the average power. The SNR is directly proportional to the transmission range: if the SNR is high, the signal strength is high; if the SNR is low, the signal strength is low. The Power Monitor calculates the current energy status of the node: it compares the current signal-strength value of each node with its previous value, and if the current power level is higher than the previous power level, the current power level is set as the SINR value of the node.

TABLE 1. Example of the election of the SNR

Parameter       | Current power level | Previous power level | Elected signal
Signal to noise | 10 dB               | 9 dB                 | 10 dB

B. TRANSMISSION RANGE

A radio link model is assumed in which each vehicle has a transmission range r and two-way communication. The signal is directly proportional to the transmission range:
1. Two vehicles are able to communicate directly if the distance between them is less than the transmission range r.
2. Two vehicles are not able to communicate directly if the distance between them is greater than the transmission range r.
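The sketch below illustrates this radio-link rule in Java: two vehicles can talk directly only if their separation is within the range r, and r is derived here from the transmit power and a receive-power threshold using the two-ray ground formula Pr = Pt*Gt*Gr*ht^2*hr^2 / d^4. The antenna gains, antenna heights and receive threshold are assumed values for illustration, not the paper's simulation settings.

// Derive a transmission range from power figures and test direct reachability.
public class LinkCheckSketch {
    static double rangeFromPower(double pt, double rxThreshold,
                                 double gt, double gr, double ht, double hr) {
        // Solve Pr = rxThreshold for d in the two-ray ground model.
        return Math.pow(pt * gt * gr * ht * ht * hr * hr / rxThreshold, 0.25);
    }

    static boolean canCommunicate(double distance, double range) {
        return distance < range;   // rule 1 above; otherwise rule 2 applies
    }

    public static void main(String[] args) {
        double r = rangeFromPower(0.0316227, 3.65e-10, 1.0, 1.0, 1.5, 1.5); // assumed threshold/gains/heights
        System.out.printf("range = %.1f m, vehicles 200 m apart connected: %b%n",
                          r, canCommunicate(200.0, r));
    }
}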


Based on the SNR value, the transmission range of the node is identified. Since the SNR is directly proportional to the transmission range, the transmission range is updated in each node. The parameters used to calculate the transmission range are: (1) distance, (2) speed, (3) time, (4) power threshold and (5) signal-to-noise ratio.

C. TRAFFIC PATTERN VALIDATION

This part of the work generates and validates the traffic pattern. At the initial stage of the network all the nodes start from the same position (x + y), where x is the node number and y is the distance in km. The traffic generation models a poorly connected network through a stop-and-go mechanism. The main steps of traffic-pattern validation are based on the connectivity validation of each node, characterised as follows.

a) Connectivity validation: the transmission range is calculated and analysed together with the signal-to-noise ratio, the receiver power and the speed. If the calculated value is valid the node is connected, otherwise it is disconnected from the network. Here Tr is the transmission range, Rx the receiver threshold and Cx the collision threshold. If Tr < (Rx & Cx), the transmission range is valid (= 1), meaning the nodes are connected; if Tr > (Rx & Cx),

the transmission range is invalid (= 0), meaning the nodes are disconnected.

b) Detect neighbour node: this method detects the available peers within each node's transmission range. Neighbour detection is required to switch the connectivity to the next node if the connectivity to the current node is lost; if no neighbour node is detected, the node updates its state to disconnected. Whenever new nodes are detected as neighbours, the node updates its velocity profile (speed, distance, time) and the number of peer nodes. If a node does not find a neighbour it is disconnected from the network; since it is mobile, it keeps updating its neighbour list and measuring its signal strength at each movement.

c) Node state set: this step updates the node state in the 1-D vehicular network, marking it either connected or disconnected. The node transition decision is taken in this module based on the SNR and the traffic, and the velocity profile of each node is updated and maintained here. Node state = 1 means the node is connected; node state = 0 means the node is disconnected.

D) CONTI - CONTENTION IN CONSTANT TIME


Traffic congestion is higher in the 2-D approach, and CONTI is used to avoid congestion in the network. The contention among n nodes is resolved using CONTI over k contention slots. Each station uses the same probability vector p: in a slot, a node chooses signal 1 with probability pi or signal 0 with probability 1 - pi.

a) Send jam signal: the node with the highest probability attains the medium. If the medium is not idle, it (1) sends a jam signal to all nodes and (2) discards the packet and sends a MAC busy tone. The nodes that are listening and hear the jam signal are moved to the preemption state.

b) Preempt nodes: the nodes that do not attain the medium are preempted. Such a node checks for control signals (RTS, CTS, TX), and its control signalling is stopped until it attempts to get the medium again.

c) Calculating the probability: the probabilities are assigned based on the TX count, where TX is the transmission count of the node: if tx count = 1, p = 0.2523; if tx count = 2, p = 0.36715; if tx count = 3, p = 0.4245; if tx count = 4, p = 0.4314; if tx count = 5, p = 0.5.
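As a rough sketch of the elimination idea behind these slots, the following Java code simulates several contention slots in which every surviving node signals "1" with probability p and silent nodes that hear a signal are preempted. For simplicity it uses a single fixed probability per slot rather than the tx-count table above, and the node count and slot count are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// CONTI-style contention: silent nodes are preempted whenever another node signalled.
public class ContiSketch {
    public static void main(String[] args) {
        Random rnd = new Random();
        int slots = 5;                 // k contention slots (assumed)
        double p = 0.5;                // signalling probability (simplified)
        List<Integer> survivors = new ArrayList<>();
        for (int node = 0; node < 8; node++) survivors.add(node);

        for (int slot = 0; slot < slots && survivors.size() > 1; slot++) {
            List<Integer> signalled = new ArrayList<>();
            for (int node : survivors) {
                if (rnd.nextDouble() < p) signalled.add(node);   // node sends the busy/jam tone
            }
            if (!signalled.isEmpty()) survivors = signalled;     // silent nodes are preempted
        }
        System.out.println("Nodes still contending for the medium: " + survivors);
    }
}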

d) CONTI validation: the CONTI decision is validated against the assigned probability. If the drawn value is less than the assigned probability, the signal is set to 1, otherwise to 0: (i) signal = 1, the node may send data and the busy tone is set; (ii) signal = 0, the node cannot send data.

IV. SIMULATION RESULTS

The simulations are carried out in Network Simulator with the Cygwin Unix emulator. The scenarios used are a one-way highway road and a city square road. The total simulation time is 480 seconds, and the results are collected after a period of 50 seconds to allow the vehicular ad hoc network to reach steady state.

TABLE 2. Simulation parameters and values

Number | Parameter                    | Value
1      | Number of mobile nodes       | 15
2      | Data and control packet rate | 11 MB
3      | Topology                     | Flat grid
4      | Antenna                      | Omni antenna
5      | Transmit power               | 0.0316227
6      | Packet size                  | 1500 bytes
7      | Radio propagation            | Two-way ground


This work considers vehicles, i.e. nodes, that are moving. In this paper 15 nodes are taken and placed at different positions in a flat-grid topology, Constant Bit Rate (CBR) traffic sources are used for the traffic model, and the MAC employed is 802.11.

TABLE 3. Mobility parameters and values

Number | Mobility parameter    | Value
1      | Routing protocol (RP) | DSDV
2      | MAC type              | 802.11p
3      | Propagation           | Radio waves

Three performance metrics are analysed to compare one-dimensional and two-dimensional network connectivity.

1) Signal-to-noise ratio: the power ratio between a signal (meaningful information) and the background noise (unwanted signal), SNR = Psignal / Pnoise, where P is the average power.

Figure 2. Signal-to-noise ratio (decibel) versus time (seconds)

The figure shows that the signal strength is initially high in decibels and remains moderately high over the entire system. The graph is drawn from the signal-to-noise ratio with respect to time: the signal strength drops at around 200 seconds and is then maintained constantly for the rest of the run. With this average signal strength the nodes remain connected and information can be exchanged efficiently.

2) Throughput: the average rate of successful message delivery over a communication channel, Throughput = (number of packets received / number of packets sent) * 100.


Figure 3. Average throughput (%) versus time (minutes)

The graph shows the average throughput analysed from the 50th second after the animation starts. Over certain periods the throughput of the entire system increases: in the one-dimensional network the highest throughput is 65 percent, while in the two-dimensional network the throughput rises above 85 percent.

3) Packet delivery ratio: the ratio of the total number of packets that reach the destination node to the total number of packets originated at the source node, aggregated over the channel capacity and the N nodes of the network.

Figure 4. Packet delivery ratio (%) versus time (minutes)

The graph shows that the packet delivery ratio increases steadily to a high percentage. This average packet delivery ratio confirms that with stable connectivity the packets are transmitted and received; at each instant of time the packet delivery ratio is calculated at each node.

4) Packet loss ratio (%): the ratio of the number of packets lost to the sum of the packets successfully received and the packets lost, PLR = [Lp / (Sp + Lp)] x 100, where PLR is the packet loss ratio, Lp is the number of packets lost and Sp is the number of packets received. Figure 4 shows that the packet loss ratio stays low when the performance parameters are used: initially the packet loss is high, but after some time it falls to a moderate level. The factors affecting packet loss include signal degradation, packet drops caused by channel congestion, corrupted packets rejected in transit, faulty networking hardware, faulty network drivers and normal routing routines.
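The sketch below shows how these counter-based metrics are computed from the packet totals that a trace analysis would provide. The counts, packet size and simulation time are illustrative figures, not the actual results plotted above.

// PDR, PLR and average throughput from sent/received packet totals.
public class MetricsSketch {
    public static void main(String[] args) {
        long sent = 1200, received = 1050, lost = sent - received;
        double bitsReceived = received * 1500 * 8.0;   // 1500-byte packets (assumed)
        double seconds = 480.0;                        // total simulation time

        double pdr = 100.0 * received / sent;                   // packet delivery ratio (%)
        double plr = 100.0 * lost / (received + lost);          // PLR = Lp / (Sp + Lp) * 100
        double throughputKbps = bitsReceived / seconds / 1000;  // average delivery rate

        System.out.printf("PDR = %.1f%%, PLR = %.1f%%, throughput = %.1f kbit/s%n",
                          pdr, plr, throughputKbps);
    }
}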


Figure 4. Packet loss (%) versus time (minutes)

5) Vehicle connectivity performance:

Figure 5. Vehicle connectivity versus time

The above graph shows the average connectivity of the network while every node is in mobility. The graphs are plotted against time. Connectivity over the entire system is validated and analysed: stable connectivity is maintained, and the data are transmitted to the destination with a small number of packet losses.

V. CONCLUSION

Vehicles in road networks do not distribute homogeneously, because of traffic lights. However, most of the existing studies on node connectivity make the implicit assumption that nodes are distributed homogeneously over the geographic area, which is inappropriate in VANETs and may lead to erroneous results. This work emphasises stable connectivity in the vehicular ad hoc network so that time-critical information can be exchanged without loss. By applying parameters such as the signal-to-noise ratio, the transmission range and the velocity profile, the connectivity and throughput of the network are improved. The stable connectivity is used to avoid traffic congestion by providing forward-collision warning, traffic-signal-violation warning and lane-change warning messages. The signal-to-noise ratio is used to identify the signal strength of a vehicle so that it can communicate with its neighbour vehicles for effective transfer of data; the parameters are updated at each movement of the vehicles, and the transmission range provides the communication link between vehicles. As future work, roads with multiple lanes, bidirectional traffic and more complicated urban road topologies can be modelled by superposing multiple versions of urban routes.

REFERENCES

[1] Salman Durrani, Xiangyun Zhou and Abhas Chandra (2010). Effect of Vehicle Mobility on Connectivity of Vehicular Ad Hoc Networks. CECS, School of Engineering, The Australian National University, Canberra, Australia, Vol. 12, No. 3.
[2] Xin Jin, Weijie Su, Yan Wei (2010). Quantitative Analysis of the VANET Connectivity: Theory and Application.
[3] Xin Jin, Weijie Su, Yan Wei (2009). The Study of VANET Connectivity by Percolation Theory.
[4] Ivan Wang-Hei Ho, Kin K. Leung and John W. Polak (2011). Stochastic Model and Connectivity Dynamics for VANETs in Signalized Road Systems. IEEE/ACM Transactions on Networking, Vol. 19, No. 1; Ivan W. H. Ho, K. K. Leung, J. W. Polak (2009). Connectivity Dynamics for Vehicular Ad-hoc Networks in Signalized Road Systems. 21st International Teletraffic Congress (ITC).
[5] Shigeo Shioda, Junko Harada, Yuta Watanabe (2010). Fundamental Characteristics of Connectivity in Ad Hoc Networks. Graduate School of Engineering, Chiba University, 1-33 Yayoi, Inage, Chiba 263-8522, Japan.


A Survey of QoS Issues and On Demand Routing Protocols Performance Analysis in Mobile Ad hoc Networks
P. Sivakumar 1 and Dr. K. Duraiswamy 2
1 Associate Professor, Manakula Vinayagar Institute of Technology, Puducherry
2 Dean, K.S.R College of Technology, Tiruchengode

Abstract
A mobile ad hoc network (MANET) is a collection of mobile nodes with no pre-established infrastructure, forming a temporary network. The dynamic topology of a mobile ad hoc network poses a real challenge in the design of a MANET routing protocol. Over the last decade, a variety of routing protocols have been developed for real-time multimedia communication, and their performance has been studied in simulation by researchers. An efficient approach is to consider routing algorithms in which network connectivity is determined while routes are being established. The shortest path from a source to a destination in a static network is usually the optimal route, but this idea does not extend easily to a MANET: factors such as the power expended, variable wireless link quality, propagation path loss, fading, multi-user interference and topological changes become relevant, and the network should be able to adapt its routing paths to alleviate these effects. Performance is therefore an interesting issue across different protocols. This paper describes some special characteristics of ad hoc on-demand routing protocols, their operation and performance measurements, and also focuses on the key challenges in provisioning predetermined levels of Quality of Service (QoS). Evolving functional areas where performance and QoS provisioning may be applied are identified, and some suggestions are provided for further research in this area. In this paper we attempt to provide such an overview.

Keywords: Mobile ad hoc network, Quality of Service, QoS provisioning, On demand routing protocol.
I. INTRODUCTION

A. On demand routing protocols

A Mobile Ad-hoc Network (MANET) is a network where autonomous mobile nodes with wireless interfaces construct a temporary wireless network. In mobile ad hoc networks there are no dedicated routers; each node operates as a router and transmits packets between source and destination. A node that is within the transmission range of the source node but is not the destination accepts a packet sent by the source and forwards it along the route to the destination node. A number of MANET routing protocols have been proposed in the last decade. These protocols can be classified


according to the routing strategy that they follow to discover a route to the destination. These protocols perform differently depending on the type of traffic, the number of nodes, the rate of mobility, etc. Over the last decade, various MANET routing protocols have been developed by network researchers and designers, primarily to improve MANET performance with respect to establishing correct and efficient routes between a pair of nodes for packet delivery [1]. Examples of popular MANET routing protocols are Ad Hoc On-Demand Distance Vector (AODV) [5] and Dynamic Source Routing (DSR) [6]. Conventional routing protocols such as AODV and DSR use minimum hop count or shortest path as the main metric for path selection. However, networks that require high Quality of Service (QoS) need to consider several criteria that could affect the quality of the chosen path in the packet forwarding process [4]. The limited resources in MANETs make the design of an efficient and reliable routing strategy a very challenging problem [2]. Transferring real-time traffic over MANETs is a big challenge due to the high requirements on bandwidth, time delay, and latency for such traffic [3]; this requires the offering of guaranteed service quality.
B. QoS issues in MANET

QoS is the performance level of a service offered by the network to the user. The primary goal of QoS provisioning is to achieve more deterministic behavior through proper utilization of the network resources. A network or a service provider can offer different kinds of services to users based on a set of service requirements such as minimum bandwidth, maximum delay, maximum variance in delay and maximum rate of packet loss. After accepting a service request from the user, the network is expected to ensure the committed service requirements of the user throughout the communication. QoS provisioning is challenging due to the key characteristics of MANETs, i.e. lack of central coordination, mobility of hosts and limited availability of resources. QoS routing focuses on identification of network paths with sufficient capacity to meet end-user service requirements.

The rest of this paper is organized as follows. Section II briefly describes the most important on-demand routing protocols and a comparative study of the AODV and DSR protocols. Section III discusses QoS support in MANETs. Section IV presents QoS provisioning challenges in MANETs. Section V discusses QoS protocol performance issues. Section VI concludes the paper.

II. SOME IMPORTANT ON-DEMAND ROUTING PROTOCOLS

A. AODV Protocol

The Ad hoc On Demand Distance Vector (AODV) routing algorithm is a routing protocol designed for ad hoc mobile networks. AODV is a modification of the DSDV algorithm. AODV is capable of both unicast and multicast routing. It is an on-demand algorithm, meaning that it builds routes between nodes only as desired by source nodes, and it maintains these routes as long as they are needed by the sources. Additionally, AODV forms trees which connect multicast group members; the trees are composed of the group members and the nodes needed to connect the members. AODV uses sequence numbers to ensure the freshness of routes. It is


loop-free, self-starting, and scales to large numbers of mobile nodes. AODV builds routes using a route request / route reply query cycle.

1) Route Discovery: When a source node desires a route to a destination for which it does not already have a route, it broadcasts a route request (RREQ) packet across the network. Nodes receiving this packet update their information for the source node and set up backwards pointers to the source node in their route tables. In addition to the source node's IP address, current sequence number, and broadcast ID, the RREQ also contains the most recent sequence number for the destination of which the source node is aware. A node receiving the RREQ may send a route reply (RREP) if it is either the destination or if it has a route to the destination with a corresponding sequence number greater than or equal to that contained in the RREQ. If this is the case, it unicasts a RREP back to the source; otherwise, it rebroadcasts the RREQ. Nodes keep track of the RREQ's source IP address and broadcast ID. If they receive a RREQ which they have already processed, they discard it and do not forward it.

2) Route Reply: As the RREP propagates back to the source, nodes set up forward pointers to the destination. Once the source node receives the RREP, it may begin to forward data packets to the destination. If the source later receives a RREP containing a greater sequence number, or the same sequence number with a smaller hop count, it may update its routing information for that destination and begin using the better route.

3) Route Maintenance: As long as the route remains active, it will continue to be maintained. A route is considered active as long as there are data packets periodically travelling from the source to the destination along that path. Once the source stops sending data packets, the links will time out and eventually be deleted from the intermediate nodes' routing tables. If a link break occurs while the route is active, the node upstream of the break propagates a route error (RERR) message to the source node to inform it of the now unreachable destination(s). After receiving the RERR, if the source node still desires the route, it can reinitiate route discovery.
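The RREQ/RREP logic above can be summarized in a short sketch. The following Python fragment is a simplified illustration, not the reference AODV implementation: the class, field names and message dictionaries are hypothetical, and timers, RERR propagation and the multicast tree are omitted.

    # Minimal sketch of AODV RREQ handling at a single node (illustrative only).
    class AODVNode:
        def __init__(self, node_id):
            self.node_id = node_id
            self.routes = {}         # dest -> (next_hop, hop_count, dest_seq_no)
            self.seen_rreqs = set()  # (origin, broadcast_id) pairs already processed

        def handle_rreq(self, rreq, prev_hop):
            key = (rreq["origin"], rreq["broadcast_id"])
            if key in self.seen_rreqs:           # duplicate RREQ: discard, do not forward
                return None
            self.seen_rreqs.add(key)
            # Set up the reverse route (backwards pointer) towards the originator.
            self.routes[rreq["origin"]] = (prev_hop, rreq["hop_count"] + 1, rreq["origin_seq"])
            known = self.routes.get(rreq["dest"])
            if rreq["dest"] == self.node_id or (known and known[2] >= rreq["dest_seq"]):
                # This node is the destination, or has a fresh-enough route: unicast an RREP.
                return {"type": "RREP", "dest": rreq["dest"], "to": prev_hop,
                        "hop_count": 0 if rreq["dest"] == self.node_id else known[1]}
            rreq["hop_count"] += 1               # otherwise rebroadcast the RREQ
            return {"type": "RREQ", "rebroadcast": rreq}

    node = AODVNode("B")
    print(node.handle_rreq({"origin": "A", "broadcast_id": 1, "origin_seq": 5,
                            "dest": "D", "dest_seq": 0, "hop_count": 0}, prev_hop="A"))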
B. DSR Protocol

The Dynamic Source Routing (DSR) protocol is a simple and efficient routing protocol designed specifically for use in multi-hop wireless ad hoc networks of mobile nodes. DSR allows the network to be completely self-organizing and self-configuring, without the need for any existing network infrastructure or administration. DSR has been implemented by numerous groups and deployed on several test beds. Networks using the DSR protocol have been connected to the Internet. DSR can interoperate with Mobile IP, and nodes using Mobile IP and DSR have seamlessly migrated between WLANs, cellular data services, and DSR mobile ad hoc networks. The protocol is composed of two main mechanisms, "Route Discovery" and "Route Maintenance", which work together to allow nodes to discover and maintain routes to arbitrary destinations in the ad hoc network.

1) Route Discovery: When a node wishes to establish a route, it issues a Route Request to all of its neighbours. Each neighbour rebroadcasts this Request, adding its own address in the header of the packet.
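As an illustration of how the source route accumulates during Route Discovery, the sketch below shows a request gathering addresses hop by hop until it reaches the destination. It is a toy flood over an assumed static neighbour map with hypothetical names; duplicate suppression and route caching are left out.

    # Illustrative DSR-style route discovery over a static neighbour map.
    from collections import deque

    def dsr_discover(graph, source, destination):
        """Breadth-first flood of a Route Request; each hop appends its own
        address to the accumulated route, as DSR does in the packet header."""
        frontier = deque([[source]])
        while frontier:
            route = frontier.popleft()
            node = route[-1]
            if node == destination:
                return route                 # the Route Reply would carry this list back
            for neighbour in graph.get(node, []):
                if neighbour not in route:   # avoid trivial loops in the sketch
                    frontier.append(route + [neighbour])
        return None

    links = {"S": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D"]}
    print(dsr_discover(links, "S", "D"))     # e.g. ['S', 'A', 'C', 'D']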


2) Route Maintenance: When the Request is received by the destination or by a node with a route to the destination, a Route Reply is generated and sent back to the sender along with the addresses accumulated in the Request header. The responsibility for assessing the status of a route falls to each node in the route: each must ensure that packets successfully cross the link to the next node. If a node does not receive an acknowledgement, it reports the error back to the source and leaves it to the source to establish a new route. While this process could use up a lot of bandwidth, DSR gives each node a route cache which it uses aggressively to reduce the number of control messages sent. If a node has a cache entry for the destination of a received request, it uses the cached copy rather than forwarding the request. In addition, it promiscuously listens to other control messages for additional routing data to add to the cache.

DSR has the advantage that no routing tables must be kept to route a given packet, since the entire route is contained in the packet header. The caching of any initiated or overheard routing data can significantly reduce the number of control messages being sent, reducing overhead; using only triggered updates furthers that same goal. All aspects of the protocol operate entirely on-demand, allowing the routing packet overhead of DSR to scale automatically to only that needed to react to changes in the routes currently in use. The protocol allows multiple routes to any destination and allows each sender to select and control the routes used in routing its packets, for example for use in load balancing or for increased robustness. Other advantages of the DSR protocol include easily guaranteed loop-free routing, support for use in networks containing unidirectional links, use of only "soft state" in routing, and very rapid recovery when routes in the network change. The DSR protocol is designed mainly for mobile ad hoc networks of up to about two hundred nodes, and is designed to work well with even very high rates of mobility.

C. Comparative Study of AODV and DSR Protocols

This section exposes some of the basic characteristics and parameters of the on-demand routing protocols through a tabular study.

Table I. Comparative study of AODV and DSR protocols

Protocols / parameters                On Demand Routing Protocols
                                      AODV                            DSR
Type of protocol                      Distance vector                 Source routing
Route maintained in                   Route table                     Route cache
Routing philosophy                    Flat                            Flat
Loop free                             Yes                             Yes
Multicast capability                  Yes                             No
Periodic broadcast                    Yes                             No
Multiple routes possible              No                              Yes
Route cache/table expiration timer    Yes                             No
Requires sequence data                Yes                             No
Route reconfiguration methodology     Erase route and notify source   Erase route and notify source

The following is a list of quantitative metrics that can be used to assess the performance of any routing protocol: end-to-end delay, end-to-end throughput, route acquisition time, route latency (delay), overhead cost (packets/bandwidth/energy), percentage of out-of-order delivery, packet delivery ratio, remaining power of node, effect of node mobility, effect of node density, effect of packet length, and effect of link stability.
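Several of these metrics can be computed directly from a send/receive trace. The snippet below is a small sketch, assuming simple dictionaries of packet_id to timestamp rather than any particular simulator's trace format.

    # Packet delivery ratio and average end-to-end delay from a hypothetical trace.
    def delivery_metrics(sent, received):
        """sent/received: dicts mapping packet_id -> timestamp in seconds."""
        delivered = [pid for pid in sent if pid in received]
        pdr = len(delivered) / len(sent) if sent else 0.0
        delays = [received[pid] - sent[pid] for pid in delivered]
        avg_delay = sum(delays) / len(delays) if delays else 0.0
        return pdr, avg_delay

    sent = {1: 0.00, 2: 0.10, 3: 0.20}
    received = {1: 0.08, 3: 0.35}            # packet 2 was lost
    print(delivery_metrics(sent, received))  # (0.666..., 0.115)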
III. QOS SUPPORT IN MANETS

From the perspective of the roles that they play in supporting QoS in MANETs, the fundamental building blocks of a QoS architectural framework can be broken down into the following modules.

1. Admission Control: Admission control policies are generally tied to ISP or service level agreements between a subscriber and the ISP. They may additionally be based on the availability of adequate network resources in an attempt to meet the performance objectives of a particular service request. Policies may be parameter-based, if predefined hard-QoS guarantees are desired; otherwise measurement-based policies are used for soft-QoS, i.e. relative service assurance. Regulation of new traffic, to ensure that it does not lead to network overload or service degradation of existing traffic, is the primary responsibility of this module.

2. Traffic Classification and Scheduling: Scheduling is based on a service rate allocation to classes of traffic that share a common buffer. It is the mechanism that selects a packet for transmission from the packets waiting in the transmission queue. Packet scheduling thus controls bandwidth allocation to different nodes or types of applications. The desired service guarantees are realized independently at each router via proper scheduling.

3. QoS Hard vs Soft State:

Maintaining the QoS of adaptive flows in MANETs is one of the most challenging aspects of the QoS framework. Typically, wireline networks have little quality-of-service or state management, and the route and the reservation between source-destination pairs remain fixed for the duration of a session. This style of hard-state, connection-oriented communication (e.g. a virtual circuit) guarantees QoS for the duration of the session holding time. However, these techniques are not flexible enough for MANETs, where the paths and reservations need to respond dynamically to topology changes in a timely manner. Therefore, a soft-state approach to state management at intermediate routing nodes is a suitable approach for the management of reservations in MANETs. It relies on the fact that a source sends data packets along an existing path. If a data packet arrives at a mobile router and no reservation exists, then admission control and resource reservation attempt to establish soft state. Subsequent reception of data packets (associated with the reservation) at that router is used to refresh the existing soft-state reservation. This is called a soft connection when considered on an end-to-end basis and in relation to the virtual-circuit hard-state model. When an intermediate node receives a data packet that has an existing reservation, it reconfirms the reservation over the next interval.

4. Buffer Management: Buffer management deals with the task of either storing or dropping a packet awaiting transmission. The key mechanisms of buffer management are the backlog controller and the dropper. The backlog controller specifies the time instances when traffic should be dropped, and the dropper specifies


the traffic to be dropped. Buffer management is often associated with congestion control. As an example, consider one of the UDP segments generated by an IP phone application. The UDP segment is encapsulated in an IP datagram. As the datagram wanders through the network, it passes through buffers (i.e. queues) in the routers in order to access outbound links. It is possible that one or more buffers in the route from the sender to the receiver is full and cannot admit the IP datagram; in this case, the IP datagram is discarded, never to arrive at the receiving application. Therefore, a mechanism to deal with packet loss is desired.

5. Resource Reservation: Resource reservation is typically performed with a signaling mechanism such as RSVP or INSIGNIA. Using such a mechanism, the network sets aside the required resources on demand for delivering the desired network performance. This is in general closely associated with admission control. Since charges are normally based on the use of reserved resources, resource reservation requires the support of authentication, authorization, accounting and settlement between ISPs.

6. Packet Jitter: A crucial component of end-to-end delay is the random queuing delay in the routers. Because of these varying delays within the network, the time from when a packet is generated at the source until it is received at the receiver can fluctuate from packet to packet. This phenomenon is called jitter.

7. End-to-End Delay: End-to-end delay is the accumulation of transmission, processing and queuing delays in routers, propagation delay in the links, and end-system processing delays. For a highly interactive application such as IP phone, end-to-end delays smaller than 150 ms are not perceived by human listeners. Lesser end-to-end delay implies better performance.
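To make the delay and jitter notions concrete, the sketch below computes per-packet end-to-end delays and a running interarrival-jitter estimate in the style of RTP (RFC 3550); the smoothing factor of 1/16 follows that convention, and the timestamps are purely hypothetical.

    # End-to-end delay and smoothed jitter estimate for a sequence of packets.
    def delay_and_jitter(send_times, recv_times):
        delays = [r - s for s, r in zip(send_times, recv_times)]
        jitter = 0.0
        for prev, cur in zip(delays, delays[1:]):
            jitter += (abs(cur - prev) - jitter) / 16.0   # RFC 3550-style smoothing
        return delays, jitter

    send = [0.00, 0.02, 0.04, 0.06]
    recv = [0.11, 0.14, 0.15, 0.19]
    d, j = delay_and_jitter(send, recv)
    print(d, round(j, 4))   # per-packet delays and the final jitter estimate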
IV. QOS PROVISIONING CHALLENGES IN MANETS

Several research studies have been conducted on providing QoS support in conventional wireless networks. Such wireless networks often rely on a fixed wireline backhaul through which mobile hosts can connect to the wireline base stations in a one-hop radio transmission. In MANETs no such fixed infrastructure exists. Thus, providing QoS support in MANETs is more challenging than in conventional wireless networks. A summary of the major challenges in providing QoS support in MANETs has been presented in [12].

1. Dynamic network topology: In MANETs there is no restriction on mobility. Thus the network topology changes dynamically, causing hosts to have imprecise knowledge of the current status. A QoS session may suffer due to frequent path breaks, thereby requiring re-establishment of new paths. The delay incurred in re-establishing a QoS session may cause some of the packets belonging to that session to miss their delay targets and/or deadlines, which is not acceptable for applications that have stringent QoS requirements.

2. Error-prone wireless channel: The wireless radio channel is by nature a broadcast medium. The radio waves suffer from several impairments such as attenuation, thermal noise, interference, shadowing and multi-path fading effects during propagation


through the wireless medium. This makes it difficult to ensure QoS commitments such as hard packet delivery ratio or link longevity guarantees.

3. Lack of central coordination: Unlike a wireless LAN or a cellular network, a MANET does not have central controllers to coordinate the activity of the nodes. A MANET may be set up spontaneously without planning and its membership can change dynamically, making it difficult to provide any form of centralized control. As a result, communication protocols in MANETs utilize only locally available state and operate in a distributed manner [14]. This generally increases the overhead and complexity of an algorithm, since QoS state information must be disseminated efficiently.

4. Imprecise state information: The nodes in a MANET mostly maintain link-specific as well as flow-specific state information. The link-specific state information comprises bandwidth, delay, delay jitter, loss rate, error rate, stability, cost and distance values for each link. The flow-specific information includes session ID, source address, destination address and QoS requirements of the flow. Due to dynamic changes in network topology and channel characteristics, this state information is inherently imprecise. This may result in inaccurate routing decisions, with some packets missing their deadlines, leading to violation of real-time QoS commitments.

5. Limited availability of resources: Although mobile devices are becoming increasingly powerful and capable, it still holds true that such devices generally have less computational power, less memory

and a limited (battery) power supply, compared to devices such as desktop computers typically employed in wired networks. This factor has a major impact on the provision of QoS assurances, since low memory capacity limits the amount of QoS state that can be stored, necessitating more frequent updates and thus incurring greater overhead. QoS routing problems, most of which are NP-complete and require complicated heuristics to solve, often place an excessive strain on mobile nodes' processors, leading to higher consumption of limited battery power.

6. Hidden terminal problem: The hidden terminal problem is inherent in MANETs. It occurs when packets originating from two or more sender nodes, which are not within the direct transmission range of each other, collide at a common receiver node. It necessitates retransmission of packets, which may not be acceptable for flows that have strict QoS requirements. Some control packet exchange mechanisms reduce the hidden terminal problem only to a certain extent.
V. QOS PROTOCOL PERFORMANCE ISSUES

Even after overcoming the challenges of MANETs, a number of factors [12] have a major impact when evaluating the performance of QoS protocols. Some of these parameters are of particular interest considering the characteristics of the MANET environment. They can be summarized as follows.

1. Node mobility: This parameter has been the focus of research studies such as [8]. This factor generally encompasses several parameters: the nodes' maximum and minimum speed, speed


pattern and pause time. The node's speed pattern determines whether the node moves at uniform speed at all times or whether its speed is constantly varying, and also how it accelerates, for example uniformly or exponentially with time. The pause time determines the length of time nodes remain stationary between each period of movement. Together with the maximum and minimum speed, this parameter determines how often the network topology changes and thus how often network state information must be updated.

2. Network size: Since QoS state has to be gathered or disseminated in some way for routing decisions to be made, the larger the network, the more difficult this becomes in terms of update latency and message overhead. This is the same as with all network state information, such as that used in best-effort protocols.

3. Number, type and data rate of traffic sources: Intuitively, a smaller number of sources generates a lighter load. Sources can be constant bit rate (CBR) or may generate bits or packets at a rate that varies with time according to the Poisson distribution or any other mathematical model. The maximum data rate affects the number of packets in the network and hence the network load. All of these factors affect performance significantly [16].

4. Node transmission power: Some nodes may have the ability to vary their transmission power [9]. This is important since, at a higher power, nodes have more direct neighbors and hence connectivity increases, but the interference between nodes increases as well. Transmission power control can also result in unidirectional links between nodes, which can affect the performance of

routing protocols. This factor has also been studied extensively [13].

5. Channel characteristics: As detailed earlier, there are many reasons for the wireless channel being unreliable, i.e. many reasons why bits, and hence data packets, may not be delivered correctly. These all affect the network's ability to provide QoS.
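The node-mobility parameters discussed above (speed range, speed pattern and pause time) are exactly the knobs of the commonly used random waypoint model. The sketch below generates one node's trajectory under assumed parameter values; the area size, speed range and pause time are illustrative, not taken from any cited study.

    import random

    def random_waypoint(steps, area=(1000.0, 1000.0), speed=(1.0, 20.0), pause=2.0):
        """Yield (time, x, y) samples for one node under the random waypoint model."""
        t, x, y = 0.0, random.uniform(0, area[0]), random.uniform(0, area[1])
        for _ in range(steps):
            tx, ty = random.uniform(0, area[0]), random.uniform(0, area[1])
            v = random.uniform(*speed)
            travel = ((tx - x) ** 2 + (ty - y) ** 2) ** 0.5 / v
            t, x, y = t + travel, tx, ty       # move to the chosen waypoint at speed v
            yield (t, x, y)
            t += pause                          # remain stationary for the pause time

    for sample in random_waypoint(3):
        print(sample)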
VI. CONCLUSION

In this paper we have presented a comprehensive overview of state-of-the-art research work on the performance of different on-demand routing protocols and on QoS support in MANETs. The field of mobile ad hoc networks is rapidly growing and changing, and while it is not clear that any particular algorithm is the best for all environments, each protocol has definite advantages and disadvantages and is well suited for certain situations. Efficient routing protocols can provide significant benefits to mobile ad hoc networks, in terms of both performance and reliability. Many routing protocols for such networks have been proposed so far; amongst the most popular ones are AODV and DSR. We have also presented the QoS issues and challenges involved in MANETs. Continued growth is expected in this area of research in order to develop, test and implement the essential building blocks for providing efficient and seamless communications in wireless mobile ad hoc networks.

REFERENCES
[1] G. Vijaya Kumar, Y. Vasudeva Reddy, M. Nagendra, "Current Research Work on Routing Protocols for MANET: A Literature Survey," International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 03, pp. 706-713, 2010.
[2] Arun Kumar B. R., Lokanatha C. Reddy, Prakash S. Hiremath, "Performance Comparison of Wireless Mobile Ad-Hoc Network Routing Protocols," IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 6, pp. 337-343, June 2008.
[3] Salim M. Zaki, Mohd Asri Ngadi, Shukor Abd Razak, "A Review of Delay Aware Routing Protocols in MANET," Computer Science Letters, Vol. 1(1), ISSR Journals, 2009.
[4] S. Sridhar, R. Baskaran, "A Survey on QoS Based Routing Protocols for MANET," International Journal of Computer Applications (0975-8887), Vol. 8, No. 3, October 2010.
[5] C. Perkins, E. Belding-Royer, S. Das, "Ad hoc On-Demand Distance Vector (AODV) Routing," RFC 3561, July 2003.
[6] D. B. Johnson, D. A. Maltz, Y. C. Hu, "The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks (DSR)," IETF Draft, April 2003, work in progress. http://www.ietf.org/internet-drafts/draft-ietf-manet-dsr-9.txt
[7] Ahn G. S., Campbell A. T., Veres A., Sun L. H. (2002) "Supporting service differentiation for real-time and best-effort traffic in stateless wireless ad hoc networks (SWAN)," IEEE Transactions on Mobile Computing 1(3): 192-207.
[8] Broch J., Maltz D. A., Johnson D. B., Hu Y. C., Jetcheva J. (1998) "A performance comparison of multi-hop wireless ad hoc network routing protocols," Proc. 4th Annual ACM/IEEE International Conference on Mobile Computing and Networking, pp. 85-97.
[9] Chang J. H., Tassiulas L. (2000) "Energy-conserving routing in wireless ad-hoc networks," Proc. IEEE INFOCOM 1: 22-31.
[10] Doshi S., Bhandare S., Brown T. (2002) "An on-demand minimum energy routing protocol for a wireless ad-hoc network," Mobile Computing and Communications Review 6(2): 50-66.
[11] Gerharz M., de Waal C., Frank M., James P. (2003) "A practical view on quality-of-service support in wireless ad hoc networks," Proc. IEEE Workshop on Applications and Services in Wireless Networks (ASWN), citeseer.ist.psu.edu/jgerharz03practica1.html.
[12] Hanzo L., Tafazolli R. (2007) "A survey of QoS routing solutions for mobile ad hoc networks," IEEE Communications Surveys & Tutorials 9(2): 50-70.
[13] Yu C., Lee B., Youn H. Y. (2003) "Energy-efficient routing protocols for mobile ad-hoc networks," Wiley Journal of Wireless Communications and Mobile Computing 3(8): 959-973.
[14] Perkins C. E. (2001) Ad Hoc Networking, Ch. 3, Addison Wesley, Reading, MA.
[15] Perkins C. E., Royer E. M., Das S. R. (2000) "Quality of service for ad hoc on-demand distance vector routing," IETF Internet Draft.
[16] Perkins C. E., Royer E. M., Das S. R., Marina M. K. (2001) "Performance comparison of two on-demand routing protocols for ad hoc networks," IEEE Personal Communications Magazine 8: 16-28.


BUILDING AN APPLICATION OFFLINE BARCODE SCANNER FOR ANDROID DEVICE


S. Suresh 1, R. Valarmathi 2, S. Naveen Kumar 3
1,2 Lecturer, Department of CSE, S K R Engineering College
3 PG Student, Department of CSE, S K R Engineering College

ABSTRACT
The project aims to provide the mobility to check goods taken in and out of check-posts. Even if there are no computer systems at the check-post, we should be able to record the goods entering and leaving through it. The application should serve three purposes. It should be able to note down the details of the goods brought or taken in and out through the check-post, along with the details of the person bringing or taking them. It should be able to scan the 2D barcode generated by one of our main applications and verify whether the information is correct or not. There may or may not be connectivity (at all times) to connect directly to the main DB and upload the data, so the application should be able to store the data internally and then allow the file or data to be restored or uploaded into the main DB.

1. INTRODUCTION
The state government of Karnataka, in order to help farmers sell their produce and get reasonable prices, created APMCs (Agricultural Product Marketing Committees) in many towns. Most APMCs have a market where traders and other marketing agents are provided stalls and shops for the purchase of agricultural produce from farmers. Farmers can sell their produce to agents or traders under the supervision of the APMC.

Android is a Linux-based operating system for mobile devices such as smartphones and tablet computers. It is developed by the Open Handset Alliance led by Google. Android has a large community of developers writing applications (apps) that extend the functionality of the devices. Developers write primarily in a customized version of Java. Apps can be downloaded from third-party sites or through online stores such as Android Market, the app store run by Google. As of October 2011 there were more than 400,000 apps available for Android, and the estimated number of applications downloaded from the Android Market as of December 2011 exceeded 10 billion. Android was listed as the best-selling smartphone platform worldwide in Q4 2010 by Canalys, with over 200 million Android devices in use by November 2011. According to Google's Andy Rubin, as of December 2011 there are over 700,000 Android devices activated every day.

1.1 Existing System

In the Agricultural Product Marketing Committee, the details of the product are entered manually by the gatekeepers. This requires a PC with a network connection to enter the commodities and the user details. In this type of system there is no mobility or portability, so it takes the gatekeepers much time to enter the details of the product. The existing system will consume


more power, and when there is no power it will be difficult for the user to use the system and enter the details of the product; this will lead to delay in the process.

1.1.1 Description

The APMC user interacts with the normal system after verifying the farmer and the product details, and the system uses the APMC application software. The details in this application are manually entered by the user, and the data entered are stored in the database when the system is connected online. The payroll data are verified by the APMC user through the application software present in the system. Payroll data are the data given by the user to interact with the system, and they are connected with the database through the application software. The database saves and maintains all the details of the product.

1.1.2 Drawbacks

The existing system does not provide the user with mobility, and it requires manual work to enter all the payroll data into the system. It consumes more power. In this system the storage of data in the database is difficult when the system is offline, and there will be delay in the process if there is a failure in the power supply. If there is a large queue, it is very difficult for the gatekeepers to enter the details of the product, and this will take more time. If there is any failure in the database, there will be a huge loss of data and it will be very difficult to retrieve.

1.2 Proposed System

The feasible solution is to use an Android device capable of getting the information about the product at the check post by scanning the barcode available on the product. High autonomy, flexibility, portability and low cost of setup make it a promising platform for agricultural products. Due to its portability, it is easy and fast to get the product details anywhere and anytime. In the existing system, the details of the product at the check post are entered manually, which requires more time and labor. Other drawbacks are that it is not portable, occupies more space and time, and the system should always be connected to a network to store the details. To overcome these drawbacks, we are developing an application on the Android platform to scan the product details using the camera in the device instead of writing them manually in the system. This application will help the user to interact with the Android device easily and help them to enter all the details of visitors such as farmers and traders, and also the commodity details.

1.2.1 Architecture

APMC users use the mobile device for doing all the operations. Initially


the user will verify the farmer and trader details and enter their personal details and the prawesh patra number. After the verification process is completed, the user will check the product brought by the farmer or the trader in process. The Android mobile used by the gatekeepers will have the mobile barcode scanner application, using which each product will be scanned. This barcode scanner will scan all the product details and update them in the entry gate form. The entry gate will thus have all the details of the farmer as well as the product. When there is a need to store the details of the farmer's or trader's product, the details are stored in the main database if the system is online; otherwise the details will be stored in the mobile database and later sent to the main database when the system is online.

1.2.2 Mobile APMC Application

1.2.2.1 Entry Gate: When farmers or traders enter the APMC, the gatekeeper has to register the visitors and the product details using the entry gate option. The entry gate will have the personal details of the visitor and the prawesh patra number. The commodity details will be scanned using the barcode scanner application developed on the phone. After the product is scanned, the details of the product will be stored in the main database when the device is online; otherwise the details will be stored in the mobile database.

1.2.2.2 Exit Gate: When the farmers or traders come out of the APMC, the details of the product are verified using the scanner, and once the details of the product are retrieved, the information will be stored in the database.

1.2.2.3 Entry Gate Register: The records which are entered in the "Entry Gate" will be displayed under the "Entry Gate Register". The names of the visitors and the commodities will be sorted in alphabetical order. The date and time of entry will also be displayed in the entry gate register.

1.2.2.4 Exit Gate Register: The records which are entered in the "Exit Gate" will be displayed under the "Exit Gate Register". The names of the visitors and the commodities will be sorted in alphabetical order. The date and time of exit will also be displayed in the exit gate register.

1.2.2.5 APMC Stock Details: This will have all the details of the product and its quantity taken inward and outward, with the date and time of entry. It will also have the details of the amount of the product which remains in balance and is to be given by the customer.

1.2.3 Advantages

The Android mobile application will provide the users with the mobility to enter the details of the farmer, the trader and the commodities. This application will reduce manual work by scanning the details of the product and will update all the details in the database. The use of an Android mobile will require less power. This system does not always require an online facility to store the details in the main database. If the system is offline, this application will store all the contents in the mobile database and later transfer them to the main database when it is


online. The use of the Android mobile will help the gatekeepers of the APMC to process the work faster, and it will not cause any delay. If there is a large number of visitors to the APMC, each gatekeeper can be given an Android mobile with the barcode application to process the work faster.
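The offline behaviour described above amounts to a store-and-forward queue on the handset. The following sketch uses hypothetical table and function names, not the actual APMC schema or the Android APIs; it only illustrates the idea of saving scanned entries in a local SQLite database and flushing them to the main database once connectivity returns.

    import sqlite3

    def save_entry(conn, record):
        """Store a scanned gate entry locally; it stays marked unsynced until uploaded."""
        conn.execute("CREATE TABLE IF NOT EXISTS entries "
                     "(visitor TEXT, commodity TEXT, ts TEXT, synced INTEGER DEFAULT 0)")
        conn.execute("INSERT INTO entries (visitor, commodity, ts) VALUES (?, ?, ?)",
                     (record["visitor"], record["commodity"], record["ts"]))
        conn.commit()

    def sync_entries(conn, upload):
        """Push unsynced rows to the main database via the supplied upload() callable."""
        rows = conn.execute("SELECT rowid, visitor, commodity, ts FROM entries "
                            "WHERE synced = 0").fetchall()
        for rowid, visitor, commodity, ts in rows:
            if upload({"visitor": visitor, "commodity": commodity, "ts": ts}):
                conn.execute("UPDATE entries SET synced = 1 WHERE rowid = ?", (rowid,))
        conn.commit()

    db = sqlite3.connect(":memory:")
    save_entry(db, {"visitor": "farmer-01", "commodity": "paddy", "ts": "2012-05-03 09:15"})
    sync_entries(db, upload=lambda rec: True)   # stand-in for the online upload call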


Conclusion:
The use of the Android mobile and the Android application will help the APMC users to interact easily with the system and to process their work more quickly.


DYNAMIC VOLTAGE SCALING FOR MULTI-PROCESSOR USING GALS


S. P. Parameswar 1, R. Thukkaisamy 2
1 Assistant Professor, Department of IT, RVS College of Engineering and Technology, Dindigul
2 PG Student, Sasurie College of Engineering, Erode

Abstract
The approach is based on using a control-loop feedback mechanism to maximize the efficiency of exploiting available resources such as CPU time, operating frequency, etc. Each Processing Element (PE) in the architecture is equipped with a frequency scaling module responsible for tuning the frequency of the processor at run-time according to the application requirements. Results show the system's capability of adapting to disturbing conditions. However, the use of GALS clocking typically also introduces performance penalties due to additional communication latency between clock domains. We show that GALS chip multiprocessors (CMPs) with large inter-processor first-in first-out (FIFO) buffers can inherently hide much of the GALS performance penalty while executing applications that have been mapped with few communication loops. In fact, the penalty can be driven to zero with sufficiently large FIFOs and the removal of multiple-loop communication links. We present an example mesh-connected GALS chip multiprocessor and show it has a less than 1% performance (throughput) reduction on average compared to the corresponding synchronous system for many DSP workloads. Furthermore, adaptive clock and voltage scaling for each processor provides an approximately 40% power savings without any performance reduction.

I. INTRODUCTION

Dynamic Frequency Scaling (DFS)
Dynamic Frequency Scaling (DFS) is a widely used technique aimed at adjusting computational power to application needs. It is often associated with Dynamic Voltage Scaling (DVS), enabling significant power reductions when computing demand is low; cited benefits also include the reduction of thermal hotspots that contribute to the accelerated aging of circuits due to thermal stress. In multi-core systems such as general purpose processors and high performance embedded processors, the operating system is responsible for dynamically adjusting the frequency of each processor to the current workload. This is facilitated by the presence of dedicated hardware monitors that the OS can rapidly access. In Linux based systems, two popular policies are used at kernel level: ondemand and conservative. The ondemand governor switches to the highest available frequency whenever a load is detected, whereas the conservative policy increases the frequency in a step-by-step fashion, yielding better power savings at the expense of lower reactivity.
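A toy version of the two governor policies mentioned above can be written in a few lines; the load threshold and frequency table below are illustrative values chosen for the sketch, not those of any real kernel governor.

    FREQS = [200, 400, 800, 1200, 1600]   # available frequencies in MHz (illustrative)

    def ondemand(load, current):
        """Jump straight to the highest frequency when significant load is detected."""
        return FREQS[-1] if load > 0.8 else current

    def conservative(load, current):
        """Move one step up or down the frequency table at a time."""
        i = FREQS.index(current)
        if load > 0.8 and i < len(FREQS) - 1:
            return FREQS[i + 1]
        if load < 0.3 and i > 0:
            return FREQS[i - 1]
        return current

    freq = 400
    for load in (0.9, 0.9, 0.2):
        freq = conservative(load, freq)
        print(load, freq)                  # 800, then 1200, then back down to 800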


Figure 1.2. Reducing and eliminating performance penalties of GALS

This delay normally results in a reduction of performance (throughput). In this section, we discuss in depth the principles behind how GALS clocking affects system throughput and find several key architectural features which can hide the GALS effects. Fully avoiding any GALS performance penalties is possible for the described GALS chip multiprocessor. To simplify the discussion, in this section both GALS and synchronous systems use the same clock frequencies.

A. Related Work

Significant previous research has studied the GALS uniprocessor, in which portions of each processor are located in separate clock domains. Results have shown that GALS uniprocessors experience a non-negligible performance reduction compared to a corresponding synchronous uniprocessor.

Figure 1.1. Clock domain fast Fourier transform (FFT) processor

The latency of a block of data will likely be lower with the source synchronous multi-word flow control method compared to the single-transaction handshaking method, due to its higher throughput. Dual-clock FIFOs are well suited to provide asynchronous boundary communication using the source synchronous multi-word flow control method.

GALS CHIP MULTIPROCESSORS


Each processor in the GALS system utilizes an individual programmable ring oscillator that is configurable over a wide range of frequencies. Each processor also contains two dual-clock FIFOs, which write and read in independent clock domains and reliably transfer data across the asynchronous boundaries.

Reducing and eliminating performance penalties of GALS

GALS systems require synchronization circuits between clock domains to reliably transfer data. Clock phase edge alignment time for unmatched clocks and synchronization circuitry introduces a synchronization delay.

Task Migration with PID Controller Architecture


The system calculates an error value, obtained as the difference between the desired and the obtained throughput. As output of the PID controller, a frequency value is indicated. This value is sent to the frequency scaling module, which is responsible for scaling the frequency of the processor up and down to cope with the application requirements. The procedure is then repeated and the obtained throughput gradually gets closer to the desired throughput; this is explained by the fact that after each iteration the error value is reduced, assuming that the values of P, I and D have been correctly chosen.


Our approach differs from the others for the following reasons: 1) There is no master in the architecture, so decisions are taken in a distributed way; each processing element (PE) in the architecture controls its own frequency. 2) System optimizations are not global, since there is no centralized control of the system, making system management more difficult.

Figure 1.3. PID Controller

This project presents two major contributions: 1) a novel and purely distributed memory architecture with adaptation capabilities driven by local PID controllers; and 2) an analysis and discussion of the benefits of using such a strategy on a homogeneous PSoC architecture running audio/video applications.
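A minimal sketch of the per-PE feedback loop is given below. The gain values, frequency limits and throughput numbers are placeholders for illustration only, and real hardware would quantize the output frequency to the steps its ring oscillator supports.

    class PIDFrequencyController:
        """Adjust a processor's frequency so the measured throughput tracks a target."""
        def __init__(self, kp, ki, kd, f_min=100.0, f_max=1600.0):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.f_min, self.f_max = f_min, f_max
            self.integral = 0.0
            self.prev_error = 0.0

        def step(self, target, measured, current_freq, dt=1.0):
            error = target - measured                 # desired minus obtained throughput
            self.integral += error * dt
            derivative = (error - self.prev_error) / dt
            self.prev_error = error
            delta = self.kp * error + self.ki * self.integral + self.kd * derivative
            return min(self.f_max, max(self.f_min, current_freq + delta))

    ctrl = PIDFrequencyController(kp=2.0, ki=0.5, kd=0.1)
    freq, target = 400.0, 30.0                        # MHz and frames/s (illustrative)
    for measured in (18.0, 24.0, 28.0, 30.0):
        freq = ctrl.step(target, measured, freq)
        print(round(freq, 1))                         # frequency converges as error shrinks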

II. IMPLEMENTATION OF MULTIPROCESSOR


It presents promising results regarding the adaptability of the system. We have demonstrated the efficiency of the proposed PID controller by presenting three different scenarios. For validating our approach we have implemented a multi-threaded version of the MJPEG decoder together with an ADPCM and an FIR application, which exchange messages using a message passing interface (MPI).

Inter-Processor Network

The networking strategy between processors also strongly impacts the behavior of GALS CMPs. Consider a GALS chip multiprocessor with inter-processor and processor-memory communication through a shared global bus. This scheme provides very flexible communication, but places heavy demands on the global bus; thus, the system performance is highly dependent on the level and intensity of the bus traffic. Furthermore, this architecture lacks scalability, since increased global bus traffic will likely significantly reduce system performance under high traffic conditions or with a large number of processors.

Processing Element (PE)

The system elects a master node which is responsible for controlling the frequency of each processor in the architecture. Other existing solutions have predefined states in which processors change the frequency whenever a given condition is respected.

III. RESULT ANALYSIS


Analysis of the Performance Effects of GALS

The very small performance reduction of the GALS chip multiprocessor motivates us to understand the factors that affect performance in GALS-style processors. The chain of events that allows synchronization circuit latency to finally affect application throughput is a complex relationship, and several methods are available to hide the GALS penalties.


Figure 3.1. Result: synchronization circuit latency (G6)

B. GALS MODULE

Figure 3.2. Eliminating performance penalties

Increasing FIFO Sizes: Increasing the FIFO size will reduce FIFO stalls as well as FIFO stall loops, and hence increase system performance and reduce the GALS performance penalty. With a sufficiently large FIFO there will be no FIFO-full stalls, and the number of FIFO-empty stalls can also be greatly reduced; the communication loop will then be broken and no GALS performance penalties will result. The top and middle subplots show performance with different FIFO sizes for the synchronous and GALS systems, respectively. We assume all FIFOs in the system are the same size and thus their sizes are scaled together in this analysis.
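The effect of FIFO depth on stalls can be seen even in a very small software model. The sketch below uses assumed per-cycle production and consumption probabilities, not measured data from the chip; it simply counts how often a producer is blocked by a full FIFO and a consumer by an empty one as the depth grows.

    from collections import deque
    import random

    def fifo_stalls(depth, cycles=10000, p_produce=0.5, p_consume=0.5, seed=1):
        """Count producer stalls (FIFO full) and consumer stalls (FIFO empty)."""
        random.seed(seed)
        fifo, full_stalls, empty_stalls = deque(), 0, 0
        for _ in range(cycles):
            if random.random() < p_produce:
                if len(fifo) < depth:
                    fifo.append(1)
                else:
                    full_stalls += 1          # producer blocked by a full FIFO
            if random.random() < p_consume:
                if fifo:
                    fifo.popleft()
                else:
                    empty_stalls += 1         # consumer blocked by an empty FIFO
        return full_stalls, empty_stalls

    for depth in (2, 8, 32, 128):
        print(depth, fifo_stalls(depth))      # both stall counts shrink as the FIFO grows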

C. MULTIPROCESSOR MODULE WITH GALS

IV. SIMULATION MODULE

A. MULTIPROCESSOR MODULE

V. CONCLUSION
Data presented in this paper are based on simulations of a fully-functional fabricated GALS chip multiprocessor and on physical designs based on the chip. Results from this work apply to systems with three key features, namely:

DRDO Sponsored International Conference on Intelligence Computing (ICONIC12)

1) Multi-core processors (homogeneous and heterogeneous) operating in independent clock domains.
2) Source synchronous multi-word flow control for asynchronous boundary communication.
3) Distributed interconnect, such as a mesh.

While results certainly vary over different applications and specific architectures, systems with these features should still exhibit the following benefits over many workloads: good scalability, small performance reductions due to asynchronous communication overhead, and large potential power reductions from clock frequency and supply voltage scaling.



FAULT INJECTION BASED VIRTUAL CHANNEL ALLOCATION OF NETWORK-ON-CHIP (NOC) ROUTER


N. Ashok Kumar 1, Dr. A. Kavitha 2
1 Assistant Professor, Department of ECE, R.V.S College of Engineering & Tech, Dindigul
2 Professor, Department of ECE, R.V.S College of Engineering & Tech, Dindigul

Abstract:
Network-on-chip (NoC) architectures are emerging as a highly scalable, reliable, and modular on-chip communication infrastructure platform. The NoC architecture uses layered protocols and packet-switched networks which consist of on-chip routers, links, and network interfaces on a predefined topology. In this paper, we design a network-on-chip based on the Cartesian network environment. The paper proposes a new Cartesian topology which is used to reduce network routing time and is a suitable alternative for network design and implementation. The Cartesian network-on-chip can be modelled using Verilog HDL and simulated using ModelSim software.

I. INTRODUCTION
Cartesian routing is a fast packet routing mechanism intended for geographic addresses and can effectively accelerate the packet routing process within a local or metropolitan environment. The wide-area Cartesian routing described in this paper is an extension of the Cartesian routing algorithms designed to make the exchange of internetwork packets between geographical regions possible. It also introduces a new hierarchical structure for the entire Internet. The proposed Internet is viewed as a hierarchy of networks consisting of routers. At the highest level of this hierarchy, major routers exchange packets between large geopolitical areas such as countries, states, or provinces. At the lowest level of the structure, packets are routed between local routers in small geographical regions ranging from an office to a small town. There are only four layers in this structure, and at each layer Cartesian routing is employed to send packets from the source router to the destination.

The wide-area Cartesian Routing algorithm overcomes these problems by creating a hierarchical network consisting of two or more layers. Each network at a given layer encompasses one or more networks. Each network, regardless of its layer, employs the Cartesian Routing algorithm for packet routing. Two extensions to the original Cartesian Routing algorithm are required: each network (except for the highest) requires an internetwork router that can direct packets destined for other networks up to the encompassing network.


The address structure reflects the network structure, with specific fields in the address associated with each layer.

II. CARTESIAN NETWORKS

A Cartesian network consists of a set of collectors and one or more arterials.

Fig. 1: Cartesian router (arterial router and collector)

Each collector is a chain of collector routers running east-west and sharing a common latitude. Collector routers have two side ports (east and west) to exchange packets horizontally. Each collector router also has a bottom port which allows it to connect to a set of local hosts. Arterials exchange packets between collectors. Each arterial router, except the most northerly and the most southerly, has at least four ports (north, south, east and west). Arterials need not share a common longitude. In a Cartesian network, the imposed topological structure relieves each router from maintaining routing tables. Each router is bound to a unique pair of addresses, the state information is minimal, and each router maintains the accessibility of arterials to its west and east.

2.1 Cartesian Network Initialization

In Cartesian routing, each arterial issues Arterial This Way (ATW) control packets during its initialization process. An ATW tells the receiving collector router if an arterial is accessible through the incoming port. An ATW also specifies what kind of connection is accessible via the incoming port: north, south, north and south, or neither. Upon receiving an ATW, each collector router updates its Arterial Direction Indicator (ADI) and forwards the ATW to the opposite port. ATWs are also used to establish Virtual Arterials, constructed in situations where it is physically impossible for an arterial to span two collectors. The ADI points in the direction of the arterial router (i.e., east or west) and indicates whether the arterial router has a connection to the north, the south, or both. Figure 1 illustrates a Cartesian network.

2.2 Cartesian Routing

Packets can arrive on either a west or east port of a collector router. Packets intended for a different latitude are forwarded out the opposite port from which they are received. The ADI determines the packet's initial direction on the collector router when a packet arrives on the bottom port of a collector router.

Table 1: 4-port routing algorithm

ADDRESS                                       FUNCTION
Dest_lat = node_lat, Dest_long = node_long    Router keeps the packet
Dest_lat > node_lat, Dest_long > node_long    Packet routed to North port
Dest_lat < node_lat, Dest_long > node_long    Packet routed to South port
Dest_lat = node_lat, Dest_long > node_long    Packet routed to East port
Dest_long < node_long                         Packet discarded
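The decision rules in Table 1 translate almost directly into code. The sketch below is an illustrative software rendering of that table for a single 4-port router, not the Verilog used in the actual design, and the example coordinates are made up.

    # Illustrative 4-port Cartesian routing decision (mirrors Table 1).
    def route_decision(dest_lat, dest_long, node_lat, node_long):
        if dest_lat == node_lat and dest_long == node_long:
            return "KEEP"            # router keeps the packet
        if dest_long < node_long:
            return "DISCARD"         # destination lies west of this router's region
        if dest_lat > node_lat:
            return "NORTH"
        if dest_lat < node_lat:
            return "SOUTH"
        return "EAST"                # same latitude, destination further east

    print(route_decision(12, 40, 10, 35))   # NORTH
    print(route_decision(10, 35, 10, 35))   # KEEP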

In deciding a packet's initial direction, the router first compares the packet's destination address with its own address. The packet will be forwarded in the direction of the destination if the destination latitude is the same as the collector's. The packet is forwarded in the direction of the ADI if the destination is on a different latitude.

2.3 Wide Area Cartesian Networks

A Cartesian network provides a straightforward topological structure that relieves collector routers from the need to maintain routing tables. However, it would be


unrealistic to implement a single worldwide Cartesian network. Such a widespread Cartesian network, for example, requires every packet destined for a router with the same latitude identifier as the source router's latitude identifier to visit all the collector routers. It is also necessary for such a network to have one collector for every possible latitude. These limitations suggest that implementing a single worldwide Cartesian network would be impractical.

An alternative to a worldwide Cartesian network is to create a set of smaller Cartesian networks and implement a mechanism for interchanging packets between them. One approach to interchanging packets between Cartesian networks is to forward packets towards their destinations: when a packet reaches the boundary of a network it falls off the edge and is delivered to a special router to be forwarded towards the destination address. The process of routing a packet from one network to another using this approach becomes problematic when networks are interleaved or overlapped. Two networks are considered interleaved if there is at least one collector router on one of the networks whose longitude identifier lies between the longitude identifiers of two collectors from the other network and whose latitude identifier lies between the latitude identifiers of two collectors from the other network. The figure illustrates two interleaved networks. Two networks are said to be overlapped if there is at least one collector router on one of the networks whose longitude identifier lies between the longitude identifiers of two collectors from the other network and all three of them share the same latitude identifier. The figure illustrates two overlapped networks.

An alternative method for delivering a packet to its destination is to find the destination network address and then to route the packet to the destination network using the Cartesian routing algorithms. This implies that each network must be identifiable using the

packets destination address. If we assume that each network has a rectangular shape, recognizing the destination network is a matter of comparing the packets destination address with the networks boundaries. However, there are a number of reasons to assume that it would be unrealistic to expect networks to have rectangular borders: geographical barriers and political jurisdictions, for example. Since Cartesian routing uses latitude and longitude pairs to identify the source and the destination addresses of packets, this information is not sufficient to determine to which network a collector/arterial belongs in the case of interleaved and overlapped networks. This, in turn, suggests that an additional set of information is required to identify to which network a collector or arterial is connected. To achieve this, the authors propose a hierarchical structure for Cartesian networks. In the next section the possibility of multiple-layer Cartesian networks as a solution for interchanging packets between arbitrary shaped interleaved and overlapped Cartesian networks are explained. In the remainder of this paper, the terms wide area Cartesian networks and multiple-layer Cartesian networks are used interchangeably.
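As a concrete reading of the interleaved and overlapped definitions above, the sketch below tests the two conditions from the collector coordinates of two networks. The data layout (each network as a list of (latitude, longitude) collector positions) and the function names are assumptions made only for illustration.

```python
def _between(value, a, b):
    """True when value lies strictly between a and b."""
    lo, hi = min(a, b), max(a, b)
    return lo < value < hi


def are_interleaved(net_a, net_b):
    """net_a, net_b: lists of (lat, long) collector coordinates (assumed layout)."""
    for this, other in ((net_a, net_b), (net_b, net_a)):
        for lat, lon in this:
            lon_between = any(_between(lon, c1[1], c2[1])
                              for c1 in other for c2 in other)
            lat_between = any(_between(lat, c1[0], c2[0])
                              for c1 in other for c2 in other)
            if lon_between and lat_between:
                return True
    return False


def are_overlapped(net_a, net_b):
    """A collector's longitude lies between two collectors of the other network
    while all three share the same latitude."""
    for this, other in ((net_a, net_b), (net_b, net_a)):
        for lat, lon in this:
            if any(c1[0] == lat and c2[0] == lat and _between(lon, c1[1], c2[1])
                   for c1 in other for c2 in other):
                return True
    return False
```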
ROUTER STRUCTURE

Fig. 2: Router structure (block diagram: an arriving signal enters a packet detection module; detected packets are placed in incoming packet storage, a decision-making module decides whether to keep, pass or discard each packet, and packets to be forwarded are passed to the outgoing packet storage of the appropriate port before the signal leaves the router).


III. MULTIPLE-LAYER CARTESIAN NETWORKS


Multiple-layer Cartesian networks impose a new set of topological dependencies among a set of Cartesian networks, such that interchanging packets between networks is feasible without creating and maintaining routing tables. Generally, in multiple-layer Cartesian networks, the idea of Cartesian networks is expanded on a larger scale using a hierarchical structure. The highest layer of the hierarchy is a single Cartesian network. Each underlying layer consists of a set of mutually disjoint Cartesian networks (i.e., they are physically disjoint and share no collector or arterial router); however, networks in the same layer can be interleaved or overlapped.

3.1 DMM
Fig. 3: Decision Making Module (block diagram showing the packet counter module, address counter module, router address pipe, receiving-address and address-received flags, and the address differentiation module comparing DA[i] with RA[i]).

The PCM (Packet Counter Module) strips the destination address from the packet. It counts the incoming packet bits and sets the RECEIVING-ADDRESS flag when the first bit of the address is read. The ACM (Address Counter Module) keeps track of the number of address bits that have been read. It indicates which portion of the address is being received (latitude or longitude) and, when the entire address has been received, it sets the ADDRESS-RECEIVED flag. When the ADDRESS-RECEIVED flag is set, the IPS is told to keep the packet. The RAP stores the router address and pipes it out serially so that it can be compared with the incoming destination address. The router address is a static value and is kept in non-volatile memory for long-term storage. The ADM simply compares the i-th bit of the destination address to the i-th bit of the router address as the destination address is read into the DMM. If the destination address does not equal the router address, the ADM tells the incoming packet storage (IPS) module to destroy the stored packet.

3.2 Router Identification

In a Cartesian network, each router is bound to a Cartesian address, whereas in a multiple-layer Cartesian network, each router is bound to an identifier. The identifier of a router at layer-m of an m-layer Cartesian network is the same as its Cartesian address. The identifier of a router at lower layers is the identifier of its immediate encompassing router, followed by the Cartesian address of the router itself, meaning that an identifier is an ordered list of Cartesian addresses. For example, routers at layer-(m-1) maintain the Cartesian address of the router that represents the network to which they belong, followed by their own Cartesian address. In general, in an m-layer Cartesian network, each router at layer-n maintains a list of (m-n+1) ordered Cartesian addresses. For example, routers at the lowest layer of the hierarchy, layer-1, which are connected to local hosts through their bottom ports, are bound to m ordered Cartesian addresses: m-1 of them correspond to the identifier of the encompassing router at layer-2, and one is the Cartesian address of the router itself. Figure 5 illustrates the hierarchical addressing structure of an identifier for a router at layer-n of an m-layer Cartesian network.

The hierarchical addressing structure overcomes the problems with both interleaved and overlapped networks, since every router in the hierarchy has a unique address. It also enables a router in a network at layer-n to determine whether a packet is local to the network or not. A packet is said to be local to a network if the network encompasses the destination address of the packet. A router can determine this by comparing the most significant m-n Cartesian addresses of the packet's destination address with the first m-n Cartesian addresses of its own identifier.

Packet format: SOP | M | DLAT | DLON | SLAT | SLONG | DATA | EOP, where SOP is the start of packet (6 bits, 6'b111111), EOP is the end of packet (6 bits, 6'b111111), M is the cast field (unicast 4'b0000, multicast 4'b0001), DLAT and DLON are the destination latitude and longitude, and SLAT and SLONG are the source latitude and longitude.

3.3 Packet Routing

In an m-layer Cartesian network, each collector router at layer-n is bound to an identifier which is a list of (m-n+1) Cartesian addresses. A packet can enter a network at layer-n through the bottom port of a collector router or the top port of the network's IR. Packets received on the bottom port of a collector router are either local or non-local to the network, as described above. When a packet is found to be local, that is, the network encompasses the destination address, the router tags the packet as a local packet by setting a single bit of the packet's address called the local bit. When a packet is local to a network at layer-n, the (m-n+1)-th Cartesian address of the packet's destination address is used to route the packet in the network using the Cartesian routing algorithm. For example, at layer-m the first Cartesian address is used for Cartesian routing, while at layer-1 the m-th address is used. When a packet is received by a router on its bottom port, the address is inspected. Non-local packets must be sent towards the network's IR in order to be delivered to the encompassing network: the router clears the local bit and forwards the packet towards the IR. Forwarding a packet to the IR requires that each collector router and arterial maintains an Internetwork Router Direction Indicator, or IRDI. The IRDI determines whether the internetwork router is accessible through the west port, east port or neither, in the case of collector routers, or whether it is accessible through the north port, south port or neither, in the case of arterials. If the packet is determined to be non-local, it is forwarded in the direction specified by the IRDI. When the IRDI indicates that the IR is not accessible, the packet is dropped and a message is returned to the source notifying it that the destination address is not reachable.

A collector router that receives a packet on its west or east port checks the local bit; if it is set, the Cartesian routing algorithm is employed to route the packet, otherwise the packet is forwarded to the opposite port. When a packet enters a network through the top port of the network's IR, the packet is guaranteed to be local, since this has already been verified by the encompassing network. Upon receiving the packet, the IR sets the local bit and then applies the Cartesian routing algorithm to the (m-n+1)-th Cartesian address of the packet's destination address.
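To make the identifier comparison and the bottom-port decision of Sections 3.2 and 3.3 concrete, here is a minimal sketch. Representing identifiers as tuples of Cartesian addresses ordered from the top layer downwards, and the function names, are assumptions for illustration rather than the authors' implementation.

```python
def is_local(router_identifier, dest_identifier, m, n):
    """A packet is local to a layer-n network of an m-layer hierarchy when the
    most significant m-n Cartesian addresses of its destination identifier
    match the first m-n addresses of the router's identifier.

    Identifiers are modelled as tuples of (lat, long) Cartesian addresses,
    ordered from the top of the hierarchy downwards (assumed representation).
    """
    prefix = m - n
    return dest_identifier[:prefix] == router_identifier[:prefix]


def handle_bottom_port_packet(router_identifier, dest_identifier, m, n):
    """Sketch of the bottom-port decision described in Section 3.3."""
    if is_local(router_identifier, dest_identifier, m, n):
        # Set the local bit and route on the (m-n+1)-th Cartesian address of
        # the destination using the plain Cartesian routing algorithm.
        routing_address = dest_identifier[m - n]
        return ("route_locally", routing_address)
    # Non-local: clear the local bit and forward towards the internetwork
    # router (IR) in the direction given by the IRDI; drop if the IR is
    # unreachable (a message is then returned to the source).
    return ("forward_towards_IR", None)
```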

IV. SIMULATION RESULTS

4.1 Router Design Simulation Results

Fault-injection results for the router modules:

  Module         Overwritten Faults    Latent Errors    Failure Experiments    Average Fault
                   #         %            #      %         #         %          Latency (ns)
  Input Buffer    2015     69.76          5     1.14       845      29              241
  Routing Unit     760     31.17          5     0.83      1665      68              174
  Switch           168     35.44          2     0.56       306      64              166
  Total           2943     51            12     0.22      2816      48.8            194

V. CONCLUSION

In previous work, a conventional router was designed that uses a routing table to determine whether to keep, forward or discard packets. As networks grow in size, the memory requirement of the routing tables increases proportionally, and the average search time increases with the size of the routing table, even in an ASIC-based router design. In the proposed work, a new Cartesian network router is designed that is independent of routing tables and provides higher-speed network transmission compared with the existing work.

VI. REFERENCES

[1] F. A. Samman, T. Hollstein, and M. Glesner, "New Theory for Deadlock-Free Multicast Routing in Wormhole-Switched Virtual-Channelless Networks-on-Chip," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, April 2011.
[2] T. N. K. Jain, M. Ramakrishna, P. V. Gratz, A. Sprintson, and G. Choi, "Asynchronous Bypass Channels for Multi-Synchronous NoCs: A Router Microarchitecture, Topology, and Routing Algorithm," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 11, November 2011.
[3] T. Mak, P. Y. K. Cheung, K.-P. Lam, and W. Luk, "Adaptive Routing in Network-on-Chip Using a Dynamic-Programming Network," IEEE Transactions on Industrial Electronics, vol. 58, no. 8, August 2011.
[4] M. Arjomand and H. Sarbazi-Azad, "Power-Performance Analysis of Network-on-Chip with Arbitrary Buffer Allocation Schemes," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 10, October 2010.
[5] W. Jang and D. Z. Pan, "Application-Aware NoC Design for Efficient SDRAM Access," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 10, October 2010.
[6] "Application-Aware NoC Design for Efficient SDRAM Access," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 10, October 2010.
[7] "An SDRAM-Aware Router for Networks-on-Chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 10, October 2010.
[8] "On Topology Configuration for Defect-Tolerant NoC-Based Homogeneous Manycore Systems," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 9, September 2009.
[9] "An Energy and Performance Exploration of Network-on-Chip Architectures," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 3, March 2009.
[10] M. Ali, M. Welzl, S. Hessler, and S. Hellebrand, "An Efficient Fault-Tolerant Mechanism to Deal with Permanent and Transient Failures in a Network on Chip," International Journal of High Performance Systems Architecture, vol. 1, no. 2, 2007, pp. 113-123.
[11] A. P. Frantz, M. Cassel, F. L. Kastensmidt, E. Cota, and L. Carro, "Crosstalk- and SEU-Aware Networks on Chips," IEEE Design and Test of Computers, 2007, pp. 340-350.
[12] A. P. Frantz, M. Cassel, F. L. Kastensmidt, E. Cota, and L. Carro, "Dependable Network-on-Chip Router Able to Simultaneously Tolerate Soft Errors and Crosstalk," IEEE International Test Conference, 2006, pp. 1-9.


BER PERFORMANCE ANALYSIS UNDER RAYLEIGH MODEL


Sahaya Bindhya Dhas. A 1, Jeril Viji. T 2
1 M.E. Computer and Communication student, Saveetha Engineering College
2 Assistant Professor, Saveetha Engineering College, Chennai, India

Abstract

In free-space optical (FSO) communication, received signals are affected by various types of fading, particularly turbulence-induced fading. To reduce turbulence-induced fading over FSO links, fading-mitigation techniques such as cooperative diversity can be applied. In this paper a one-relay cooperative diversity scheme is proposed and analyzed with intensity modulation and direct detection (IM/DD). Cooperative diversity is performed based on a decode-and-forward relaying strategy. The error performance is derived in the presence and absence of background radiation, showing optimal diversity orders over both the Rayleigh fading model and the lognormal fading model.

Index Terms- Free space optical communication, Cooperative diversity

I. INTRODUCTION

Free space optical communication has attracted considerable attention recently for a variety of applications and is also a promising solution for the last-mile problem [1]. In free space optical communication, atmospheric turbulence causes fluctuations in both the intensity and the phase of the received signal, impairing link performance. These fluctuations can lead to an increase in the link error probability, limiting the performance of the communication system [2]. To maintain acceptable performance levels over FSO links, fading-mitigation techniques such as spatial diversity must be employed. Spatial diversity involves the deployment of more apertures and is commonly used to combat fading and improve link reliability. In this context, aperture-averaging receiver diversity, spatial repetition codes, unipolar versions of the orthogonal space-time codes and transmit laser selection have been proposed as FSO-adapted spatial diversity solutions [3]-[4]. In the same way, the bit rates of Multiple-Input-Multiple-Output (MIMO) FSO links were evaluated in [5].

A conventional single-hop system uses direct transmission, where a receiver decodes the information based only on the direct signal, whereas the cooperative diversity technique also treats the other signal as a contribution; cooperative diversity thus decodes the information from the combination of the two signals. In cooperative diversity techniques, a limited number of antennas can be deployed [6]. In cooperative diversity, a message transmitted from a source to a destination can be overheard by neighbouring nodes. If these neighbouring nodes are willing to cooperate with the source, they retransmit information about the same message to the destination, thus enhancing the quality of signal reception.

This paper considers cooperative diversity as a solution for combating fading over FSO links. In this work we consider a decode-and-forward strategy with one relay over FSO links with intensity modulation and direct detection (IM/DD). The cooperative diversity technique is cost effective because it does not require adding more apertures to the transmitter and/or receiver, whereas the MIMO FSO technique requires more apertures at the transmitter and/or receiver. The performance of a MIMO system is degraded by channel correlation in both FSO and RF scenarios; however, MIMO FSO channels are more likely to be correlated. In fact, in RF systems the signal reaches the receiver by a large number of paths, implying that a small separation between the antennas can ensure channel independence. FSO links, in contrast, are line-of-sight, so, for example, the presence of a small cloud between the transmitter and receiver can induce large fades on all source-detector pairs simultaneously. Consequently, the high performance gains promised by MIMO-FSO solutions under the assumption of channel independence might not be achieved in practice. In this context, cooperation can constitute a good candidate solution.

Fig. 1. An example of a mesh FSO network.

Fig. 2. The proposed cooperation scheme.
II. SYSTEM MODEL AND TRANSMITTER STRUCTURE

An FSO metropolitan area network, as shown in Fig. 1, is considered as an example. Consider three neighbouring buildings (1), (2), (3) and assume that an FSO connection is available between each building and its two neighbouring buildings. Each of these connections is established via FSO-based wireless units, each consisting of an optical transceiver with a transmitter and a receiver to provide full-duplex capability. One separate transceiver is dedicated to the communication with each neighbouring building. We assume that the transceivers on building (2) are available for cooperation to enhance the communication reliability between buildings (1) and (3). By abuse of notation, buildings (1), (2) and (3) will be denoted by source (S), relay (R) and destination (D), respectively.

The cooperation strategy is shown in Fig. 2 and is as follows: a sequence of symbols is first transmitted to the relay. In the second phase, (R) transmits the decoded symbols to (D) while (S) simultaneously transmits the same symbol sequence to (D). Since three transmissions are involved in each cooperation cycle, the power transmitted from transceivers TRxS,1, TRxS,2 and TRxR,2 must be divided by 3. Denote by a0, a1 and a2 the random path gains between S-D, S-R and R-D, respectively. In this work, we adopt the lognormal and Rayleigh turbulence-induced fading channel models. In the lognormal model, the probability density function (pdf) of the path gain a > 0 is given by f(a) = (1 / (a * sigma * sqrt(2*pi))) * exp(-(ln a - mu)^2 / (2*sigma^2)), where the parameters mu and sigma satisfy the relation mu = -sigma^2 so that the mean path intensity is unity, E[I] = E[a^2] = 1. The degree of fading is measured by the scintillation index (S.I.); typical values of S.I. range between 0.4 and 1. In the Rayleigh model, the pdf of the path gain a > 0 is f(a) = 2a * exp(-a^2), so that the intensity I = a^2 follows a unit-mean exponential distribution.

Consider Q-ary pulse position modulation (PPM) with IM/DD links, where the receiver corresponds to a photoelectron counter.
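A small numerical sketch of the two fading models as reconstructed above; the value of sigma, the sample size and the scintillation-index relation used in the comment are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Lognormal model: path gain a = exp(X) with X ~ N(mu, sigma^2) and mu = -sigma^2,
# so that the mean path intensity E[I] = E[a^2] = 1 (see the relation in the text).
sigma = 0.3                                   # placeholder turbulence strength
mu = -sigma**2
a_lognormal = np.exp(rng.normal(mu, sigma, size=1_000_000))
scint_index = np.exp(4 * sigma**2) - 1        # assumed S.I. relation for this model

# Rayleigh model: f(a) = 2a exp(-a^2), a > 0, i.e. the intensity I = a^2 is
# exponential with unit mean.
a_rayleigh = rng.rayleigh(scale=np.sqrt(0.5), size=1_000_000)

print((a_lognormal**2).mean(), scint_index)   # ~1.0 and the S.I. for this sigma
print((a_rayleigh**2).mean())                 # ~1.0
```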


Consider first the link S-D and denote by Z^(0) the Q-dimensional vector whose q-th component Zq^(0) corresponds to the number of photoelectron counts in the q-th slot. Denote the transmitted symbol by s. The decision variable Zs^(0) can be modeled as a Poisson random variable (r.v.) with parameter lambda_s + lambda_b, while Zq^(0) (with q different from s) can be modeled as a Poisson r.v. with parameter lambda_b, where lambda_s (resp. lambda_b) corresponds to the average number of photoelectrons per slot due to the light signal (resp. background radiation and dark currents). In eq. (3), eta is the detector's quantum efficiency (assumed to be equal to 1), h is Planck's constant, and f is the optical center frequency, taken to be 1.94x10^14 Hz (corresponding to a wavelength of 1550 nm). Ts stands for the symbol duration, Pr stands for the optical power incident on the receiver, and Pb corresponds to the incident background power. Finally, Es = Pr*Ts/Q corresponds to the received optical energy per symbol on the direct link S-D.

In the same way, we denote the decision vector corresponding to the S-R link by Z^(1), where the parameter of the Poisson r.v. Zs^(1) is given by eq. (5), in which alpha_1 is a gain factor that follows from the fact that (S) might be closer to (R) than it is to (D). In other words, the received optical energy at (R) corresponding to Es (the energy on the S-D link) is alpha_1*Es. Performing a typical link budget analysis shows that alpha_1 = (dSD/dSR), where dSD and dSR stand for the distances from (S) to (D) and from (S) to (R), respectively. The maximum-likelihood (ML) decision rule at (R) is s_hat = arg max over q = 1, ..., Q of Zq^(1). The relay transmits the symbol s_hat along the link R-D, implying that the corresponding decision vector Z^(2) has components that are Poisson r.v.'s whose signal parameter scales with alpha_2, where alpha_2 = (dSD/dRD), with dRD corresponding to the distance between (R) and (D). Finally, note that the normalization of lambda_s by 3 in all the above equations ensures that the total transmit power is the same as in non-cooperative systems.

III. RECEIVER STRUCTURE

As in all cooperative schemes, decoding is based on the assumption Pr(s_hat = s) close to 1.

A. Detection in the Absence of Background Radiation

In the absence of background radiation, Z^(0) and Z^(2) contain at least Q-1 empty slots each. In this case, the detection procedure at (D) is as follows. If one component of Z^(0) is different from zero, this implies that the symbol s was transmitted in the corresponding slot, since in the absence of background radiation the only source of this nonzero count is the presence of a light signal in this slot. On the other hand, if all components of Z^(0) are equal to zero, then the decision is based on Z^(2). If one component of Z^(2) is different from zero, then with probability Pr(s_hat = s) this component corresponds to s and with the complementary probability it corresponds to an erroneous slot. Since Pr(s_hat = s) is assumed to be greater than its complement (Pr(s_hat = s) being close to 1), the best

strategy is to decide in favor of the nonempty slot of Z^(2). Finally, if all components of Z^(0) and Z^(2) are equal to zero, then (D) decides randomly in favor of one of the slots.

B. Detection in the Presence of Background Radiation

In this case, the background radiation results in nonzero counts even in empty slots, necessitating a more complicated detection procedure. The optimal ML detection procedure must take into consideration that s_hat might be different from s: s_hat = s with probability Pr(s_hat = s), while s_hat can correspond to a slot different from s with the complementary probability. We can also build a simpler decoder that is based on the assumption that the decision made at the relay is correct (s_hat = s). Finally, the cooperation strategy can be implemented without requiring any channel state information at the transceivers.

IV. PERFORMANCE ANALYSIS

Because of the symmetry of the PPM constellation, we evaluate the error performance of the proposed scheme assuming that the symbol s = 1 was transmitted.

A. No Background Radiation

The symbol-error probability (SEP) conditioned on the channel state is given by eq. (8), where p0 = 0, since no error is made when there is at least one photoelectron in the first slot; p1 = (Q-1)/Q, since when a random decision is made at (R) and Z1 > 0, a correct decision will be made at (D) among the Q slots; p2 = 0; and p3 = 1, since when a correct decision is made at (R) (with probability pe) and the count in the corresponding s-th slot of Z^(2) is nonzero, then (D) will also decide in favour of s. The SEP can then be determined from eq. (9), where, because of the form of the fading pdf, the integral can be split into three separate integrals. For lognormal fading, eq. (11) does not admit a closed-form solution and can be written as eq. (12), where Fr(a, 0, b) denotes the lognormal frustration function. Equation (12) shows that the SEP is large either when the links S-D and R-D are both in deep fades or when the links S-D and S-R are both in deep fades, thus reflecting the enhanced diversity order of the proposed cooperative system.

B. With Background Radiation

In the presence of background radiation, the conditional probability of error can only be upper bounded and, given the complexity of eqs. (11) and (12), eq. (9) must be evaluated numerically.
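Since the error expressions above are evaluated numerically, a minimal Monte Carlo sketch of the symbol-error probability for Q-ary PPM with Poisson photon counting over Rayleigh fading (direct link only) may help illustrate the procedure. The photon-count level, symbol count and tie-breaking rule are placeholder assumptions, and this is not the authors' evaluation code.

```python
import numpy as np

def direct_link_sep(Q=4, lam_signal=10.0, lam_background=0.0,
                    n_symbols=200_000, seed=1):
    """Monte Carlo SEP of Q-PPM with Poisson counts on a single (direct) link.

    lam_signal: average photoelectrons per signal slot at unit path gain
    (a placeholder value); lam_background models background radiation
    (0 = no background). The Rayleigh-fading intensity I = a^2 is drawn as a
    unit-mean exponential.
    """
    rng = np.random.default_rng(seed)
    intensity = rng.exponential(1.0, size=n_symbols)          # I = a^2, E[I] = 1
    sent = rng.integers(0, Q, size=n_symbols)                  # transmitted slot
    counts = rng.poisson(lam_background, size=(n_symbols, Q))
    counts[np.arange(n_symbols), sent] += rng.poisson(lam_signal * intensity)
    decided = counts.argmax(axis=1)                             # ML: pick max count
    # Ties (e.g. all-zero counts) resolve to slot 0 here; a random tie-break,
    # as in the text, would be slightly fairer.
    return np.mean(decided != sent)

print(direct_link_sep())
```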


V. NUMERICAL RESULTS

Fig. 3. Performance of 4-PPM over Rayleigh fading with no background radiation.

Fig. 4. Performance of 4-PPM over lognormal fading with no background radiation.

Fig. 3 shows the performance of 4-PPM in the absence of background radiation over Rayleigh fading channels. The slopes of the SEP curves indicate that cooperation results in an increased diversity order of two. Fig. 4 shows a similar simulation setup for the case of lognormal fading with S.I. = 0.6. The results in Fig. 3 and Fig. 4 show that cooperation is more beneficial in the case of Rayleigh fading compared to lognormal fading, where the performance gain can only be realized at smaller error rates. This result is expected, since the Rayleigh distribution is used to model the scenario of severe fading while the lognormal model corresponds to the scenario of less severe fading.

VI. CONCLUSION

The cooperative diversity technique can result in significant performance gains. It was proven analytically that a full transmit diversity order can be achieved in the no-background-radiation case. In the presence of background radiation, the proposed scheme can maintain acceptable performance gains, especially in the case of Rayleigh fading. Future work should consider the implementation of cooperation at higher layers.

REFERENCES

[1] D. Kedar and S. Arnon, "Urban optical wireless communications networks: the main challenges and possible solutions," IEEE Commun. Mag., vol. 42, no. 5, pp. 2-7, Feb. 2003.
[2] Zhu and J. Kahn, "Free-space optical communication through atmospheric turbulence channels," IEEE Trans. Commun., vol. 50, no. 8, pp. 1293-1300, Aug. 2002.
[3] M.-A. Khalighi, N. Schwartz, N. Aitamer, and S. Bourennane, "Fading reduction by aperture averaging and spatial diversity in optical wireless systems," IEEE J. Optical Commun. Netw., vol. 1, pp. 580-593, Nov. 2009.
[4] S. G. Wilson, M. Brandt-Pearce, Q. Cao, and J. H. Leveque, "Free-space optical MIMO transmission with Q-ary PPM," IEEE Trans. Commun., vol. 53, pp. 1402-1412, Aug. 2005.
[5] S. Navidpour, M. Uysal, and M. Kavehrad, "BER performance of free-space optical transmission with spatial diversity," IEEE Trans. Wireless Commun., vol. 6, no. 8, pp. 2813-2819, Aug. 2007.
[6] J. Laneman and G. Wornell, "Distributed space-time coded protocols for exploiting cooperative diversity in wireless networks," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2415-2425, Oct. 2003.


EFFECTIVE CLUSTERING OF WEB OPINIONS FOR SOCIAL NETWORKING SITE USING SCALABLE ALGORITHM
Mahalakshmi, Dept. of Computer Science and Engg., Easwari Engineering College (EEC), Chennai, India

Abstract

With the advancement of web technologies, a large volume of Web opinions is available on social media sites such as Web forums and Weblogs. We investigated the density-based clustering algorithm and propose a scalable distance-based clustering technique for Web opinion clustering. This Web opinion clustering technique enables the identification of themes within discussions in Web social networks and their development, as well as the interactions of active participants. To describe the social network in the online forum (Bulletin Board System) of an educational institution or university, social network analysis and data mining methods were used to investigate the network relationships of the community with the help of suitable software and algorithms. In this paper, instead of using a forum separately, the forum is embedded in the social networking site itself, encouraging students to participate, learn more and give their own opinions within the site. Using these reviews (i.e., Web opinions), we cluster the Web opinions with a scalable algorithm.

Keywords- Social network analysis, web opinion mining, online forum (BBS)

I. INTRODUCTION

A Social Network is a social structure made up of individuals or organizations called nodes, which are tied by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, dislike, relationships of beliefs, knowledge or prestige. We define social network sites as web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and view and traverse their list of connections and those made by others within the system. Social media is a form of mass media, and it can be used for interactive, informational, educational or promotional purposes. It can take many forms, including Internet forums, blogs, wikis, podcasts, photograph or picture sharing, video rating and social bookmarking. Social media is relatively economical and accessible for individuals who wish to publish or access digital content.

The word opinion means a personal view or comment. A Web opinion means a view or comment about something that is expressed through the web, i.e., the Internet. With the advancement of Web 2.0, anyone can give their own opinions about something and share them with others through the web. Simply put, Web opinions are user reviews or comments. The Internet facilitates communication between people without being limited to geographical boundaries. For example, users interact with each other in a Web forum when they have a common interest. A Web forum is a virtual platform for expressing personal and communal opinions, comments, experiences, thoughts and sentiments in discussion threads. There, Web users are able to share their personal details with a circle of friends, amplify their voices and sentiments, establish online communication on a topic of interest, and promote an ideology. The continuous user interaction on web forums and web logs forms virtual communities for members to share thoughts on subjects of their interest without face-to-face contact with each other. The messages in a web forum typically do not have strong factual content. Web forum members express their opinions virtually on all kinds of topics such as political and social issues, education standards, religion, entertainment, movies, music, travelling experiences, consumer products, sports, health and technology.

II. RELATED WORKS

Web forums are places where users can give their own opinions about anything; these user opinions are called Web opinions. Web opinions are generally short and noisy, and users may use different words to express the same things. Clustering these Web opinions is therefore a major problem. The macro accuracy and micro accuracy of clustering Web opinions using document clustering and density-based clustering are low; because of the properties of Web opinions, these methodologies are less effective at clustering them.

The value of performing content clustering on the forum is twofold. The first is to identify and group similar threads together and hence to abstract the topics or themes from all clusters; the overall clustering result provides a high-level content summarization of the threads in forums. The second is to unveil the ideological topic similarity between forum participants who may or may not have direct interaction. From the perspective of forum participants, it may be useful to identify other participants whom they have never interacted with but who share similar ideologies. From the perspective of online community analysts, it may be useful to examine the possibility of some participants bearing multiple screen names and participating in multiple threads across different forums.

The objective of content clustering in forum discussions is to cluster similar threads without any predefined cardinality and, at the same time, without forcing any rare-topic or noisy threads to be clustered. This process is somewhat different from hierarchical or partition-based document clustering, where each document is assigned to at least one cluster. In terms of content clustering, a thread with a unique or rare topic will not be able to form a cluster or be assigned to any cluster. Some threads may remain unique over time, whereas some may become the leads to form new clusters.

A. DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that can discover clusters and filter out noise in a spatial database. Typically, the density of points within each cluster is considerably higher than outside of the cluster, while the density within the regions of noise is lower than the density in any of the clusters. The two parameters of DBSCAN are the eps-neighborhood of a point (eps) and the minimum number of points (MinPts). The DBSCAN algorithm makes use of the concept of density-reachability to extend a particular cluster. The process of forming clusters and finding density-reachable points repeats until all points in the data set are examined. Because the process of expanding a cluster or merging clusters in DBSCAN relies on the density-reachability mechanism, some of the resulting clusters may be non-convex or elongated in shape. The complexity of DBSCAN is O(N), which is ideal for clustering Web opinions as the number of Web opinions grows exponentially.

B. SNN

SNN stands for Shared Nearest Neighbor. SNN is an enhanced density-based clustering algorithm. The major difference between SNN and DBSCAN is the definition of similarity between pairs of points: SNN defines the similarity between a pair of points as the number of nearest neighbors the two points share. The density is measured by the sum of similarities of the K nearest neighbors of a point. Points with high density are selected as core points, and points with low density are identified as noise and removed. All remaining points that have a similarity higher than a threshold to a core point are clustered together. The complexity of SNN is


O(N^2). Although SNN may achieve higher performance, it is not preferred for clustering Web opinions because of the fast-growing number of Web opinions in social media.

C. CORE CONCEPT SELECTION FOR CLUSTERING

The opinions found in the web forum are relatively noisy, because the content usually comprises non-edited, conversational material. In order to deal with these noisy data, we have defined three criteria for selecting the top-N core concepts to represent each thread for the purpose of document clustering. The first criterion is to select a certain number of top-ranked terms, based on the term frequency-inverse document frequency (TF-IDF) computation, to form a document vector for each thread. The second criterion is to exclude terms that do not contribute to the comparison process, which computes the similarity score between a pair of document vectors. The third criterion is to use bigrams, or two-word terms, as part of the document vectors.
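A minimal sketch of the top-N core-concept selection described above, using TF-IDF over unigrams and bigrams. The choice of N, the tokenization and the stop-word handling are assumptions made only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def top_n_core_concepts(threads, n=10):
    """Return the top-n TF-IDF terms (unigrams and bigrams) per thread.

    threads: list of strings, one per forum thread (assumed input format).
    """
    vectorizer = TfidfVectorizer(ngram_range=(1, 2),    # criterion 3: include bigrams
                                 stop_words="english")  # criterion 2: drop non-contributing terms
    tfidf = vectorizer.fit_transform(threads)
    terms = vectorizer.get_feature_names_out()
    concepts = []
    for row in tfidf:                                    # criterion 1: top-ranked terms
        weights = row.toarray().ravel()
        top = weights.argsort()[::-1][:n]
        concepts.append([terms[i] for i in top if weights[i] > 0])
    return concepts
```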

D. SYSTEM MODEL

The system architecture is the conceptual model that defines the structure and/or behavior of the system. It provides a way in which products can be procured and systems can be developed, and it gives an architectural overview of the overall system. The system architecture is shown in Figure 1.

Figure 1: System architecture (Web forum crawler and monitoring/alert agents feeding a content space and a people space, with analysis components and a user interface serving domains and applications such as an educational forum).

The system architecture consists of three major components:
1. Web forum discovery and collection;
2. Web forum content and link analysis;
3. User interface and interactive information visualization.

In the first component, i.e., Web forum discovery and collection, a monitoring agent monitors a forum, and a crawler fetches messages in the forum according to the hyperlink structure. The collected messages are analyzed with the emphasis on three dimensions: member identity, timestamp of messages, and structure of threads. In the second component, i.e., Web forum content and link analysis, we utilize machine learning and social network analysis techniques to extract useful knowledge. In the third component, i.e., user interface and interactive information visualization, we provide a user interface for users to submit their queries and present results through interactive visualization techniques, allowing users to explore the forum social networks and content.

III. PROPOSED SYSTEM

The algorithm used is the scalable distance-based clustering (SDC) algorithm. This algorithm ensures that a required density is reached in the initial clusters and then uses scaled distances to expand those initial clusters. The scalable algorithm does not require a predefined number of clusters, and it also filters out the noise in the clusters. First, the initial clusters (small clusters with very high density that meet the requirement of initial density) are identified. Then the size of each cluster is scaled up iteration by iteration, until the cluster grows to an extent at which it cannot be further enlarged. Points that are directly density-reachable are not necessarily included in the clusters, because they may still be further away from the expanded clusters. By scaling the distance we ensure that points within a cluster are close to one another within a reasonable distance, and not merely close to a few points that are directly density-reachable. The complexity of the scalable algorithm is O(N).

Scalable Algorithm:
1.  Initialize all points as unclassified points, S = {p1, p2, ..., pn}
2.  Repeat
3.    Randomly select a point pi in S as a seed
4.    If the number of points in the eps-neighborhood of pi >= MinPts
5.      Create an initial cluster Cj by including the seed and all its eps-neighborhood points
6.      S = S - Cj
7.    Else pi is classified as X (held out)
8.      S = S - {pi}
9.  Until S is empty
10. For each initial cluster Cj
11.   Repeat
12.     Find the centroid of Cj
13.     eps = eps - delta_eps
14.     Add points from X whose distance from the centroid of the cluster is larger than eps
15.   Until no other points are found
16. The points remaining in X are considered as noise

By using SDC, Web opinions that have similar content are clustered and identified as a theme of discussions. Web opinions that are not similar to others are considered noise, because they do not have sufficient participants contributing opinions on those particular topics. An important theme usually draws attention from many participants, and many Web opinions on this theme will be created. SDC provides a good content analysis to extract the major themes. By analyzing the social network of the extracted themes, we can further investigate the leading and active participants in a theme of discussion.
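For concreteness, the following is an illustrative Python rendering of the scalable algorithm listed above. The Euclidean distance measure, the direction in which the threshold is scaled during expansion (the extracted pseudocode is ambiguous on this point; the sketch simply grows the threshold each round) and the parameter names are assumptions; this is a sketch, not the authors' implementation.

```python
import numpy as np

def sdc(points, eps, min_pts, delta_eps, max_rounds=20, seed=0):
    """Illustrative sketch of scalable distance-based clustering (SDC).

    points: (n, d) array of document vectors (assumed dense for simplicity).
    eps: initial neighbourhood radius (Euclidean distance, an assumption).
    min_pts: density requirement for an initial cluster.
    delta_eps: amount by which the expansion threshold is scaled each round.
    Returns (clusters, noise) as lists of point indices.
    """
    rng = np.random.default_rng(seed)
    unclassified = set(range(len(points)))
    holdout = set()                      # the set X of the pseudocode
    clusters = []

    # Steps 1-9: form small, dense initial clusters from random seeds.
    while unclassified:
        cand = sorted(unclassified)
        seed_pt = int(rng.choice(cand))
        dists = np.linalg.norm(points[cand] - points[seed_pt], axis=1)
        neigh = [p for p, d in zip(cand, dists) if d <= eps]
        if len(neigh) >= min_pts:
            clusters.append(list(neigh))
            unclassified -= set(neigh)
        else:
            holdout.add(seed_pt)
            unclassified.discard(seed_pt)

    # Steps 10-16: expand each initial cluster by scaling the threshold.
    for cluster in clusters:
        radius = eps
        for _ in range(max_rounds):
            radius += delta_eps                           # scale the distance
            centroid = points[cluster].mean(axis=0)
            added = [p for p in sorted(holdout)
                     if np.linalg.norm(points[p] - centroid) <= radius]
            if not added:
                break
            cluster.extend(added)
            holdout -= set(added)

    return clusters, sorted(holdout)                      # leftovers are noise
```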


IV. PERFORMANCE EVALUATION

We conducted an experiment that investigated the effectiveness of the SDC algorithm in clustering topics in a Web forum and analyzed how the parameters eps (the eps-neighborhood of a point) and MinPts (the minimum number of points required for a neighborhood) affect the performance. Both eps and MinPts are important parameters determining the density for clustering. Microaccuracy and macroaccuracy are used as the metrics to measure the performance of SDC and to benchmark it against DBSCAN: microaccuracy measures the overall average clustering accuracy, microaccuracy = (sum over i of |Hi|) / N, whereas macroaccuracy measures the average of the clustering accuracy of all clusters, macroaccuracy = (1/|C|) * sum over i of (|Hi| / |Ci|), where |C| is the number of clusters created, |Hi| is the number of threads that are correctly classified in the cluster Ci, |Ci| is the number of threads in the cluster Ci (with |Ci| greater than one), and N is the total number of threads.

DBSCAN and SDC do not require specifying the number of clusters to be formed. As a result, changing eps and MinPts also affects the number of clusters generated, in addition to the accuracy. We first investigate the effect of MinPts on the microaccuracy and macroaccuracy of DBSCAN by setting eps to 0.17 and 0.19. Figures 2 and 3 show the microaccuracy and macroaccuracy for increasing values of MinPts, starting from MinPts = 2 (Figure 4: minimum number of points in a cluster, MinPts). A cluster is considered invalid when a theme cannot be identified from the threads in the cluster: there are some similarities between the threads, but there is no focus in the discussion between them. A cluster is considered valid if a theme is identified and only some threads are considered noise. The total number of threads in all clusters also decreases significantly for each increment of MinPts, as shown in Fig. 5. As shown in Figure 6, the microaccuracy increases as the total number of clusters decreases, until it reaches the optimum at 91% and 87% when MinPts = 4 and MinPts = 5, respectively. The microaccuracy then decreases as the total number of clusters continues to decrease. However, when we reach the optimal accuracy, we are sacrificing the valid clusters of smaller size.

Figure 7 shows the plots of the number of valid clusters against microaccuracy. When eps = 0.18, all invalid clusters are removed, and the microaccuracy reaches 87%. This combination reveals a balance between the number of valid clusters and microaccuracy.

It is found that SDC performs consistently better than DBSCAN over a wide range of eps, no matter whether a higher purity of clusters is obtained when a higher eps value is used or a larger number of clusters are created when a lower eps value is used. A t-test shows that the difference between the performances of DBSCAN and SDC is significant at the 0.05 level. Such a finding indicates that a scalable distance-based approach is more suitable than a pure density-based approach for Web opinion clustering. Although SDC achieves good performance in clustering Web opinions, it has its own limitations. SDC does not require a predefined number of clusters, but it takes two parameters, eps and MinPts, as inputs; both are important in identifying the initial clusters, and a systematic tuning of these two parameters is needed to achieve optimal performance. These parameters affect micro- and macroaccuracy as well as the number of identified clusters, and the tuning can be adjusted to achieve the performance objectives.
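A small helper that computes the two metrics from a set of clusters, assuming each cluster is given as a list of ground-truth topic labels and that a thread counts as correctly classified when it carries the majority label of its cluster (an assumed reading of |Hi|):

```python
from collections import Counter

def micro_macro_accuracy(clusters, total_threads):
    """clusters: list of lists of ground-truth topic labels, one list per cluster."""
    clusters = [c for c in clusters if len(c) > 1]          # |Ci| > 1, as in the text
    hits = [Counter(c).most_common(1)[0][1] for c in clusters]
    micro = sum(hits) / total_threads                        # sum(|Hi|) / N
    macro = sum(h / len(c) for h, c in zip(hits, clusters)) / len(clusters)
    return micro, macro

# Example with three small clusters and 10 threads in total.
print(micro_macro_accuracy([["a", "a", "b"], ["c", "c"], ["b", "b", "b", "a"]], 10))
```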

V. CONCLUSION

In the social networking site, the web forum is embedded in order to collect the user opinions. A social networking site can have a large number of users, so a large number of reviews can be collected. After collecting those reviews, they are separated and clustered, and the resulting clusters are displayed in the social networking site itself for the users' convenience. At the same time, the accuracy of the clusters is improved.

ACKNOWLEDGEMENT

I gratefully acknowledge my Head of the Department, Professor Dr. K. Kathiravan, Information Technology, Easwari Engineering College, Chennai, for giving me this title, social network, for research, and my guide, Asst. Prof. Mrs. S. Gnanapriya, for her support and patience in making me understand the complexities of this research and for relentlessly supporting me till now.


Hierarchical Modeling and Analysis of Cloud Services with Energy Efficiency


R. H. Bavya Sugani 1, J. Preethi Janet 2, D. Arulsuju 3
1 ME (CCE), Dept. of CCE, Rajas Engineering College, Thirunelveli
2 Asst. Prof., Rajas Engineering College, Thirunelveli
3 Dept. of CSE, GKM College of Engineering and Technology, Chennai

Abstract
Cloud computing is the new era in the field of computing. It is composed of several layers, all of which can be accessed by users connected to it. Cloud is a model for providing on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. For processing large amounts of data, the management and switching of communications contribute significantly to energy consumption, and cloud computing seems to be an alternative to office-based computing. Network-based cloud computing is rapidly expanding as an alternative to conventional office-based computing. In this research work, I present an analysis of the energy consumption in each cloud layer, and implement the software as a service, storage as a service and processing as a service models and their reliability in those layers. The analysis considers both public and private clouds, and includes energy consumption in switching and transmission as well as data processing and data storage. I show that the energy consumption in transport and switching can be a significant percentage of the total energy consumption in cloud computing. An interactive cloud system is implemented with VoiceXML features.

KEYWORDS | Cloud computing; core networks; data centers; energy consumption

INTRODUCTION

The increasing availability of high-speed Internet and corporate IP connections is enabling the delivery of new network-based services [1]. While Internet-based mail services have been operating for many years, service offerings have recently expanded to include network-based storage and network-based computing. These new services are being offered both to corporate and individual end users [12], [3]. Services of this type have been generically called "cloud computing" services [2]-[7]. The cloud computing service model involves the provision, by a service provider, of large pools of high-performance computing resources and high-capacity storage devices that are shared among end users as required [8]-[10]. There are many cloud service models, but generally, end users subscribing to the service have their data hosted by the service and have computing resources allocated on demand from the pool. The service provider's offering may also extend to the software applications required by the end user. To be successful, the cloud service model also requires a high-speed network to provide connection between the end user and the service provider's infrastructure.

Cloud computing potentially offers an overall financial benefit, in that end users share a large, centrally managed pool of storage and computing resources, rather than owning and managing their own systems [5]. Often using existing data centers as a basis, cloud service providers invest in the necessary infrastructure and management.

Cloud layers and service reliability: there are four key layers of a cloud environment, and corresponding technological skills are required to better understand the aspects of cloud computing.


Fig 1: Cloud layers

There are many definitions of cloud computing, and discussion within the IT industry continues over the possible services that will be offered in the future [8], [10]. The broad scope of cloud computing is succinctly summarized in [1]: cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Cloud computing architectures can be either public or private [8], [9]. A private cloud is hosted within an enterprise, behind its firewall, and intended only to be used by that enterprise [8]. In such cases, the enterprise invests in and manages its own cloud infrastructure, but gains benefits from pooling a smaller number of centrally maintained high-performance computing and storage resources instead of deploying large numbers of lower-performance systems. Further benefits flow from the centralized maintenance of software packages, data back-ups, and balancing the volume of user demands across multiple servers or multiple data center sites.

In contrast, a public cloud is hosted on the Internet and designed to be used by any user with an Internet connection to provide a similar range of capabilities and services [8]. A number of organizations are already hosting and/or offering cloud computing services. Examples include Google Docs [2], Amazon's Elastic Compute Cloud and Simple Storage services [3], Microsoft's Windows Azure Platform [4], IBM's Smart Business Services [5], Salesforce.com [6], and Webex [7].

In this paper, we present an overview of energy consumption in cloud computing and compare this to the energy consumption of conventional computing. For this comparison, the energy consumption of conventional computing is the energy consumed when the same task is carried out on a standard consumer personal computer (PC) that is connected to the Internet but does not utilize cloud computing. We consider both public and private clouds and include energy consumption in switching and transmission, as well as data processing and data storage. Specifically, we present a network-based model of the switching and transmission network [1], [8], [9], a model of user computing equipment, and a model of the processing and storage functions in data centers [7], [3], [1]. We examine a variety of cloud computing service scenarios in terms of energy efficiency. In essence, our approach is to view cloud computing as an analog of a classical supply-chain logistics problem, which considers the energy consumption or cost of processing, storing, and transporting physical items. The difference is that in our case, the items are bits of data. As with classical logistics modeling, our analysis allows a variety of scenarios to be analyzed and optimized according to specified objectives.
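To make the logistics analogy concrete, a toy per-task energy account of the kind such an analysis performs is sketched below. The split into processing, storage and transport terms follows the text, but the function signature, the PUE factor and any per-bit figures a caller supplies are purely illustrative assumptions, not the paper's model.

```python
def cloud_task_energy_joules(bits_processed, bits_stored, bits_transported,
                             e_processing_per_bit, e_storage_per_bit,
                             e_transport_per_bit, pue=1.0):
    """Toy per-task energy model: processing + storage + transport.

    Each e_*_per_bit argument is an energy-per-bit figure (J/bit) supplied by
    the caller for their own scenario; pue scales the data-centre terms by a
    power-usage-effectiveness factor. This is an illustrative sketch only.
    """
    data_centre = pue * (bits_processed * e_processing_per_bit
                         + bits_stored * e_storage_per_bit)
    network = bits_transported * e_transport_per_bit
    return data_centre + network
```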


II. CLOUD SERVICE MODELS

We focus our attention on three cloud services: storage as a service, processing as a service, and software as a service. In the following sections, we outline the functionality of each of the three cloud services. Note that we use the terms "client," "user," and "customer" interchangeably.

A. Software as a Service
Consumer software is traditionally purchased with a fixed upfront payment for a license and a copy of the software on appropriate media. This software license typically only permits the user to install the software on one computer. When a major update is applied to the software and a new version is released, users are required to make a further payment to use the new version of the software. Users can continue to use an older version, but once a new version of the software has been released, support for older versions is often significantly reduced and updates are infrequent. With the ubiquitous availability of broadband Internet, software developers are increasingly moving towards providing software as a service [12], [4], [6]. In this service, clients are charged a monthly or yearly fee for access to the latest version of the software [12], [3]. Additionally, the software is hosted in the cloud and all computation is performed in the cloud. The client's PC is only used to transmit commands and receive results. Typically, users are free to use any computer connected to the Internet. However, at any time, only a fixed number of instances of the software are permitted to be running per user. One example of software as a service is Google Docs [12].

B. Storage as a Service
Through storage as a service, users can outsource their data storage requirements to the cloud [3]-[6]. All processing is performed on the user's PC, which may have only a solid-state drive (e.g., flash-based solid-state storage), and the user's primary data storage is in the cloud. Data files may include documents, photographs, or videos. Files stored in the cloud can be accessed from any computer with an Internet connection at any time [5]. However, to make a modification to a file, it must first be downloaded, edited using the user's PC, and then the modified file uploaded back to the cloud. The cloud service provider ensures there is sufficient free space in the cloud and also manages the backup of data [5]. In addition, after a user uploads a file to the cloud, the user can grant read and/or modification privileges to other users. One example of storage as a service is the Amazon Simple Storage service [3].


C. Processing as a Service
Processing as a service provides users with the resources of a powerful server for specific large computational tasks [2]-[6]. The majority of tasks, which are not computationally demanding, are carried out on the user's PC. More demanding computing tasks are uploaded to the cloud, processed in the cloud, and the results are returned to the user [6]. Similar to the storage service, the processing service can be accessed from any computer connected to the Internet. One example of processing as a service is the Amazon Elastic Compute Cloud service [3].

III. MODELS OF ENERGY CONSUMPTION
In this section, we describe the functionality and energy consumption of the transport and computing equipment on which current cloud computing services typically operate. We consider energy consumption models of the transport network, the data center, plus a range of customer-owned terminals and computers. The models described are based on power consumption measurements and published specifications of representative equipment [7], [1], [2], [3]. Those models include descriptions of the common energy-saving techniques employed by cloud computing service providers. We consider both public and private clouds. For the public cloud, the schematic includes the data center as well as access, metro and edge, and core networks. The private cloud schematic includes the data center as well as a corporate network. From a hardware perspective, the key difference between public cloud computing and private cloud computing is the network connecting the users to the respective data center. As described earlier, a data center for a public cloud is hosted on the Internet and designed to be used by anyone with an Internet connection.

Fig 2: Comparing cloud with traditional computing

Data centers in turn connect to the core network through their own gateway router. A typical data center comprises a gateway router, a local area network, servers, and storage [7], [3]. The BNG routers, provider edge routers, and the data center gateway routers typically dual-home to more than one core router, in order to achieve higher service availability through network redundancy. Although only a single data center is shown, a cloud service provider would normally maintain several centers with dedicated transport between these centers for redundancy and efficient load balancing.

IV. ANALYSIS OF CLOUD SERVICES
In this section, we compare the per-user energy consumption of each cloud service outlined in Section II using the energy model described in Section III. The energy consumption of each cloud service is also compared against the energy consumption of conventional computing. As described earlier, the key difference between public cloud computing and private cloud computing is the transport network connecting users to the data center.

Storage as a Service
In this section, we analyze the energy consumption of storage as a service. We consider, as an example, a file storage and backup service, where all processing and computation is performed on the user's computer but user data are stored in the cloud. Files are downloaded


from the cloud for viewing and editing and then uploaded back to the cloud for storage.

Number of hosted systems | Public cloud | Private cloud
100 | 300 | 240
200 | 310 | 250
300 | 290 | 240
400 | 300 | 250
500 | 300 | 240

The client is a "dumb" terminal (a thin-client computer) that communicates with its server via simple commands transmitted through the Internet. The server in turn transmits video data to the terminal, which is output on a monitor. As mentioned in Section III-A, the power consumption of the visual display unit, speakers, and peripheral devices is not included in the model, as it would be common to all alternative configurations. All data processing is performed at a remote server.

Processing as a Service
We model processing as a service with each user having a low-end laptop that is used for routine tasks, and compare the energy consumption with the use of a higher-capacity desktop machine. In the cloud, there are computation servers that are used for computationally intensive tasks. Data for computationally intensive tasks are uploaded to a cloud service, and the completed output is returned to the user. As an example of a computationally intensive task, we model the task of converting and compressing a video file. We calculate the per-week energy consumption of the processing service as a function of the number of encodings per week, N. The per-user energy consumption Eps (in watt-hours) of the processing service includes the energy consumed by the user's PC.

Fig 3: Cloud Analyst simulation results

The number of times a file is downloaded per hour would depend on the nature of the file. A word-processing document or spreadsheet might be required a few times per hour, but photograph downloads might take place many times per hour.
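To make the trade-off concrete, the following is a minimal sketch (not taken from the paper) of how such a per-user comparison could be computed. It assumes a purely additive energy model with placeholder constants (file size, transport energy per bit, per-user share of cloud storage power, local hard-disk power); the real model in Section III is considerably more detailed.

// Illustrative only: per-user energy estimate for storage as a service,
// assuming a simple additive model (transport energy per bit times traffic,
// plus a per-user share of data-centre storage power). All numeric constants
// below are placeholders, not measured values from this paper.
#include <iostream>

int main() {
    const double fileSizeBits      = 10e6 * 8;   // assumed 10 MB file
    const double transportJperBit  = 2.0e-6;     // assumed transport energy per bit (public cloud)
    const double storageShareWatts = 0.05;       // assumed per-user share of cloud storage power
    const double localHddWatts     = 8.0;        // assumed local hard-disk-drive power

    std::cout << "downloads/hour  cloud (Wh/h)  local HDD (Wh/h)\n";
    for (int downloadsPerHour = 1; downloadsPerHour <= 32; downloadsPerHour *= 2) {
        // Transport energy for one hour (J), converted to watt-hours.
        double transportWh = downloadsPerHour * fileSizeBits * transportJperBit / 3600.0;
        double cloudWh = transportWh + storageShareWatts;   // one hour of storage share
        double localWh = localHddWatts;                     // disk powered for the hour
        std::cout << downloadsPerHour << "\t\t" << cloudWh << "\t\t" << localWh << "\n";
    }
    return 0;
}

With these assumed constants, the transport term grows linearly with the number of downloads per hour, which is the qualitative behaviour discussed above: cloud storage is attractive for rarely accessed files, while frequent downloads shift the balance back towards local storage.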

[Chart: energy consumption versus number of host machines connected, for private and public clouds]

Fig 4: Energy consumption - private and public cloud

Software as a Service
Users access a software service (sometimes referred to as virtual desktop infrastructure) via a "dumb" terminal.

Fig 5: Cloud simulation - comparing public and private cloud


V.DISCUSSION The level of utilization achieved by a cloud service is a function of the type of services it provides, the number of users it serves, and the usage patterns of those users. Large-scale public clouds that serve a very large number of users are expected to be able to fully benefit from achieving high levels of utilization and high levels of virtualization, leading to low per-user energy consumption. Private clouds that serve a relatively small number of users may not have sufficient scale to fully benefit from the same energy-saving techniques. Our analysis is based on the view that cloud computing fully utilizes servers and storage for both public and private clouds. The results of our analysis indicate that private cloud computing is more energy efficient than public cloud computing due to the energy savings in transport. However, it is not clear whether in general the energy consumption saving in transport with a private cloud offsets the higher energy consumption due to lower utilization of servers and storage. The logical unification of several geographically diverse data centers assists cloud computing to scale during periods of high demand. However, energy-efficient transport between these data centers is necessary to ensure that cloud computing is energy efficient. In our analysis, public clouds consumed more energy than private clouds because users connected to the public cloud through the public Internet. Optical bypass can be used to reduce the number of router hops through the network [8], [5] and thus the energy consumption in transport [12]. To minimize the energy consumption in transport, cloud computing data centers should be connected through dedicated point-to-point links incorporating optical bypass where possible. Indeed, reducing the number of routings hops and transmission links would yield benefits to all services. VI. CONCLUSION In this paper, we presented a comprehensive energy consumption analysis of cloud computing. The analysis considered both public and private clouds and included energy consumption in switching and transmission as well as data processing and data storage. We have

evaluated the energy consumption associated with three cloud computing services, namely storage as a service, software as a service, and processing as a service. Any future service is likely to include some combination of each of these service models. Power consumption in transport represents a significant proportion of total power consumption for cloud storage services at medium and high usage rates. For typical networks used to deliver cloud services today, public cloud storage can consume of the order of three to four times more power than private cloud storage due to the increased energy consumption in transport. Nevertheless, private and public cloud storage services are more energy efficient than storage on local hard disk drives when files are only occasionally accessed. However, as the number of file downloads per hour increases, the energy consumption in transport grows and storage as a service consumes more power than storage on local hard disk drives. In that regime, the energy savings from cloud storage are minimal.

VII. REFERENCES
[1] Jayant Baliga, Robert W. A. Ayre, Kerry Hinton, and Rodney S. Tucker, "Green Cloud Computing: Balancing Energy in Processing, Storage, and Transport," Proceedings of the IEEE, January 2011.
[2] Alex Vukovic, "Data centers: Network power density challenges," vol. 47, pp. 55-59, IEEE, April 2005.
[3] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," pp. 205-218, August 2006.
[4] J. Baliga, K. Hinton, and R. S. Tucker, "Energy consumption of the Internet," in Conf. Opt. Internet/Australian Conf. Opt. Fibre Technol., Melbourne, Vic., Australia, Jun. 2007.


[5] Andreas Berl,, Erol Gelenbe, Marco di Girolamo, Giovanni Giuliani, Hermann de Meer, Minh Quan Dang and Kostas Pentikousis: Energy Efficient Cloud Computing, Vol. doi:10.1093/comjnl/bxp080 Page Number : 1-7 IEEE August 2009. [6] Shekhar Srikantaiah,Aman Kansal and Feng Zhao: Energy Aware Consolidation for Cloud Computing, Vol, 06VNCJAEA02 Page 1-5 IEEE December 2008. [7] D. Colarelli and D. Grunwald, Massive arrays of idle disks for storage archives, in Proc. ACM/IEEE Conf. Supercomput., Los Alamitos, CA, 2002, DOI: 10.1109/SC.2002.10058. [8] Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat: Managing Energy and Server Resources in Hosting Centers, Vol. 103-116 IEEE June 2001.

[9] Bhathiya Wickremasinghe: Cloud Analyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale Cloud Computing Environments, Vol. 1-44 IEEE June 2009. [10] Bhathiya Wickremasinghe1, Rodrigo N. Calheiros2, and Rajkumar Buyya1: CloudAnalyst: A CloudSimbased Visual Modeller for Analysing Cloud Computing Environments and Applications. Vol. JA2001/CA7 IEEE June 2001. [11] James A. Larson: W3C Speech Interface Languages: VoiceXML, Vol 126-130 IEEE May 2007. [12] Cloud Computing. [Online]. Available: http://en.wikipedia.org/wiki/Cloud_computing.


ADAPTIVE FILTER ALGORITHM TO CANCEL RESIDUAL ECHO FOR SPEECH ENHANCEMENT


GAYATHRI P
PG Student, Electronics & Communication Engineering Department, Jaya Engineering College, Thiruninravur, Chennai.

Abstract This paper examines the technique of using a noise-suppressing nonlinearity in the adaptive filter error feedback-loop of an acoustic echo canceller (AEC) based on the normalized least mean square (NLMS) algorithm when there is an interference at the near end in addition with the Error Recovery Nonlinearity (ERN) technique which enhances the filter estimation error prior to the adaptation in order to assist the linear adaptation process. Adaptive filtering techniques as they apply to acoustic echo cancellation. This has been simulated with the help of adaptive filtering algorithms as a real time acoustic echo cancellation system. This paper deals with the background signal processing theory as it pertains to the application of adaptive filters, studies the basis of adaptive filtering techniques and the simulations of adaptive filtering techniques as developed in Visual C++. This shows the results of these simulations as well as discussing the advantages and disadvantages of this technique and the real time implementation of an adaptive echo cancellation. The error enhancement paradigm encompasses many traditional signal enhancement techniques and opens up an entirely new avenue for solving the AEC problem in a real-world setting. Our proposed approach in this paper brings seemingly separate signal enhancement techniques under one roof and ties them together in an analytically consistent and practical manner. Index Terms -- Acoustic echo cancellation, error enhancement, error recovery nonlinearity. I. INTRODUCTION CURRENTLY, the most accurate method for evaluating speech quality is through subjective listening tests. Although subjective evaluation of speech enhancement algorithms is often accurate and reliable (i.e., repeatable) provided it is performed under stringiest conditions (e.g., sizeable listener panel, inclusion of anchor conditions, etc.), it is costly and time consuming. For that reason, much effort has been placed on developing objective measures that would predict speech quality with high correlation. In telephony system, the received signal by the loudspeaker, is reverberated through the environment and picked up by the microphone. It is called an echo signal. This is in the form of time delayed and attenuated image of original speech signal, and causes a reduction in the quality of the communication. Adaptive filters are a class of filters that iteratively alter their parameters in order to minimize a difference between desired outputs and their output. In the case of acoustic echo, the optimal output is an echoed signal that accurately emulates the unwanted echo signal. This is then used to negate the echo in the return signal. The better the adaptive filter simulates this echo, the more successful the cancellation will be. The LMS algorithm was devised by Widrow and Hoff in 1959 in their study of a pattern-recognition machine known as the adaptive linear element, commonly referred to as the Adeline The LMS algorithm is a stochastic gradient algorithm in that it iterates each tap weight of the transversal filter in the direction of the instantaneous gradient of the squared error signal with respect to the tap weight in question. 
The least mean square (LMS) algorithm

w(n + 1) = w(n) + μ e(n) x(n)                    (1)

is a preferred choice for acoustic echo cancellation (AEC) due to its computational simplicity and adaptability, where x(n) = [x(n), x(n - 1), ..., x(n - L + 1)]^T is the reference signal vector of length L at time n, w(n) = [w_0(n), w_1(n), ..., w_{L-1}(n)]^T is the filter coefficient vector of the same length, T is the transposition operator, μ is the adaptation step-size parameter, and

e(n) = d(n) - d̂(n)                    (2)


is the error from approximating the observed acoustic echo d(n) by the replica

d̂(n) = w^T(n) x(n)                    (3)

based on the estimated echo return path response reflected in w(n). The adaptation rule of (1) is actually a stochastic version of the LMS gradient descent adaptation rule

w(n + 1) = w(n) - μ ∇_w E{|e(n)|^2}                    (4)

where the direction and the amount of adaptive correction to w(n) depend on the sample estimate of the gradient

∇_w E{|e(n)|^2} = -2 E{e(n) x(n)} ≈ -2 e(n) x(n)                    (5)

The error is thus modeled to be quadratic in the space of the parameter w(n), and the adaptation rule is based on the second-order statistics (SOS). The LMS algorithm has an inherent ability to eventually converge (with an arbitrary degree of precision) to a unique and optimal solution that minimizes the mean square error (MSE) E{|e(n)|^2} and consequently satisfies the orthogonality principle E{e(n) x(n)} = 0. Nevertheless, many modifications have been proposed to make the LMS algorithm more robust to deviations from the ideal conditions necessary for a perfect system identification. The LMS algorithm suffers from slow convergence when the reference signal is colored, e.g., for speech signals. The convergence rate can be improved through decorrelation across time, e.g., by using a decorrelation filter, or across frequency, e.g., by using the frequency-block LMS (FBLMS) algorithm. Further, signals observed in reality are rarely stationary, calling for algorithms like the normalized LMS (NLMS) algorithm, summarized by the update equation

w(n + 1) = w(n) + μ f(e(n)) x(n)                    (6)

The NLMS algorithm is far less sensitive to changes in the reference signal magnitude than the LMS algorithm, and the convergence, ||w(n + 1) - w(n)|| → 0 as n → ∞, is guaranteed for 0 < μ < 2.

II. ERROR RECOVERY NONLINEARITY
2.1 Introduction
The standard usage of a voice activity detector to estimate the noise variance during silence for the error enhancement has already been tested. Other advanced techniques may be employed for improved tracking

performance. However, the conventional single-channel techniques tend to be heuristic in nature and depend strongly on arbitrarily chosen thresholds and smoothing constants. In order to clearly demonstrate the behaviors of different combinations of the ERN, the given SNR (averaged over time for the entire data) will be used throughout the time-domain simulation. The main purpose is to show that the sensitivity to the choice of fixed parameters is reduced by the system approach and that precisely estimated signal statistics are not necessary through the block-iterative adaptation procedure, which can be carried out efficiently in the frequency domain. A simulated acoustic echo, generated from a room impulse response with a reverberation time of about 250 ms measured at 8 kHz sampling rate, was used. The impulse response was truncated to match the length of the adaptive filter, chosen here to be 64 ms, to force the filter-length mismatch error to be zero (again for the purpose of illustrating the behavior of the ERNs more clearly) and scaled to produce an echo return loss (i.e., attenuation) of 10 dB. 16 kHz, 16-bit PCM speech from the TIMIT database, downsampled to 8 kHz, was used. The double-talk situation can create large variations in the filter estimation error, especially during frequency-domain AEC when a speech signal is very sparsely represented across frequency. Thus the proper specification of the overall form of the ERN, dictated by the a priori source information, is practically more relevant than that of the exact mathematical form. The ERN can also be viewed as a function that controls the step size for sub-optimal conditions reflected in the error statistics when the signals are no longer Gaussian distributed, as is most often the case in reality, and it may be combined with other existing noise-robust schemes to improve the overall performance of the LMS algorithm. With this built-in noise-robust mechanism, continuous adaptation is permitted during double talk by a proper combination of the ERN and the LMS algorithm.

2.2 Bussgang Technique for Unsupervised Adaptation
Residual echo enhancement closely follows the Bussgang technique for blind equalization or deconvolution:

w(n + 1) = w(n) + μ (z(n) - φ(z(n))) y(n)                    (7)


where y(n) is a vector of the observed (convolved) signal y(n) = h(n) * x(n), z(n) = w(n) * y(n) = w^T(n) y(n) is the filtered (deconvolved) output, and φ(·) is a memoryless nonlinearity such that x̂(n) = φ(z(n)) is an estimator of the original, unobservable signal x(n). The system equalization without any training signal is made possible through the iterative application of a linear adaptive filter, which estimates the inverse filter w(n), followed by the nonlinearity, which enhances the filtered signal z(n). Specifically, a decomposition of z(n) reveals that

z(n) = w(n) * h(n) * x(n)
     = h^{-1}(n) * h(n) * x(n) + (w(n) - h^{-1}(n)) * h(n) * x(n)
     = x(n) + v(n)                    (8)

where h^{-1}(n) is the inverse impulse response, i.e., h^{-1}(n) * h(n) = 1, and v(n) = (w(n) - h^{-1}(n)) * h(n) * x(n) is the so-called convolution noise. Hence, by reducing the effect of v(n) in z(n) through the application of the nonlinearity φ(·), the estimation error e(n) = z(n) - φ(z(n)) = z(n) - x̂(n) necessary for the filter adaptation can be obtained blindly.

III. IMPLEMENTATION ASPECTS
3.1 Introduction
The acoustic echo canceller has been implemented in real time in Visual C++. Two observations were made before starting the implementation. First, the full-band filtering must be performed in real time, i.e., on a sample-by-sample basis. This means that for each new sample an FIR filtering of the full-band filter must be done. It would be preferable for the adaptive filtering also to be performed at each sampling instance, but this was not possible due to the amount of calculations. Instead, the loudspeaker signal x(n) and the microphone signal d(n) have been buffered. These buffered signals have been used to calculate the weights. New buffers are filled at the same time; this means that once the filter coefficient calculation is done on the old buffer, a new calculation is started in order to keep the adaptation as fast as possible. In order to use the error signal e(n) instead, the adaptation must be run on a sample-by-sample basis, and this will require either shorter filters or a faster DSP.

3.2 Normalized Least Mean Square (NLMS) Algorithm

For each iteration the LMS algorithm requires 2N additions and 2N+1 multiplications. One of the primary disadvantages of the LMS algorithm is having a fixed step-size parameter for every iteration. In practice this is rarely achievable. Even if we assume the only signal to be input to the adaptive echo cancellation system is speech, there are still many factors, such as signal input power and amplitude, which will affect its performance. The normalized least mean square (NLMS) algorithm is an extension of the LMS algorithm which bypasses this issue by calculating a maximum step-size value as step size = 1 / dot product(input vector, input vector). This step size is proportional to the inverse of the total expected energy of the instantaneous values of the coefficients of the input vector x(n). This sum of the expected energies of the input samples is also equivalent to the dot product of the input vector with itself. The recursion formula for the NLMS algorithm is stated in equation (9):

w(n + 1) = w(n) + [1 / (x^T(n) x(n))] e(n) x(n)                    (9)

3.3 Implementation of the NLMS Algorithm
The NLMS algorithm has been implemented in Visual C++. As the step-size parameter is chosen based on the current input values, the NLMS algorithm shows far greater stability with unknown signals. This convergence speed and relative computational simplicity make the NLMS algorithm ideal for the real-time adaptive echo cancellation system. As the NLMS is an extension of the standard LMS algorithm, the NLMS algorithm's practical implementation is very similar to that of the LMS algorithm. Each iteration of the NLMS algorithm requires these steps in the following order.

1. The output of the adaptive filter is calculated:

y(n) = Σ_{i=0}^{N-1} w_i(n) x(n - i) = w^T(n) x(n)                    (10)

2. An error signal is calculated as the difference between the desired signal and the filter output:

e(n) = d(n) - y(n)                    (11)

3. The step-size value for the input vector is calculated.


μ(n) = 1 / (x^T(n) x(n))                    (12)

4. The filter tap weights are updated in preparation for the next iteration:

w(n + 1) = w(n) + μ(n) e(n) x(n)                    (13)

Each iteration of the NLMS algorithm requires 3N+1 multiplications; this is only N more than the standard LMS algorithm, which is an acceptable increase considering the gains in stability and echo attenuation achieved.

IV. BLOCK DIAGRAM DESCRIPTION
A general system/echo model, which was used for implementing the NLMS algorithm, is illustrated in Figure 1. The terminal receives a downlink (or loudspeaker) signal x(n) from a far-end speaker and transmits an uplink (or microphone) signal d(n). In addition to near-end speech and noise, the uplink signal potentially includes an additional echo component, a result of the acoustical coupling between the loudspeaker and the microphone.

Figure 1: Implementation of the NLMS algorithm

As described in the previous section, the implementation consists of three major steps: (1) finding the output of the filter (Module 2), (2) calculating the error signal (Module 3), and (3) updating the filter coefficients (Module 4). Before these processes are started, the input speech signals s(k) must be gathered.
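As an illustration of how Modules 2-4 fit together, the following is a minimal C++ sketch of a single NLMS iteration following equations (10)-(13). The ern() function is only a hypothetical stand-in (a hard clip) for the error recovery nonlinearity; the ERN used in this work is statistically motivated and is not reproduced here.

// Illustrative only: one NLMS iteration (equations (10)-(13)), with a simple
// hard-clipping function standing in for the error recovery nonlinearity.
#include <vector>
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for the ERN: limit large error magnitudes
// (e.g., during double talk) before they drive the adaptation.
static double ern(double e, double limit = 0.5) {
    return std::max(-limit, std::min(limit, e));
}

// x: most recent N reference (loudspeaker) samples, x[0] = x(n), x[1] = x(n-1), ...
// w: N filter tap weights, updated in place.  d: microphone sample d(n).
// Returns the estimation error e(n).
double nlmsStep(const std::vector<double>& x, std::vector<double>& w, double d) {
    const std::size_t N = w.size();
    double y = 0.0, energy = 1e-8;           // small constant avoids division by zero
    for (std::size_t i = 0; i < N; ++i) {    // step 1: filter output y(n) = w^T x
        y += w[i] * x[i];
        energy += x[i] * x[i];
    }
    double e  = d - y;                       // step 2: error e(n) = d(n) - y(n)
    double mu = 1.0 / energy;                // step 3: step size mu(n) = 1 / (x^T x)
    double fe = ern(e);                      // enhance the error before adaptation
    for (std::size_t i = 0; i < N; ++i)      // step 4: w(n+1) = w(n) + mu(n) f(e(n)) x(n)
        w[i] += mu * fe * x[i];
    return e;
}

Calling nlmsStep() once per incoming sample would give sample-by-sample adaptation; the buffered variant described in Section 3.1 applies the same update over blocks of samples instead.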
V. EXPERIMENTAL WORK
The NLMS algorithm with ERN was simulated using Visual C++. Figure 2 shows the input signal. Figure 3 shows the desired signal. The adaptive filter is a 1025th-order FIR filter. The step size was analyzed and set according to the input signal.

Figure 2 Desired Input Wave

Figure 3: Far-End Wave

A microphone is used to input a voice signal from the user; this is then fed into the PC sound card, which is designed to acquire (through a microphone) and produce (through a speaker) acoustic signals and comes standard in almost every PC. Here the signal is sampled by the sound card at a fixed rate of 8 kHz. The first operation performed by the PC-based data acquisition system is a simulation of an echo response of a real acoustic environment. This is achieved by storing each value of the sampled input in a file whose index is incremented with each new sample. The echoed signal is generated by adding the sample values and the stored values in the file (as a time-delayed image of the original sample). This signal is then fed back into the file, resulting in multiple echoes. The amount of time delay the echoed signal contains is



determined by the number of samples stored in the file, as given by the following equation:

t_d = indexlength / samplerate = indexlength / 8000                    (14)

The file length (index) of the echo cancellation system can be adjusted; here it is set to 500, giving a time delay in the range of 50 ms. A round-trip time delay of this order simulates the effect of sound reflecting off an object approximately 10 meters away. This echoed signal is what the adaptive filter attempts to simulate; the better it does this, the more effective the cancellation will be. The new input is sampled and placed into an array represented by the input vector x(n). The step-size value for this iteration of the NLMS algorithm is then calculated. The output of the adaptive filter is then calculated by the dot product of the input vector and the current filter tap weight vector. This output is then subtracted from the desired signal to determine the estimation error value, e(n). This error value, the step-size value and the current FIR tap weight vector are then input to the LMS algorithm to calculate the filter tap weights to be used in the next iteration. Figure 4 shows the AEC output after the implementation of the NLMS and ERN.
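The file-based echo simulation described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code, and the 0.6 attenuation factor is an assumed value.

// Illustrative only: generating a synthetic echo as a delayed, attenuated copy
// of the input, as in the file-buffer scheme described above. A delay of
// indexLength samples corresponds to t_d = indexLength / 8000 s at the 8 kHz
// sampling rate (equation (14)).
#include <vector>
#include <cstddef>

std::vector<double> addEcho(const std::vector<double>& input,
                            std::size_t indexLength = 500,
                            double attenuation = 0.6) {          // assumed attenuation
    std::vector<double> out(input.size(), 0.0);
    for (std::size_t n = 0; n < input.size(); ++n) {
        out[n] = input[n];
        if (n >= indexLength)                                    // delayed image of the original sample
            out[n] += attenuation * out[n - indexLength];        // feedback produces multiple echoes
    }
    return out;
}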

presence of an additive local noise but also when there is a nonlinear distortion on the acoustic echo. Second, there is no need to freeze the filter adaptation entirely during the double-talk situation when the error enhancement procedure using a compressive ERN and a regularization procedure are combined together appropriately. Third, the problem of real-world AEC is best dealt with as a system issue. It stresses the importance of the system approach in which the individually designed components are integrated together to support one another mutually for the benefit of the system as a whole. Other signal enhancement procedure as the acoustic mixing system becomes even more complex and prevalent in the future audio applications than ever before. [1] [2] [3] VII REFERENCES Ted S. Wada and Biing-Hwang Juang, Fellow, Enhancement of Residual Echo for Robust Acoustic Echo Cancellation IEEE,jan,2011. B. Widrow and M. E. Hoff, Jr., Adaptive switching circuits, IRE Wescon Conv. Rec., pp. 96 104, 1960, T. S. Wada, B.-H. Juang, and R. A. Sukkar, Measurement of the effects of nonlinearities on the network-based acoustic echo cancellation, in Proc. EURASIP EUSIPCO, Sep. 2006. T. S. Wada and B.-H. Juang, Enhancement of residual echo for improved acoustic echo cancellation, in Proc. EUSIPCO, Sep. 2007, pp. 16201624. S. Shimauchi, Y. Haneda, and A. Kataoka, Robust frequency domain acoustic echo cancellation filter employing normalized residual echo enhancement, IEICE Trans. Fundamentals, vol. E91-A, no. 6, pp. 13471356, June 2008. T. S. Wada and B.-H. Juang, Acoustic echo cancellation based on independent component analysis and integrated residual echo enhancement, in Proc. IEEE WASPAA, Oct. 2009, pp. 205208. D. P. Mandic, A generalized normalized gradient descent algorithm, IEEE Signal Process. Letters, vol. 11, no. 2, pp.115118, Feb. 2004. Hirano and A. Sugiyama, A noise-robust stochastic gradient algorithm with an adaptive step-

[4]

[5]

Figure 4 AEC Output VI CONCLUSION Thus I conclude this paper by pointing out the main contributions of my study. First, the use of error recovery nonlinearity (ERN) to enhance the error signal proves simple and effective in dealing with the robustness issue of acoustic echo cancellation (AEC) in the real world. The combined technique evidently has deep connections to the traditional noise-robust AEC schemes, and it can be readily utilized not only in the [6]

[7] [8]


size for mobile hands-free telephones, in Proc. IEEE ICASSP, vol. 2, May 1995, pp. 13921395. [9] Sugiyama, A robust NLMS algorithm with a novel noise modeling based on stationary/nonstationary noise decomposition,in Proc. IEEE ICASSP, Apr. 2009, pp. 201204J.[10] M. Valin, On adjusting the learning rate in frequency domain echo cancellation with doubletalk, IEEE Trans. Audio Speech Language Process., vol. 15, no. 3, pp. 10301034, Mar. 2007. [11] T. Gansler, S. L. Gay, M. M. Sondhi, and J. Benesty, Double-talk robust fast converging algorithms for network echo cancellation, IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 656 663, Nov. 2000. [12] S. Gazor and W. Zhang, Speech enhancement employing Laplacian-Gaussian mixture, IEEE Trans. Speech Audio Process.,vol. 13, no. 5, pp. 896904, Sep. 2005. [13] R. Martin, Speech enhancement based on minimum mean-square error estimation and supergaussian priors, IEEE Trans.Speech Audio Process., vol. 13, no. 5, pp. 845856, Sep. 2005. [14] Y. Zou, S.-C. Chan, and T.-S. Ng, Least mean M-estimate algorithms for robust adaptive filtering in impulse noise, IEEE Trans. Circuits Syst. II: Analog and Digital Signal Proc., vol. 47, no. 12, pp. 15641569, Dec. 2000. [15] D. L. Duttweiler, A twelve-channel digital echo canceller, IEEE Trans. Commun., vol. COM-26, no. 5, pp. 647653,May 1978. [16] S. Gazor and W. Zhang, Speech probability distribution, IEEE Signal Process. Letters, vol. 10, no. 7, pp. 204207, Jul.2003.

[17] J.-M. Yang and H. Sakai, A robust ICA-based adaptive filter algorithm for system identification, IEEE Trans. CircuitsSyst. II: Express Briefs, vol. 55, no. 12, pp. 12591263, Dec. 2008. [18] J. Wung, T. S. Wada, B.-H. Juang, B. Lee, T. Kalker, and R. Schafer, System approach to residual echo suppression in robust hands-free teleconferencing, in Proc. IEEE ICASSP, May 2011, pp. 445448.


Fuzzy Keyword Search over Encrypted Data in Cloud Computing


M. Balasubramani, V. Singaravel
Lecturer, Department of CSE/IT, Bharathidasan Engineering College, Nattrampalli

Abstract
As Cloud Computing becomes prevalent, more and more sensitive data are being centralized into the cloud. For the security of data privacy, sensitive data usually have to be encrypted before outsourcing, which makes effective data utilization a very challenging task. Although traditional searchable encryption schemes allow a user to protection search over encrypted data through keywords and selectively retrieve files of interest, these techniques support only exact keyword search. That is, there is no tolerance of minor typos and format inconsistencies which, on the other hand, are typical user searching behavior and happen very frequently. This significant drawback makes existing techniques unsuitable in Cloud Computing as it greatly affects system usability, rendering user searching experiences very frustrating and system efficacy very low. In this paper, for the first time we formalize and solve the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy. Fuzzy keyword search greatly enhances system usability by returning the matching files when users searching inputs exactly match the predefined keywords or the closest possible matching files based on keyword similarity semantics, when exact match fails. In our solution, we exploit edit distance to quantify keywords similarity and develop an advanced technique on constructing fuzzy keyword sets, which greatly reduces the storage and representation overheads. Through rigorous security analysis, we show that our proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search. Index TermsPrivacy, Preserving, Security INTRODUCTION As Cloud Computing becomes prevalent, more and more sensitive information are being centralized into the cloud, such as emails, personal health records, government documents, etc. By storing their data into the cloud, the data owners can be relieved from the burden of data storage and maintenance so as to enjoy the on-demand high quality data storage service. However, the fact that data owners and cloud server are not in the same trusted domain may put the outsourced data at risk, as the cloud server may no longer be fully trusted. It follows that sensitive data usually should be encrypted prior to out-sourcing for data privacy and combating unsolicited accesses. However, data encryption makes effective data utilization a very challenging task given that there could be a large amount of outsourced data files. Moreover, in Cloud Computing, data owners may share their outsourced data with a large number of users. The individual users might want to only retrieve certain specific data files they are interested in during a given session. One of the most popular ways is to selectively retrieve files through keyword-based

search instead of retrieving all the encrypted files back which is completely impractical in cloud computing scenarios. Such keyword-based search technique allows users to selectively retrieve files of interest and has been widely applied in plaintext search scenarios, such as Google search [1]. Unfortunately, data encryption restricts users abil-ity to perform keyword search and thus makes the traditional plaintext search methods unsuitable for Cloud Computing. Besides this, data encryption also demands the protection of keyword privacy since keywords usually contain important information related to the data files. Although encryption of keywords can protect keyword privacy, it further renders the traditional plaintext search techniques useless in this scenario. To securely search over encrypted data, searchable encryption techniques have been developed in recent years [2][10]. Searchable encryption schemes usually build up an index for each keyword of interest and associate the index with the files that contain the keyword. By integrating the trapdoors of keywords within the index information, effective keyword search can be realized while both file content and keyword


privacy are well-preserved. Although allowing for performing searches securely and effectively, the existing searchable encryption techniques do not suit for cloud computing scenario since they support only exact keyword search. That is, there is no tolerance of minor typos and format inconsistencies. It is quite common that users searching input might not exactly match those pre-set keywords due to the possible typos, such as Illinois and Illinois, representation inconsistencies, such as PO BOX and P.O. Box, and/or her lack of exact knowledge about the data. The naive way to support fuzzy keyword search is through simple spell check mechanisms. However, this approach does not completely solve the problem and sometimes can be ineffective due to the following reasons: on the one hand, it requires additional interaction of user to determine the correct word from the candidates generated by the spell check algorithm, which unnecessarily costs users extra computation effort; on the other hand, in case that user accidentally types some other valid keywords by mistake (for example, search for hat by carelessly typing cat), the spell check algorithm would not even work at all, as it can never differentiate between two actual valid words. Thus, the drawback of existing schemes signifies the important need for new techniques that support searching flexibility, tolerating both minor typos and format inconsistencies. In this paper, we focus on enabling effective yet privacy-preserving fuzzy keyword search in Cloud Computing. To the best of our knowledge, we formalize for the first time the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy. Fuzzy keyword search greatly enhances system usability by returning the matching files when users searching inputs exactly match the predefined keywords or the closest possible matching files based on keyword similarity semantics, when exact match fails. More specifically, we use edit distance to quantify keywords similarity and develop a novel technique, i.e., an wildcard-based technique, for the construction of fuzzy key-word sets. This technique eliminates the need for enumerating all the fuzzy keywords and the resulted size of the fuzzy keyword sets is significantly reduced. Based on the constructed fuzzy keyword sets, we propose an efficient fuzzy keyword search scheme. Through rigorous security analysis, we

show that the proposed solution is secure and privacypreserving, while correctly realizing the goal of fuzzy keyword search. The rest of paper is organized as follows: Section II sum-marizes the features of related work. Section III introduces the system model, threat model, our design goal and briefly describes some necessary background for the techniques used in this paper. Section IV shows a straightforward construction of fuzzy keyword search scheme. Section V provides the detailed description of our proposed schemes, including the efficient constructions of fuzzy keyword set and fuzzy keyword search scheme. Section VI presents the security analysis. Finally, Section VIII concludes the paper. II. Related work Plaintext fuzzy keyword search. Recently, the importance of fuzzy search has received attention in the context of plaintext searching in information retrieval community [11][13]. They addressed this problem in the traditional information-access paradigm by allowing user to search without using try-and-see approach for finding relevant information based on approximate string matching. At the first glance, it seems possible for one to directly apply these string matching algorithms to the context of searchable encryption by computing the trapdoors on a character base within an alphabet. However, this trivial construction suffers from the dictionary and statistics attacks and fails to achieve the search privacy. Searchable encryption. Traditional searchable encryption [2][8], [10] has been widely studied in the context of cryptography. Among those works, most are focused on efficiency improvements and security definition formalizations. The first construction of searchable encryption was proposed by Song et al. [3], in which each word in the document is encrypted independently under a special two-layered encryption construction. Goh [4] proposed to use Bloom filters to construct the indexes for the data files. To achieve more efficient search, Chang et al. [7] and Curtmola et al. [8] both proposed similar index approaches, where a single encrypted hash table index is built for the entire


file collection. In the index table, each entry consists of the trapdoor of a keyword and an encrypted set of file identifiers whose corresponding data files contain the keyword. As a complementary approach, Boneh et al. [5] presented a public-key based searchable encryption scheme, with an analogous scenario to that of [3]. Note that all these existing schemes support only exact keyword search, and thus are not suitable for Cloud Computing.

III. PROBLEM FORMULATION

A. System Model
In this paper, we consider a cloud data system consisting of data owner, data user and cloud server. Given a collection of n encrypted data files C = (F1, F2, ..., FN) stored in the cloud server and a predefined set of distinct keywords W = {w1, w2, ..., wp}, the cloud server provides the search service for the authorized users over the encrypted data C. We assume the authorization between the data owner and users is appropriately done. An authorized user types in a request to selectively retrieve data files of his/her interest. The cloud server is responsible for mapping the searching request to a set of data files, where each file is indexed by a file ID and linked to a set of keywords. The fuzzy keyword search scheme returns the search results according to the following rules: 1) if the user's searching input exactly matches the pre-set keyword, the server is expected to return the files containing the keyword; 2) if there exist typos and/or format inconsistencies in the searching input, the server will return the closest possible results based on pre-specified similarity semantics (to be formally defined in Section III-D). An architecture of fuzzy keyword search is shown in Fig. 1.

B. Threat Model
We consider a semi-trusted server. Even though data files are encrypted, the cloud server may try to derive other sensitive information from users' search requests while performing keyword-based search over C. Thus, the search should be conducted in a secure manner that allows data files to be securely retrieved while revealing as little information as possible to the cloud server. In this paper, when designing the fuzzy keyword search scheme, we will follow the security definition deployed in the traditional searchable encryption [8]. More specifically, it is required that nothing should be leaked from the remotely stored files and index beyond the outcome and the pattern of search queries.

C. Design Goals
In this paper, we address the problem of supporting efficient yet privacy-preserving fuzzy keyword search services over encrypted cloud data. Specifically, we have the following goals: i) to explore new mechanisms for constructing storage-efficient fuzzy keyword sets; ii) to design an efficient and effective fuzzy search scheme based on the constructed fuzzy keyword sets; iii) to validate the security of the proposed scheme.

D. Preliminaries


Edit Distance: There are several methods to quantitatively measure string similarity. In this paper, we resort to the well-studied edit distance [16] for our purpose. The edit distance ed(w1, w2) between two words w1 and w2 is the number of operations required to transform one of them into the other. The three primitive operations are: 1) Substitution: changing one character to another in a word; 2) Deletion: deleting one character from a word; 3) Insertion: inserting a single character into a word. Given a keyword w, we let Sw,d denote the set of words w′ satisfying ed(w, w′) ≤ d for a certain integer d.

Fuzzy Keyword Search: Using edit distance, the definition of fuzzy keyword search can be formulated as follows: Given a collection of n encrypted data files C = (F1, F2, ..., FN) stored in the cloud server, a set of distinct keywords W = {w1, w2, ..., wp} with predefined edit distance d, and a searching input (w, k) with edit distance k (k ≤ d), the execution of fuzzy keyword search returns a set of file IDs whose corresponding data files possibly contain the word w, denoted as FIDw: if w = wi ∈ W, then return FIDwi; otherwise, if w ∉ W, then return {FIDwi}, where ed(w, wi) ≤ k. Note that the above definition is based on the assumption that k ≤ d. In fact, d can be different for distinct keywords and the system will return {FIDwi} satisfying ed(w, wi) ≤ min{k, d} if exact match fails.
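For reference, the edit distance defined above can be computed with the standard dynamic-programming recurrence; the short sketch below is illustrative and is not part of the proposed scheme itself.

// Illustrative only: standard dynamic-programming computation of ed(w1, w2),
// with substitution, deletion and insertion each counted as one operation.
#include <string>
#include <vector>
#include <algorithm>
#include <cstddef>

std::size_t editDistance(const std::string& w1, const std::string& w2) {
    const std::size_t m = w1.size(), n = w2.size();
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1));
    for (std::size_t i = 0; i <= m; ++i) d[i][0] = i;   // delete all of w1
    for (std::size_t j = 0; j <= n; ++j) d[0][j] = j;   // insert all of w2
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j) {
            std::size_t sub = d[i - 1][j - 1] + (w1[i - 1] == w2[j - 1] ? 0 : 1);
            d[i][j] = std::min({sub, d[i - 1][j] + 1, d[i][j - 1] + 1});
        }
    return d[m][n];   // e.g. editDistance("CASTLE", "CASTLF") == 1
}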

IV. THE STRAIGHTFORWARD APPROACH

Before introducing our construction of fuzzy keyword sets, we first propose a straightforward approach that achieves all the functions of fuzzy keyword search, which aims at providing an overview of how a fuzzy search scheme works over encrypted data. Assume Π = (Setup(1^λ), Enc(sk, ·), Dec(sk, ·)) is a symmetric encryption scheme, where sk is a secret key, Setup(1^λ) is the setup algorithm with security parameter λ, and Enc(sk, ·) and Dec(sk, ·) are the encryption and decryption algorithms, respectively. Let Twi denote a trapdoor of keyword wi. Trapdoors of the keywords can be realized by applying a one-way function f, similar to [2], [4], [8]: given a keyword wi and a secret key sk, we can compute the trapdoor of wi as Twi = f(sk, wi). The scheme of the fuzzy keyword search goes as follows. We begin by constructing the fuzzy keyword set Swi,d for each keyword wi ∈ W (1 ≤ i ≤ p) with edit distance d. The intuitive way to construct the fuzzy keyword set of wi is to enumerate all possible words w′i that satisfy the similarity criterion ed(wi, w′i) ≤ d, that is, all the words with edit distance at most d from wi are listed. For example, the following is the listing of variants after a substitution operation on the first character of keyword CASTLE: {AASTLE, BASTLE, DASTLE, ..., YASTLE, ZASTLE}. Based on the resulting fuzzy keyword sets, the fuzzy search over encrypted data is conducted as follows: 1) To build an index for wi, the data owner computes trapdoors Tw′i = f(sk, w′i) for each w′i ∈ Swi,d with a secret key sk shared between the data owner and authorized users. The data owner also encrypts FIDwi as Enc(sk, FIDwi‖wi). The index table {({Tw′i}w′i∈Swi,d, Enc(sk, FIDwi‖wi))}wi∈W and the encrypted data files are outsourced to the cloud server for storage; 2) To search with w, the authorized user computes the trapdoor Tw of w and sends it to the server; 3) Upon receiving the search request Tw, the server compares it with the index table and returns all the possible encrypted file identifiers {Enc(sk, FIDwi‖wi)} according to the fuzzy keyword definition in Section III-D. The user decrypts the returned results and retrieves relevant files of interest. This straightforward approach apparently provides fuzzy keyword search over the encrypted files while achieving search privacy using the technique of secure


be computed for each of the fuzzy keywords in the set and stored as part of the index. However, this approach has serious efficiency disadvantages. The simple enumeration method in constructing fuzzy keyword sets would introduce large storage complexities, which greatly affect the usability. Recall that in the definition of edit distance, substitution, deletion and insertion are the three kinds of operations used in the computation of edit distance. For a keyword wi of length k, the numbers of all similar words w' satisfying ed(wi, w') ≤ d for d = 1, 2 and 3 are approximately 2k × 26, 2k² × 26² and (4/3)k³ × 26³, respectively. For example, assume there are 10^4 keywords in the file collection with average keyword length 10, d = 2, and the output length of the hash function is 160 bits; then the resulting storage cost for the index will be about 30 GB. Therefore, it brings forth the demand for fuzzy keyword sets with smaller size.

V. CONSTRUCTIONS OF EFFECTIVE FUZZY KEYWORD SEARCH IN CLOUD

The key idea behind our secure fuzzy keyword search is two-fold: 1) building up fuzzy keyword sets that incorporate not only the exact keywords but also the ones differing slightly due to minor typos, format inconsistencies, etc.; 2) designing an efficient and secure searching approach for file retrieval based on the resulting fuzzy keyword sets.

A. Advanced Technique for Constructing Fuzzy Keyword Sets

To provide more practical and effective fuzzy keyword search constructions with regard to both storage and search efficiency, we now propose an advanced technique to improve the straightforward approach for constructing the fuzzy keyword set. Without loss of generality, we will focus on the case of edit distance d = 1 to elaborate the proposed advanced technique. For larger values of d, the reasoning is similar. Note that the technique is carefully designed in such a way that while suppressing the size of the fuzzy keyword set, it will not affect the search correctness.

Wildcard-based Fuzzy Set Construction: In the above straightforward approach, all the variants of the keywords have to be listed even if an operation is performed at the same position. Based on this observation, we propose to use a wildcard to denote edit operations at the same position. The wildcard-based fuzzy set of wi with edit distance d is denoted as Swi,d = {S'wi,0, S'wi,1, ..., S'wi,d}, where S'wi,t denotes the set of words derived from wi with t wildcards. Note that each wildcard represents an edit operation on wi. For example, for the keyword CASTLE with the pre-set edit distance 1, its wildcard-based fuzzy keyword set can be constructed as SCASTLE,1 = {CASTLE, *CASTLE, *ASTLE, C*ASTLE, C*STLE, ..., CASTL*E, CASTL*, CASTLE*}. The total number of variants of CASTLE constructed in this way is only 13 + 1, instead of 13 × 26 + 1 as in the above exhaustive enumeration approach when the edit distance is set to be 1. Generally, for a given keyword wi with length ℓ, the size of Swi,1 will be only 2ℓ + 1 + 1, as compared to (2ℓ + 1) × 26 + 1 obtained in the straightforward approach. The larger the pre-set edit distance, the more storage overhead can be reduced: with the same setting of the example in the straightforward approach, the proposed technique can help reduce the storage of the index from 30 GB to approximately 40 MB. In case the edit distance is set to be 2 or 3, the sizes of Swi,2 and Swi,3 are given by similar sums of binomial-coefficient terms; in other words, the number of variants is only O(ℓ^d) for a keyword with length ℓ and edit distance d.
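To make the construction concrete, the following is a minimal C# sketch (not from the paper; the identifier names are ours) that builds the wildcard-based fuzzy set for a keyword with edit distance 1 and contrasts its size with the exhaustive enumeration count (2ℓ + 1) × 26 + 1 discussed above.

using System;
using System.Collections.Generic;

static class WildcardFuzzySet
{
    // Builds S_{w,1}: the keyword itself, one wildcard replacing each
    // character (covers substitution/deletion), and one wildcard inserted
    // in each gap (covers insertion).
    public static HashSet<string> Build(string w)
    {
        var set = new HashSet<string> { w };
        for (int i = 0; i < w.Length; i++)                 // wildcard replacing position i
            set.Add(w.Substring(0, i) + "*" + w.Substring(i + 1));
        for (int i = 0; i <= w.Length; i++)                // wildcard inserted at gap i
            set.Add(w.Substring(0, i) + "*" + w.Substring(i));
        return set;
    }

    public static void Main()
    {
        var s = Build("CASTLE");
        Console.WriteLine(s.Count);                        // 14 = 2*6 + 2 wildcard variants
        Console.WriteLine((2 * 6 + 1) * 26 + 1);           // 339 variants if enumerated explicitly
    }
}

For CASTLE the wildcard set has 14 entries instead of 339, which is the per-keyword reduction that scales the whole index from tens of gigabytes down to megabytes in the example above.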



B. The Efficient Fuzzy Keyword Search Scheme

Based on the storage-efficient fuzzy keyword sets, we show how to construct an efficient and effective fuzzy keyword search scheme. The scheme goes as follows:

1) To build an index for wi with edit distance d, the data owner first constructs a fuzzy keyword set Swi,d using the wildcard-based technique. Then he computes the trapdoor set {Tw'i} for each w'i ∈ Swi,d with a secret key sk shared between the data owner and the authorized users. The data owner encrypts FIDwi as Enc(sk, FIDwi || wi). The index table {({Tw'i}w'i ∈ Swi,d, Enc(sk, FIDwi || wi))}wi ∈ W and the encrypted data files are outsourced to the cloud server for storage;

2) To search with (w, k), the authorized user computes the trapdoor set {Tw'}w' ∈ Sw,k, where Sw,k is also derived from the wildcard-based fuzzy set construction. He then sends {Tw'}w' ∈ Sw,k to the server;

3) Upon receiving the search request {Tw'}w' ∈ Sw,k, the server compares it with the index table and returns all the possible encrypted file identifiers {Enc(sk, FIDwi || wi)} according to the fuzzy keyword definition in Section III-D. The user decrypts the returned results and retrieves the relevant files of interest.

In this construction, the technique of constructing the search request for w is the same as the technique of constructing the index for a keyword. As a result, the search request is a trapdoor set based on Sw,k, instead of a single trapdoor as in the straightforward approach. In this way, the search result correctness can be ensured.

VI. SECURITY ANALYSIS

In this section, we analyze the correctness and security of the proposed fuzzy keyword search scheme. At first, we show the correctness of the scheme in terms of two aspects, that is, completeness and soundness.

Theorem 1: The wildcard-based scheme satisfies both completeness and soundness. Specifically, upon receiving the request for w, all of the keywords {wi} will be returned if and only if ed(w, wi) ≤ k.

The proof of this theorem can be reduced to the following lemma:

Lemma 1: The intersection of the fuzzy sets Swi,d and Sw,k for wi and w is not empty if and only if ed(w, wi) ≤ k.

Proof: First, we show that Swi,d ∩ Sw,k is not empty when ed(w, wi) ≤ k. To prove this, it is enough to find an element in Swi,d ∩ Sw,k. Let w = a1a2...as and wi = b1b2...bt, where all these ai and bj are single characters. After ed(w, wi) edit operations, w can be changed to wi according to the definition
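As an illustration of step 1), here is a minimal C# sketch (our own, not from the paper) that derives trapdoors for a wildcard-based fuzzy set with a keyed hash and encrypts the file identifier. HMACSHA256 and AES are stand-ins for the unspecified one-way function and symmetric cipher, and the sketch reuses the Build method from the earlier wildcard example.

using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class FuzzyIndexDemo
{
    // Trapdoor of a (fuzzy) keyword: a keyed one-way function of the keyword.
    static string Trapdoor(byte[] key, string keyword)
    {
        using var h = new HMACSHA256(key);
        return Convert.ToBase64String(h.ComputeHash(Encoding.UTF8.GetBytes(keyword)));
    }

    // Encrypt FID || keyword under sk with AES (IV prepended to the ciphertext).
    static byte[] EncryptFid(byte[] sk, string fid, string keyword)
    {
        using var aes = Aes.Create();
        aes.Key = sk;
        aes.GenerateIV();
        using var enc = aes.CreateEncryptor();
        byte[] plain = Encoding.UTF8.GetBytes(fid + "||" + keyword);
        byte[] cipher = enc.TransformFinalBlock(plain, 0, plain.Length);
        byte[] outBuf = new byte[aes.IV.Length + cipher.Length];
        Buffer.BlockCopy(aes.IV, 0, outBuf, 0, aes.IV.Length);
        Buffer.BlockCopy(cipher, 0, outBuf, aes.IV.Length, cipher.Length);
        return outBuf;
    }

    public static void Main()
    {
        byte[] sk = RandomNumberGenerator.GetBytes(32);   // shared secret key (.NET 6+ API)
        var index = new Dictionary<string, byte[]>();     // trapdoor -> Enc(sk, FID||w)

        string wi = "CASTLE"; string fid = "FILE-017";    // hypothetical keyword and file ID
        foreach (string v in WildcardFuzzySet.Build(wi))  // S_{wi,1} from the earlier sketch
            index[Trapdoor(sk, v)] = EncryptFid(sk, fid, wi);

        // Search request for a misspelled keyword: any shared fuzzy variant hits the entry.
        foreach (string v in WildcardFuzzySet.Build("CASTLF"))
            if (index.ContainsKey(Trapdoor(sk, v)))
            { Console.WriteLine("match found"); break; }
    }
}

The misspelled query CASTLF matches because both fuzzy sets contain the variant CASTL*, which is exactly the intersection property proved in Lemma 1 below.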



of edit distance. Let w' = a'1a'2...a'm, where a'i = ai or a'i = * if an edit operation is performed at this position. In other words, w' is obtained from w by placing a wildcard at each position where one of the ed(w, wi) edit operations is performed; viewed from wi, the wildcards fall at the same positions, so w' can equally be obtained from wi. Because ed(w, wi) ≤ k, w' is included in both Swi,d and Sw,k, and we get the result that Swi,d ∩ Sw,k is not empty.

Next, we prove that Swi,d ∩ Sw,k is empty if ed(w, wi) > k. The proof is given by contradiction. Assume there exists a w' belonging to Swi,d ∩ Sw,k. We will show that ed(w, wi) ≤ k, which reaches a contradiction. First, from the assumption that w' ∈ Swi,d ∩ Sw,k, we get that the number of wildcards in w', which is denoted by n, is not greater than k. Next, we prove that ed(w, wi) ≤ n. We prove this inequality by induction. First, we prove that it holds when n = 1. There are nine cases to be considered. If w' is derived from the operation of deletion from both wi and w, then ed(wi, w) ≤ 1, because the other characters are the same except the character at the same position. If the operation is deletion from wi and substitution from w, we again have ed(wi, w) ≤ 1, because they will be the same after at most one substitution from wi. The other cases can be analyzed in a similar way and are omitted. Now, assuming that the inequality holds when n = ℓ, we need to prove that it also holds when n = ℓ + 1. Let w' = a'1a'2...a'n ∈ Swi,d ∩ Sw,k, where a'i = ai or a'i = *. For a wildcard at position t, cancel the underlying operations and revert to the original characters of wi and w at this position; assume two new elements w'i and w'' are derived from them, respectively. Then perform one operation at position t of w'i to make its character at this position the same as that of w'', which gives w''i. After this operation, w' is changed to a word with only ℓ wildcards that lies in the corresponding fuzzy sets of w''i and w''. Therefore, we have ed(w'', w''i) ≤ ℓ from the induction assumption. We also know that ed(w''i, w'i) = 1, based on which we can get ed(w, wi) ≤ ℓ + 1, so the inequality holds for n = ℓ + 1. It renders ed(w, wi) ≤ n, and hence ed(w, wi) ≤ k because n ≤ k, which contradicts ed(w, wi) > k. Therefore, Swi,d ∩ Sw,k is empty if ed(w, wi) > k.
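Since both the construction and the proof revolve around the edit distance ed(·,·), a standard dynamic-programming computation of it is sketched below in C# (our own illustration; the paper itself does not prescribe an algorithm).

using System;

static class EditDistance
{
    // Classic Levenshtein distance: minimum number of substitutions,
    // deletions and insertions needed to turn a into b.
    public static int Compute(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;   // delete all of a
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;   // insert all of b
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;    // substitution cost
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        return d[a.Length, b.Length];
    }

    public static void Main()
    {
        Console.WriteLine(Compute("CASTLE", "CASTLF"));   // 1 (one substitution)
        Console.WriteLine(Compute("CASTLE", "CASLE"));    // 1 (one deletion)
    }
}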

Theorem 2: The fuzzy keyword search scheme is secure regarding the search privacy.

Proof: In the wildcard-based scheme, the computation of the index and of the request for the same keyword is identical. Therefore, we only need to prove the index privacy, which we do by reduction. Suppose the searchable encryption scheme fails to achieve index privacy against indistinguishability under the chosen keyword attack, which means there exists an adversary A who can extract the underlying information of a keyword from the index. Then we can build an algorithm A' that utilizes A to determine whether some function f(·) is a pseudo-random function, i.e., equal to f(sk, ·), or a truly random function. A' has access to an oracle Of(·) that takes as input a value x and returns f(x). Upon receiving any request for an index computation from A, A' answers it with a query to the oracle Of(·). After making these trapdoor queries, the adversary A outputs two challenge keywords w0 and w1 with the same length and edit distance (a restriction that can be relaxed by adding some redundant trapdoors). A' picks a random b ∈ {0, 1} and sends wb to the challenger. Then A' is given a challenge value y, which is either computed from the pseudo-random function f(sk, ·) or from a random function. A' sends y back to A, who answers with a guess b' ∈ {0, 1}. Suppose A guesses b correctly with non-negligible probability; this indicates that the value y is not randomly computed, and A' decides that f(·) is a pseudo-random function. As a result, under the assumption that the pseudo-random function is indistinguishable from a truly random function, A can guess b correctly with probability at most approximately 1/2. Thus, the search privacy is obtained.

VII. CONCLUSION

In this paper, for the first time we formalize and solve the problem of supporting efficient yet privacy-preserving fuzzy search for achieving effective utilization of remotely stored encrypted data in Cloud Computing. We design two advanced techniques (i.e., wildcard-based and gram-based techniques) to construct the storage-efficient fuzzy keyword sets by exploiting two significant observations on the similarity metric of edit distance. Based on the constructed fuzzy keyword sets, we further propose a brand new symbol-based trie-traverse searching scheme, where a multi-way tree structure is built up using symbols transformed from the resulting fuzzy keyword sets. Through rigorous security analysis, we show that our proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search. Extensive experimental results demonstrate the efficiency of our solution.

REFERENCES
[1] Google, "Britney Spears spelling correction," referenced online at http://www.google.com/jobs/britney.html, June 2009.
[2] M. Bellare, A. Boldyreva, and A. O'Neill, "Deterministic and efficiently searchable encryption," in Proceedings of Crypto 2007, volume 4622 of LNCS, Springer-Verlag, 2007.
[3] D. Song, D. Wagner, and A. Perrig, "Practical techniques for searches on encrypted data," in Proc. of IEEE Symposium on Security and Privacy '00, 2000.
[4] E.-J. Goh, "Secure indexes," Cryptology ePrint Archive, Report 2003/216, 2003, http://eprint.iacr.org/.
[5] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in Proc. of EUROCRYPT '04, 2004.
[6] B. Waters, D. Balfanz, G. Durfee, and D. Smetters, "Building an encrypted and searchable audit log," in Proc. of 11th Annual Network and Distributed System Security Symposium, 2004.
[7] Y.-C. Chang and M. Mitzenmacher, "Privacy preserving keyword searches on remote encrypted data," in Proc. of ACNS '05, 2005.
[8] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," in Proc. of ACM CCS '06, 2006.
[9] D. Boneh and B. Waters, "Conjunctive, subset, and range queries on encrypted data," in Proc. of TCC '07, 2007, pp. 535-554.
[10] F. Bao, R. Deng, X. Ding, and Y. Yang, "Private query on encrypted data in multi-user settings," in Proc. of ISPEC '08, 2008.
[11] C. Li, J. Lu, and Y. Lu, "Efficient merging and filtering algorithms for approximate string searches," in Proc. of ICDE '08, 2008.
[12] A. Behm, S. Ji, C. Li, and J. Lu, "Space-constrained gram-based indexing for efficient approximate string search," in Proc. of ICDE '09, 2009.
[13] S. Ji, G. Li, C. Li, and J. Feng, "Efficient interactive fuzzy keyword search," in Proc. of WWW '09, 2009.
[14] J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. Strauss, and R. N. Wright, "Secure multiparty computation of approximations," in Proc. of ICALP '01, 2001.
[15] R. Ostrovsky, "Software protection and simulations on oblivious RAMs," Ph.D. dissertation, Massachusetts Institute of Technology, 1992.
[16] V. Levenshtein, "Binary codes capable of correcting spurious insertions and deletions of ones," Problems of Information Transmission, vol. 1, no. 1, pp. 8-17, 1965.



MEMORY MANAGEMENT IN MAP DECODING ALGORITHM USING TRACE FORWARD RADIX 2X2 TECHNIQUE IN TURBO DECODER USING VLSI TECHNOLOGY
R. Mohanraj, M.E. (VLSI Design), Asst. Professor, RVS College of Engineering and Technology, Dindigul

Abstract
Iterative decoding of convolutional turbo codes (CTC) has a large memory power consumption. To reduce the power consumption of the state metrics cache (SMC), a low-power memory-reduced trace forward maximum a posteriori (MAP) decoding is proposed. Instead of storing all state metrics, the trace forward MAP decoding reduces the size of the SMC by accessing difference metrics. The proposed trace forward computation requires no complicated forward checker, path selection, or reversion flag cache. For double-binary (DB) MAP decoding, radix-2x2 and radix-4 trace forward structures are introduced to provide a tradeoff between power consumption and operating frequency. These two trace forward structures achieve an around 20% power reduction of the SMC, and around 7% power reduction of the DB MAP decoders. In addition, a high-throughput parallel input-mode CTC decoder applying the proposed radix-2x2 trace forward structure is implemented using a CMOS process. Based on post-layout simulation results, the proposed decoder achieves a high maximum throughput rate with good energy efficiency.
Index Terms - Low-power design, maximum a posteriori (MAP) algorithm, turbo decoder.
I. INTRODUCTION
SINGLE-BINARY (SB) convolutional turbo code (CTC), proposed in 1993, has been proven to provide a high coding gain near the Shannon capacity limit. The SB CTC has been adopted in the forward-error-control (FEC) scheme for wideband code-division multiple access (WCDMA) and cdma2000. In 1999, the non-binary CTC was introduced, which has a superior coding performance compared with the SB CTC. In recent years, the double-binary (DB) CTC has been adopted in the FEC coding of advanced wireless communication standards, such as digital video broadcasting - return channel over satellite and terrestrial (DVB-RCS and DVB-RCT), and worldwide interoperability for microwave access (WiMAX).

Fig. 1. Decoding path of the state metrics access: (a) the conventional path and (b) the trace forward path.
Powerful soft-input soft-output (SISO) algorithms for CTC decoding are the maximum a posteriori (MAP) algorithm and its derivatives, such as the log-MAP (L-MAP), Max-log-MAP (ML-MAP), combinations of the L-MAP and ML-MAP, and the enhanced Max-log-MAP (EML-MAP). In this paper, we use the term MAP as an abbreviation of the L-MAP and (E)ML-MAP. The memory organization of MAP decoding is an important issue in facilitating the hardware implementation of CTC decoders. In particular, the power reduction of the state metrics cache (SMC) is critical for MAP decoders. With regard to SB CTC decoding, several approaches have been proposed to reduce the power consumption of the


SMC. The reverse computations significantly reduce the SMC power consumption with reversion checkers and reversion flag caches. However, the reversion checker and reversion flag cache prolong the critical path or the decoding cycles. In addition, the computational complexities of the reverse computations are increased dramatically when they are extended from the SB to the DB MAP. Some researchers have also proposed VLSI designs of DB CTC decoders.

In this paper, the trace forward MAP decoding is proposed to trace the state metrics back by accessing the difference metrics. Fig. 1 illustrates the decoding paths of the conventional computation and the proposed trace forward computation. In the conventional path, the state metrics computed by the natural recursion processor (NRP) in the natural order are stored in the SMC. Then, the state metrics are read out to compute the a posteriori log-likelihood ratio (LLR) by the log-a posteriori module (LAPO) in the reverse order. In the trace forward path, the difference metrics computed by the NRP are stored in the SMC. Then, the state metrics are traced back with the stored difference metrics by the trace forward recursion processor (TRP) in the reverse order. The power consumption of the SMC can be reduced by accessing the difference metrics because the number of stored metrics is lower. The computational power of the TRP is small, and the overall power consumption of the trace forward path is reduced. For the trace forward DB MAP decoding, two trace forward structures are demonstrated. The radix-2x2 trace forward structure has low hardware costs, and the radix-4 trace forward structure has short path delays. Experimental results show that these two trace forward structures achieve an around 20% power reduction of the SMC, and an around 7% power reduction of the DB MAP decoder. In addition, as an application of the proposed trace forward DB MAP decoding for the DB CTC, the radix-2x2 trace forward structures are deployed in a high-throughput CTC decoder. We emphasize the general transmissions of the CTC scheme without the optional hybrid automatic repeat request (HARQ) transmissions. The remainder of this paper is as follows. Section II provides a review of DB MAP decoding. Section III introduces some background information regarding SMC power reductions. Section IV describes the proposed memory-reduced trace forward MAP decoding. Section V demonstrates the radix-4 trace forward DB MAP decoding and the corresponding computational units. Section VI demonstrates the comparisons and experimental results of the proposed trace forward structures. The prototyping chip of the 12-mode WiMAX CTC decoder is described in Section VII. Finally, Section VIII concludes this paper.

II. BINARY CONVOLUTIONAL CODES

A binary convolutional code is denoted by a three-tuple (n, k, m): n output bits are generated whenever k input bits are received. The current n outputs are linear combinations of the present k input bits and the previous m × k input bits. m designates the number of previous k-bit input blocks that must be memorized in the encoder and is called the memory order of the convolutional code. u is the information sequence and v is the corresponding code sequence (codeword).

II. REVIEWS OF DB MAP DECODING

A CTC encoder is composed of a CTC interleaver and two parallel or serial concatenated recursive systematic convolutional (RSC) encoders. Fig. 2 shows the DB RSC encoder of a WiMAX CTC scheme. The constraint length v of this DB


RSC encoder is 4. Powerful SISO algorithms for DB CTC decoding are the DB MAP and its derivatives. The differences between the DB L-MAP and the DB (E)ML-MAP are that the a priori LLR is multiplied by the channel value, and the MAX operations are replaced by MAX* operations in the DB L-MAP. The MAX* operation is defined as max*(x, y) = max(x, y) + ln(1 + e^(-|x-y|)), where a lookup table (LUT) can implement the corrective term ln(1 + e^(-|x-y|)). The difference between the DB EML-MAP and the DB ML-MAP is that the extrinsic values in the DB ML-MAP are not multiplied by a scaling factor. Hard decisions are obtained from the a posteriori LLRs by selecting the most likely symbol. For both SB and DB MAP decoding, the L-MAP has the most significant coding gain, and the EML-MAP achieves a better coding gain than the ML-MAP. Without specifying which algorithm is used, we use the term MAP as an abbreviation of the L-MAP and (E)ML-MAP in the following sections. The details of the SB MAP algorithms for the SB CTC can be referred to in [10].

III. POWER REDUCTIONS OF STATE METRICS CACHE

Before we introduce the proposed memory-reduced trace forward MAP decoding, this section gives background information on SMC power reduction for MAP decoding.

A. Conventional (Windowing) Decoding Procedure

For both SB and DB MAP decoding, 2^(v-1) forward recursion state metrics are computed per stage in chronologically forward order, where v is the constraint length of an RSC encoder. Both the forward and backward recursion state metrics are required for the computation of the a posteriori LLR. Thus, a large SMC stores the forward recursion state metrics to



compute the a posteriori LLR until the backward recursion state metrics are generated. The conventional decoding procedure employing the windowing technique was proposed to reduce the depth of the SMC from the block size (N) to a window size (L), where L is about 4v. (The well-known sliding-window (SW) and parallel-window (PW) MAP architectures can be referred to, respectively.) Fig. 3 shows the conventional decoding procedure. In the natural order, the NRP, composed of add-compare-select units (ACSUs), is used to recursively compute the forward recursion state metrics. The state metrics obtained in each cycle are immediately stored into the SMC and recursively fed back to the NRP to calculate the next state metrics. To compute the a posteriori LLR, the state metrics are read out from the SMC. The SMC of the conventional decoding procedure still accounts for more than 50% of the entire power consumption. To further reduce the access power of the SMC, the reverse computations modified the conventional decoding procedure for the radix-2 SB MAP decoding. The decoding procedure of the reverse computations is illustrated in Fig. 4. Compared with the conventional decoding procedure shown in Fig. 3, the reverse decoding procedure adds a reversion checker, a reversion flag cache, and a reverse recursion processor (RRP). The reversion checker decides whether the state metrics computed by the NRP are reversible or not. If a state metric is not reversible, this state metric is stored in one sub-bank of the SMC. Meanwhile, the reversion flag cache stores the path information of this state metric. To compute the a posteriori LLR, the irreversible state metric is read out from the SMC according to the path information stored in the reversion flag cache. Otherwise, the reversible state metric is recomputed by the RRP, composed of reverse units. The recovered state metrics are recursively fed back to the RRP to calculate the next reversible state metrics. The reverse computations increase the memory access speed of the radix-2x2 DB MAP decoder because the sub-banks of the SMC are dynamically accessed, and the computational power of the reversion checker and the RRP is small. However, the reversion checker and reversion flag cache prolong the critical path or the decoding cycles. In addition, dividing the SMC into sub-banks increases the silicon area of the SMC and consumes more overall power of the SMC if all sub-banks are accessed. For the radix-2 SB MAP decoding, a 2^(v-1)-state trellis structure can be decomposed into radix-2 butterfly structures.

Fig. 3. Conventional decoding procedure

Fig. 4. Reverse decoding procedure

B. Decoding Forward Procedure

Here, the trace forward MAP decoding is proposed to reduce the power consumption of the SMC. The trace forward MAP decoding has five major stages, given below. 1) The branch metrics are computed with the received code words and the a priori LLR in the natural order. 2) The forward state metrics are recursively computed by the NRP with the branch metrics in the natural order, and the difference metrics are stored into the SMC. Note that a difference metric is the difference between two state metrics and has the same bit-length as a state metric.

Fig. 5. Natural, reverse, and trace forward recursions in the (a)-(c) radix-2 butterfly structures and (d)-(f) radix-4 butterfly structures. For the conventional decoding procedure, the computational units in (a) are the ACSUs. For the reverse decoding procedure, the computational units in (a) and (d) are the ACSUs and reversion checkers, and the computational units in (b) and (e) are the reverse units. For the trace forward decoding procedure, the computational units in (a) and (d) are ACSUs only, and the computational units in (c) and (f) are the TBUs.
Fig. 5(a) and (b) illustrates an example of the SB reverse computation in a radix-2 butterfly structure. In Fig. 5(a), the reversion checker checks all fixed paths and determines the reversible paths. In Fig. 5(b), the dashed lines denote the reversible (selective) paths determined by the reversion flag cache. One radix-2 butterfly structure has four paths and four cases to be checked. The complexities are dramatically increased when the reverse computations are extended from the SB to the DB MAP.

IV. FAST MEMORY ACCESS TRACE-FORWARD MAP DECODING

Fig. 6. Trace forward decoding procedure 3) The forward state metrics are recursively traced forward by the TRP with the stored difference metrics in the reverse order. Concurrently, the forward state metrics are recursively computed with the branch metrics. 4) The a posteriori LLR is computed with regenerated forward (or backward) state metrics, the forward state metrics, and branch metrics by the LAPO in the forward order. 5) The extrinsic values and hard bits are computed in the reverse order with the a posteriori LLR. In contrast to the conventional MAP decoding, the trace forward MAP decoding reduces the number of stored metrics by accessing the difference metrics. Hence, the SMC power consumption is reduced. In addition, the computational power overhead of tracing the state metrics back is much smaller than the SMC power consumption. Thus, the overall power consumption of the MAP


decoding is reduced. The work in [18] has introduced that not the absolute values but the differences between the state metrics are important for the a posteriori LLR. In the proposed trace forward computation, the differences between state metrics are kept by storing the difference metrics. Hence, the trace forward MAP decoding performs without losing correction ability. In addition, the proposed trace forward computation works in both the L-MAP and the (E)ML-MAP. Fig. 6 shows the proposed trace forward decoding procedure, which is a modification of the conventional decoding procedure shown in Fig. 3. Compared with the conventional decoding procedure, only an additional TRP is added. Fig. 5(a) and (c) illustrates an example of the radix-2 SB trace forward computation in a radix-2 butterfly structure. In Fig. 5(a), only one difference metric (black arc) computed by a radix-2 ACSU is stored in the SMC. In Fig. 5(c), the two state metrics (white nodes) are traced back in the trace forward recursion by one trace forward unit (TBU) with the stored difference metric (black arc). The two recovered state metrics are recursively fed back to the TRP to calculate the next two state metrics. The stored metrics are reduced from two state metrics to one difference metric. The bit-lengths of the state metric and the difference metric are the same. Thus, the radix-2 trace forward SB MAP decoding requires half the SMC size of the conventional decoding procedure. Compared with the reverse computations shown in Figs. 4 and 5(b), the trace forward computation has fixed paths and requires no complicated checker and path selection. The design details of the radix-2 trace forward SB MAP decoding, including the radix-2 ACSU, the radix-2 TBU, and the overall architecture of the WCDMA CTC decoder, can be referred to in [19]. Based on experimental results using a 0.18-μm CMOS process, the radix-2 SB trace forward MAP decoding achieves a 21.4% power reduction of the SW L-MAP decoder with a 3.2% logic overhead.

IV. Radix-2x2 Trace Forward Pair

Fig. 8. Radix-2x2 trace forward pair: (a) the ACSU, (b) the TFU, and (c) a computational example.
The radix-2x2 ACSU is widely used to constitute the NRP. Fig. 8(a) shows an example of the radix-2x2 ACSU. The radix-2x2 ACSU consists of four front adders, three radix-2 compare-select units (CSUs), and an LUT. In the proposed trace forward computation, the three difference metrics (Diff 0, Diff 1, and Diff 2) generated by the three radix-2 CSUs are stored in the SMC.



Fig. 8(b) shows the corresponding TBU of the radix-2x2 ACSU. Fig. 8(c) illustrates a computational example of the radix-2x2 ACSU and TBU. The radix-2x2 ACSU obtains the maximal state metric B based on Diff 0, Diff 1, and Diff 2. Subsequently, the radix-2x2 TBU regenerates A, B, C, and D based on Diff 0, Diff 1, and Diff 2, since B can be initially obtained. Hence, the four state metrics can be recomputed by the radix-2x2 TBU with the difference metrics stored in the SMC. In the radix-2x2 TBU, the sign bit of the difference metric decides the paths of the two multiplexers and the operation of the binary adder/subtractor. Taking the trellis diagram shown in Fig. 7(b), for instance, six difference metrics are stored in the SMC because two current states and two TBUs can trace eight next states back. The storage of the SMC is reduced from eight state metrics to six difference metrics at each stage. Note that the values A, B, C, and D at the output end of the radix-2x2 TBU can be the input values of the LAPO to compute the a posteriori LLR. This approach reduces eight adders in the LAPO.

V. ACS UNITS AND TRACE FORWARD UNITS

Fig. 3 shows an example of the radix-2x2 ACS unit. The radix-2x2 ACS unit consists of radix-2 ACS units and an LUT. In the trace forward technique, 3 difference metrics (Diff_0, Diff_1 and Diff_2) of the radix-2x2 ACS unit are stored in the MM. Fig. 4 shows the corresponding trace forward unit of the radix-2x2 ACS unit. With the 3 difference metrics stored in the MM, the state metrics can be recomputed by the trace forward unit. Because 2 current states can trace 8 next states back, there are totally 6 difference metrics stored in the MM. The storage of the MM is reduced from 8 state metrics to 6 difference metrics. The second type is the radix-4 ACS unit, which has a shorter critical path but larger complexity than the radix-2x2 ACS unit. Compared with the trace forward unit of the radix-2x2 ACS unit, the trace forward unit of the radix-4 ACS unit has less complexity. This approach reduces 8 adders in the LLR unit.

Fig. 9. Radix-4 trace forward pair: (a) the ACSU, (b) the TBU, and (c) a computational example.

VI. COMPARISONS AND EXPERIMENTAL RESULTS
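To make the difference-metric idea concrete, here is a small C# sketch (our own illustration, not the paper's hardware description; it uses the ML-MAP-style plain max, whereas an L-MAP ACSU would add the LUT correction ln(1 + e^(-|x-y|)), and the sign conventions are assumptions). The ACS step keeps only the surviving metric and the three signed differences; the trace unit then regenerates all four candidate metrics from them.

using System;

static class Radix2x2Demo
{
    // Compare-select over four candidate path metrics A, B, C, D.
    // Returns the surviving metric and the three signed difference metrics.
    static (int survivor, int diff0, int diff1, int diff2) Acs(int a, int b, int c, int d)
    {
        int diff0 = a - b;                       // first radix-2 CSU
        int diff1 = c - d;                       // second radix-2 CSU
        int x = Math.Max(a, b);
        int y = Math.Max(c, d);
        int diff2 = x - y;                       // final radix-2 CSU
        return (Math.Max(x, y), diff0, diff1, diff2);
    }

    // Regenerate A, B, C, D from the survivor and the stored differences,
    // as the trace unit does with the values read back from the cache.
    static (int a, int b, int c, int d) Trace(int survivor, int diff0, int diff1, int diff2)
    {
        int x = diff2 >= 0 ? survivor : survivor + diff2;  // max(A, B)
        int y = diff2 >= 0 ? survivor - diff2 : survivor;  // max(C, D)
        int a = diff0 >= 0 ? x : x + diff0;
        int b = diff0 >= 0 ? x - diff0 : x;
        int c = diff1 >= 0 ? y : y + diff1;
        int d = diff1 >= 0 ? y - diff1 : y;
        return (a, b, c, d);
    }

    public static void Main()
    {
        var (m, d0, d1, d2) = Acs(5, 9, 7, 2);
        Console.WriteLine(Trace(m, d0, d1, d2));   // (5, 9, 7, 2): all four metrics recovered
    }
}

Storing the winner plus signed differences instead of every state metric is exactly what lets the cache hold fewer words per trellis stage.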

Here, accurate silicon area and power evaluations are obtained by synthesizing the derived VHDL codes with a standard cell library. For instance, four radix-2x2 ACSUs generate four state metrics at each time stage. Hence, the SMC of the conventional structure has a width of 64 bits. However, the trace forward structure composed of eight radix-2x2 ACSUs and two radix-2x2 TFUs reduces the SMC bit-lengths. Parallel-window (PW) MAP decoding is adopted to achieve a high throughput: eight parallel windows are used to decode a CTC block.



The size of information bits is double the block size because of the DB CTC. The bit-length of a state metric or a difference metric is 4. The radix-2x2 ACSU has the longest path delay, because the values of the latter radix-2 CSU in the radix-2x2 ACSU are not correct until the sign (most significant) bits of the two former radix-2 CSUs are stable. The radix-2x2 TFU does not suffer from this problem because the sign bits of the difference metrics are initially known. Note that eight state metrics have to be stored in the SMC for the conventional structure composed of the radix-2x2 ACSUs. For the radix-2x2 trace forward structure, only six difference metrics are stored in the SMC.

VII. SIMULATION RESULT

[5] C. Berrou et al., The advantages of non-binary turbo codes, in Proc. IEEE Inf. Theory Workshop (ITW), 2001, pp. 6163. [6] Digital Video Broadcasting (DVB). [Online]. Available: http://www.dvb.org/ [7] Worldwide Interoperability for Microwave Access (WiMAX). [Online] Available: http://www.wimaxforum.org/home/ [8] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, Optimal decoding of linear codes for minimizing symbol error rate, IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 284287, Mar. 1974. [9] P. Robertson, E. Villebrun, and P. Hoeher, A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain, in Proc. IEEE Int. Conf. Commun. (ICC), 1995, pp. 10091013. [10] J. P. Woodard and L. Hanzo, Comparative study of turbo decoding techniques: An overview, IEEE Trans. Veh. Technol., vol. 49, no. 6, pp. 22082233, Nov. 2000. [11] S. Papaharalabos, P. Sweeney, and B. G. Evans, SISO algorithmsbased on combined max/max* operations for turbo decoding, Electron.Lett., vol. 41, no. 3, pp. 142143, Feb. 2005. [12] J. Vogt and A. Finger, Improving the max-logMAP turbo decoder,Electron. Lett., vol. 36, no. 23, pp. 19371939, Nov. 2000.

REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. IEEE Int. Conf. Commun. (ICC), 1993, pp. 1064-1070.
[2] 3rd Generation Partnership Project (3GPP). [Online]. Available: http://www.3gpp.org/
[3] 3rd Generation Partnership Project 2 (3GPP2). [Online]. Available: http://www.3gpp2.org/
[4] C. Berrou and M. Jezequel, "Non-binary convolutional codes for turbo coding," Electron. Lett., vol. 35, no. 1, pp. 39-40, Jan. 1999.

[13] G. Masera et al., VLSI architecture for turbo codes, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 3, pp. 369379, Aug. 1999. [14] C. Schurgers, F. Catthoor, and M. Engels, Memory optimization of MAP turbo decoder algorithms, IEEE Trans. Very Large Scale Integr. (VLSI) Sys., vol. 9, no. 3, pp. 305312, Sep. 2001. [15] H. Liu et al., Energy efficient turbo decoder by reducing the state metric quantization, in Proc. IEEE Workshop Signal Processing Syst. (SiPS), 2007, pp. 237242. [16] Z.Wang, Z. Chi, and K. K. Parhi, Area-efficient high-speed decoding schemes for turbo decoders, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 6, pp. 902912, Aug. 2002.



WEARABLE COMPUTING IN SMART HOSPITALS


S. Vanitha M.Tech 1, N. Jayalakshmi M.Tech 2, R. Varalakshmi M.Tech 3
1,2,3 Sri Manakula Vinayagar Engineering College, Pondicherry.

Abstract

The hospital and the health-care center of a community, as places for people's life-care and health-care, must provide more and better services for patients and residents. After establishing an Electronic Medical Record (EMR) system in the hospital, which is a necessity, providing wearable services is a further step. Our objective in this paper is to use wearable computing in a case study of healthcare, based on an EMR database that coordinates application services over a network to form a service environment for medical care and health-care. Our method also categorizes the hospital spaces into three kinds: public spaces, private spaces and isolated spaces. Although there are many projects on using wearable computing in healthcare, most of them concentrate on disease recognition, designing smart clothes, or providing services only for the patient. The proposed method is implemented in a hospital. The obtained results show that it is suitable for our purpose. Keywords - Pervasive computing, RFID, Health-care.

INTRODUCTION
Wearable computing aims to provide people with a more natural way to interact with information and services by embedding computation into the environment as unobtrusively as possible [1]. The range of uses of wearable computing is very large. Emerging wearable computing technologies are applicable in many areas of life such as healthcare, sport and education. They can be found in domestic appliances, cars, tools and even clothes. Smart homes [2]-[3] equipped with sensors and reasoning algorithms that can support their occupants' daily activities are currently being developed. Such smart homes are designed to make life easier and safer. This can be accomplished by systems that are able to detect falls or deviations from usual behavior, in addition to enabling communication with appropriate authorities in case of emergency [4]. Moreover, smart homes are not the only examples of using intelligent devices to improve the quality of life. Guides showing the shortest route from one place to another [5], the design and prototype implementation of a system which can deliver pervasive learning services in a campus domain [6], and providing wearable computing services in the hospital and in the community [7]-[12] are other examples of emerging wearable computing technologies. One of the major consequences of wearable computing is the disappearing computer, i.e. computing (and

communication) power is increasingly embedded into devices and every day artifacts. When people interact with these smart objects, they might not be aware of the fact that in the background, data reflecting their current situation and behavior is collected, exchanged, and processed. This processing is going on, in many cases, for the benefit of users, but could also be carried out in the interest of the other parties. This gives room to privacy and security concerns. However, in a healthcare environment, the benefits might easily outweigh the risks. Patients are willing to give up a big portion of their privacy for the sake of medical treatment, though that must not lead to disadvantages outside this context. According to one study [13], thousands of patients die each year in hospitals due to (mostly avoidable) medical errors, imposing substantial cost on national economy. This study was performed in the U.S., but similar numbers are likely to apply in INDIA also. It is assumed that improving the procedures related to treatment can help prevent many medical errors. This clearly indicates that high potential lies in the application of wearable computing technology to process improvement. For example, medication errors are a severe problem in healthcare. In the U.S., such errors are estimated to account for 7000 deaths annually [14] and are often due to bad handwriting and similar problems [15]. Such errors could be largely eliminated through better auditing capabilities, introduced by


wearable computing technology, such as Radio Frequency Identification (RFID). Reducing medical errors is one of the advantages of using wearable computing technology in hospitals. Wearable computing research covers a wide range of areas. It involves distributed and mobile computing, sensor networks, human-computer interaction, and artificial intelligence. Knowledge of all of these disciplines is essential to the creation of systems that are truly accessible to users. Researchers need to combine knowledge from medicine, physiotherapy, psychology, and information and communications technology to create more user-centered systems. When the design of a wearable healthcare system is considered, an understanding of its intended users is essential. An appropriate user description leads to a better understanding of the users' needs and a better design of the wearable healthcare system. For that reason we propose a scenario based on an EMR database, in which the hospital space is divided into 3 spaces. Patients are instrumented with vital-sign monitors. Physicians and nurses in this scenario have wireless PDAs, and all of them are instrumented with means to determine their locations. The remainder of this paper is organized as follows. Section 2 proposes a scenario in a hospital with its requirements and the services it might serve. In Section 3, we present our approach and the implementation of our system. Finally, Section 4 presents our conclusions.

A. Basic Requirements

In order to improve healthcare, a wearable health system needs to address several main requirements: a) Have a design that is understandable for all users. A wearable health system needs to target both expert users who are familiar with the system and non-computer-literate users who have little or no knowledge about computing. b) Supports policy to control access to services. Having a healthcare system that can control personnel's behavior in accessing services would also be an advantage. For example, nurses are not allowed to access the service of writing an order. This is possible by using policy mechanisms to control mobile service operation in wearable hospital environments. c) The wearable system must support awareness and understand personnel's contexts or activities. Having the ability to sense the personnel's contexts and deliver the relevant information would certainly help medical staff in detecting conditions and supporting services that might be needed. d) Supports proactivity [6]. It is necessary to have a proactive system that can sense the personnel's and patients' current situation and automatically deliver relevant services to them. For example, when a doctor enters his/her room, at a determined time, the electronic medical records and vital signs of his/her patients appear on the screen of his/her computer. e) Availability of the central background system is essential in order to provide timely feedback to the patient and provide backup of all data sent from all spaces. This system manages all processes and services in a hospital. Manipulation of data in a healthcare system could have a fatal impact; therefore the integrity of data is of major importance, which means that regardless of whether data are kept in place or are in transit, their integrity must be protected. f) Having the necessary databases is essential. In this scenario several necessary databases, such as a drug database, laboratory database, radiology database (that includes CT scan, MRI, sonography, mammography), physiotherapy database, patient database, people tracking database, personnel database (that includes doctors, nurses and other personnel) and accounting database, are considered. g) Different types of sensors and monitoring devices must be used in the pervasive hospital system in order to gather information. For example, the ability to


collect data from the patient is made possible by using biomedical sensors and other clinical equipment. h) Using RFID technology. RFID is an Automatic Identification and Data Capture (AIDC) technology that uses Radio Frequency (RF) waves to transfer data between a reader and an object for the purpose of identifying, categorizing, and tracking the object. RFID is fast, reliable, and does not require line of sight or contact between readers and tagged objects. With such advantages, RFID is gradually being adopted and deployed in various applications, such as supply chain systems, warehouse management, security, hospitals, highway tolls, etc. [9]-[17]-[18]. i) Data aggregation. Data aggregation could be needed for various RFID data processing tasks. For example, we may need to count the number of products passing through the door every hour, or monitor the maximum/minimum blood pressure of a patient throughout the day. j) Access control system. Having an access control policy system in the hospital is of major importance. Such a system allows a doctor to access the communication service completely, while a nurse is not allowed to access the order form of the communication service.

Some Useful Necessary Services

The pervasive health system enables the user to receive a relevant set of services that fit his/her current context. Some sample services which are useful for pervasive health environments are: a) Electronic Medical Record (EMR) Service: personal properties and the medical history of patients must be recorded in an information record system from the time a patient enters the hospital until she/he leaves. b) Communication Service: this service is used for communication between all parts of a hospital. For example, doctors can communicate with the laboratory and send their lab orders. After performing the orders, the results are sent to the central background system and saved in the EMR. c) Mobile Reminder Service: the system reminds the personnel of important tasks or activities which are supposed to be completed at a particular time. For example, a doctor sets a reminder regarding the due date of a surgery for the nurse supervisor. This reminder will be sent and downloaded onto the central system and the nurse supervisor's computing device. d) Mobile Navigation Service: it is necessary to guide personnel and other people to find a certain place (e.g., the CCU) in a hospital. e) Automation Task Service: a service that performs a task automatically, for example auto-downloading the patient medical record when the doctor steps into his/her personal room. f) People Tracking Service: this service helps to follow every member of the personnel in a hospital. It is especially useful for doctors. For example, by having the doctors' locations, there is no need to page them in all wards when they are urgently needed in a specific ward.

III. PROPOSED METHOD

In human-centered pervasive computing systems, the human is integrated into the network and is treated as part of the network composition. The system knows the human like a network node in the network. The pervasive computing devices interacting with the human are considered as servers which can proactively serve the human. In this research, the Master-Server paradigm shown in Fig. 2 is used for our smart hospital system. This paradigm has service request and response, so the user can request services anytime and anywhere. This paradigm also provides services proactively based on human needs [16]. For designing and implementing this paradigm we used an RFID system.

Fig. 2 Master-Server Paradigm

A. RFID Technology in Our Pervasive Hospital


RFID technology makes it possible to i) collect large amounts of data for tracking and identifying physical objects along their history and ii) monitor physical objects and their environment in real time for monitoring applications. In our work we assumed that the hospital space is divided into several spaces, each of them equipped with a reader. Each patient room has a reader, and there are several other readers deployed throughout the hospital. Context information about patient condition, personnel information, devices and drug information is provided by the patient monitoring system (such as vital signs), sensors and RFID tags. Fig. 3 shows the data sources used to acquire contextual information. Different types of RFID tags are assigned to doctors, nurses, personnel and all necessary devices in the hospital. The RFID module polls the reader periodically to get the list of visible RFID tags. The passive tags use the energy incident from the reader to return their Electronic Product Code (EPC). Each reader reads the tags at its own internal frequency. The readers are sampled every 2 s. If a tag is detected at least ten times in a 30 s window of time, then the tag is visible. However, if a tag goes undetected for 4 consecutive windows, i.e. 120 s, the tag is determined to have become invisible in the reader's field of view. Tag visibility rules: if the number of times a tag is seen is >= 10, then Event(Tag Visible); if a tag is not seen for >= 120 s, then Event(Tag Invisible). However, the identification data captured by an RFID reader normally does not have complete information about the object or the activity associated with the object. In this pervasive hospital, the personnel and anything that is labeled with a tag is identified by a number, and not by its exact name or type. The process of obtaining the information associated with the ID (such as the name of the device/drug/personnel, the date of manufacture, the expiration date, the tasks permitted to personnel, etc.) is defined as Data Processing. Data Processing is typically conducted by a software program called Middleware, which formats the data. As RFID tags must have a 64-bit code, this code must have the ability to keep the necessary information about the people or objects. In some cases where one tag is not sufficient, two or more tags are used. The code in this scenario consists of at least the following fields: a. Two bits are used for identifying the tag type (device tag, drug tag, personnel tag, or patient tag). b. A 16-bit serial number, which is a unique number among items of the same type. c. Depending on the tag type: Personnel tag, 3 bits are used for public, private and isolated spaces; each bit with the value of one means that the member of personnel is allowed to enter that space; 3 bits are also needed to identify the occupation of the personnel. Drug tag, 14 bits are necessary for the company ID, expiration date and a description of consumption. Device tag, 14 bits are used for determining the ward that the device belongs to, necessary comments for users, etc. Patient tag, consists of one bit for gender, and 5 bits for the unit ID (CCU, ICU, Surgical Unit, etc.). When one or more tags are visible to the reader, one or more applications may be triggered, based on the time and location of the RFID tags and the data processing discussed above.
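A minimal C# sketch of the visibility rule stated above (our own illustration; the class and member names are hypothetical, and treating the 30 s window as a tumbling window is our simplification): readers are polled every 2 s, a tag seen at least ten times within a 30 s window becomes visible, and a tag unseen for four consecutive windows (120 s) becomes invisible.

using System;

class TagVisibilityTracker
{
    const int SamplesPerWindow = 15;          // 30 s window / 2 s sampling period
    const int HitsForVisible = 10;            // >= 10 detections in one window
    const int EmptyWindowsForInvisible = 4;   // 4 x 30 s = 120 s unseen

    int hitsInWindow, samplesInWindow, emptyWindows;
    public bool Visible { get; private set; }

    // Called once per 2 s sampling cycle with the result of the poll.
    public void OnSample(bool tagSeen)
    {
        if (tagSeen) hitsInWindow++;
        samplesInWindow++;
        if (samplesInWindow < SamplesPerWindow) return;

        // A 30 s window has elapsed: apply the visibility rules.
        if (hitsInWindow >= HitsForVisible) Visible = true;
        emptyWindows = hitsInWindow == 0 ? emptyWindows + 1 : 0;
        if (emptyWindows >= EmptyWindowsForInvisible) Visible = false;

        hitsInWindow = 0;
        samplesInWindow = 0;
    }

    static void Main()
    {
        var tracker = new TagVisibilityTracker();
        for (int i = 0; i < 15; i++) tracker.OnSample(i < 10);  // 10 hits in the first window
        Console.WriteLine(tracker.Visible);                     // True
    }
}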

Fig. 3 Data Source in Hospital
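The 64-bit tag code layout described above can be packed and unpacked with simple bit operations. The following C# sketch is our own illustration; the exact bit positions, beyond the 2-bit type and 16-bit serial number named in the text, are assumptions.

using System;

enum TagType { Device = 0, Drug = 1, Personnel = 2, Patient = 3 }

static class TagCode
{
    // Pack: bits 62-63 = tag type, bits 46-61 = serial number,
    // bits 0-45 = type-dependent payload (space permissions, unit ID, ...).
    public static ulong Pack(TagType type, ushort serial, ulong payload) =>
        ((ulong)type << 62) | ((ulong)serial << 46) | (payload & ((1UL << 46) - 1));

    public static (TagType type, ushort serial, ulong payload) Unpack(ulong code) =>
        ((TagType)(code >> 62), (ushort)((code >> 46) & 0xFFFF), code & ((1UL << 46) - 1));

    public static void Main()
    {
        // Personnel tag: lowest 3 payload bits = allowed spaces (public, private, isolated).
        ulong code = Pack(TagType.Personnel, 1234, 0b101);   // public and isolated allowed
        Console.WriteLine(Unpack(code));                     // (Personnel, 1234, 5)
    }
}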


B. Design and Implementation

To capture common features of RFID systems in the hospital, this section shows several RFID-based application scenarios and the implementation of some of them, including object finding, thing reminder, person location detection, auto downloading and communication systems, respectively. In all implementations, an RF sender/receiver module attached to the computer is used in place of a reader.

Object Finding System: This system is supposed to be able to find an object or some objects in a room where RFID tags with ID codes and readers are placed. Some RFID tags are put under the floor in a pre-designed layout. Such a tag is named a position tag; it keeps a symbolic number, which is associated with the coordinates of the tag stored in a database. In one room, the RFID reader reads all RFID tags that are near it. The obtained tag IDs are sent to the database via the wireless LAN and then used in a location analysis process as shown in Fig. 4. If a tag attached to an object is read, the coordinates of the four surrounding position tags determine the object's position coordinate. This system can process data to find a lost object.

Fig. 4 Object Finding System

Reminder System: This system is supposed to give a user advice messages at a room/house entrance when he/she forgets to take something with him/herself. The system requires that there is an RFID reader at the entrance; all things for the user to bring are attached to RFID tags, and the item IDs are input to the application in advance. When the user goes out, the reader at the entrance reads the user's identity tag and the items' tags. These IDs are sent to a database via the wireless LAN and then used in a process as shown in Fig. 5.

Fig. 5 Reminder System

The obtained ID data is compared with the ID data in the database that the user input beforehand. If there is unmatched ID data remaining in the user's input ID data set, the application reminds the user in some way, such as a speech message of "you forgot to take the desk pad".
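A minimal C# sketch of that comparison step (our own illustration; the item tag names are hypothetical): the expected item IDs registered beforehand are compared with the IDs actually read at the entrance, and any missing item triggers a reminder.

using System;
using System.Collections.Generic;
using System.Linq;

static class ReminderCheck
{
    // Returns the expected items whose tags were not read at the entrance.
    public static List<string> MissingItems(IEnumerable<string> expectedIds,
                                            IEnumerable<string> readIds) =>
        expectedIds.Except(readIds).ToList();

    public static void Main()
    {
        var expected = new[] { "TAG-BADGE", "TAG-DESKPAD", "TAG-KEYS" };
        var readAtDoor = new[] { "TAG-BADGE", "TAG-KEYS" };
        foreach (var item in MissingItems(expected, readAtDoor))
            Console.WriteLine($"You forgot to take the item with tag {item}.");
    }
}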


Person Location Detection System: This system can get a person's location by detecting the RFID tags carried by the person. The system requires one RFID tag per person and several RFID readers. Some RFID readers are buried under the floor in special places such as the entrance. The location data of the RFID readers, in XML format, is stored in a database. If the person steps over an RFID reader, the reader reads the person's RFID tags and sends the IDs to the database via a wireless LAN that is connected with the reader. The tag IDs and the RFID readers' locations are used to locate the person's position in the application, as shown in Fig. 6.

Fig. 6 Person Location Detection System

In our implementation, when one of the personnel or patients is moving through the hospital, his/her tag is detected by the readers located in all parts. Each reader sends the tag ID, date, time and its own location to the central background system. This information is saved in a database named Following:

objDataAdapter.InsertCommand.CommandText =
    "INSERT INTO Following" + "([ID Number],[Date],[time],[location])" +
    " VALUES(@ID,@Date,@Time,@location)";

Imagine a computer in a particular location as a reader. When a person arrives adjacent to this computer, his/her unique ID is sent. This ID is received by an RF sender/receiver module attached to this computer. The name, the ID and the date and time of the presence of that person are saved in the Following database. This process is done for all persons and all locations at any time, and fills the Following database, as is shown in Fig. 7.

Fig. 7 People Tracking Database

Auto Downloading System: This system is supposed to be able to download a person's data or any kind of information that is essential in a special situation, without a human request. For example, when Dr. Kumar enters his room, the list of his patients, their personal properties, and their vital signs appear on the screen of his computer. Fig. 8 shows the patients of Dr. Kumar and their properties. The corresponding query is:

"SELECT AllpersonnelTable.Name, PatiantTable.UnitName, DoctorPatientTable.PatiantID, PatiantTable.PatiantName, PatiantTable.Age, PatiantTable.VitalSign" +
" FROM PatiantTable INNER JOIN (AllpersonnelTable INNER JOIN DoctorPatientTable ON AllpersonnelTable.[ID Number] = DoctorPatientTable.IDNumber) ON PatiantTable.[Patiant ID] = DoctorPatientTable.PatiantID" +
" WHERE (AllpersonnelTable.[ID Number]) = " + IDNumberofDr + " ORDER BY PatiantTable.PatiantName"

Fig. 8 Patients Auto Downloading of a Doctor

Communication System: The purpose of these systems is to provide communication between the system user and third parties. This includes all the devices that help to communicate with another party, person or device, like systems that provide simplified languages based on pictures or specialized computer interfaces. For example, when Dr. Kumar enters his room or a patient's room, and the history of his patients is auto-downloaded onto the computer, he can communicate with other hospital parts such as the laboratory by writing his orders and the patient ID and pushing the submit button, as shown in Fig. 9. His orders


are saved in the related database, which can be seen in Fig. 10, and are sent to that part with a sound alarm:

objDataAdapter.InsertCommand.CommandText =
    "INSERT INTO LAB" + "([Patiant ID],[Date],[time],[Orders])" +
    " VALUES(@PatiantID,@Date,@Time,@Lab)";
objDataAdapter.InsertCommand.Parameters.AddWithValue("@PatiantID", textBox7.Text);
objDataAdapter.InsertCommand.Parameters.AddWithValue("@Date", textBox5.Text);
objDataAdapter.InsertCommand.Parameters.AddWithValue("@Time", textBox6.Text);
objDataAdapter.InsertCommand.Parameters.AddWithValue("@Lab", textBox1.Text);

IV. CONCLUSION In this study, we presented a method of a context-aware system that analyzes data streams in a hospital. Our method uses technologies like radio frequency identification (RFID) to acquire contextual information about different items such as the usable objects resources and the staff locations in different parts of the hospital. The proposed method is implemented based on partitioning hospital into 3 different spaces; public spaces, private spaces and isolated spaces. Each of personnel RFID tag has three bits that show his/her entrance permission for three different spaces REFERENCES [1] D. C. Deborah Estrin, K. Pister, and G. Sukhatme, Connecting the physical world with pervasive networks, Pervasive Computing, IEEE, 1(1): 5969, 2002. [2] The Aware Home. Georgia Institute of Technology. [Online].2004,Available: http://www.awarehome.gatech.edu/. [3] M. Perry, A. Dowdall, L. Lines, and K. Hone, Multimodal and ubiquitous computing systems: Supporting independent-living older users, IEEE Trans. Inform. Technol. Biomed., 8(3): 258270, 2004. [4] F. Axisa, P. M. Schmitt, C. Gehin, G. Delhomme, E. McAdams, and A.Dittmar, Flexible technologies and smart clothing for citizen medicine, home healthcare, and disease prevention, IEEE Trans. Inform.Technol. Biomed., 9(3):325336, 2005. [5] H. Kautz, L. Arnstein, G. Borriello, O. Etzioni, and D. Fox, An overview of the assisted cognition project, Proceedings of workshop on Automation as Caregiver: The Role of Intelligent Technology in Elder Care, 2002. [6] Evi Syukur and Seng W. Loke, " MHS Learning Services for Pervasive Campus invironments", IEEE International Conference on Pervasive Computing and Communications Workshops, 2006. [7] Zhenmin Zhu, Xiaoli Su, "A User-Centric Pervasive Computing Services Model for Medical and Healthcare", IEEE International Conference on Grid and Cooperative Computing ,2007.


[8] Sheetal Agarwal, Anupam Joshi, "A Pervasive Computing System for the Operating Room of the Future", Springer Science + Business Media, pp. 215-228, 2007.


[9] Dorin Panescu, "Healthcare Applications of RF Identifications", IEEE Engineering in Medicine and Biology Magazine, pp. 77-83, 2006.
[10] Zhenjiang Miao, Wei Su, "Pervasive Computing Based Multimodal Tele-home Healthcare System", Proc. 27th Engineering in Medicine and Biology Conf., China, 2005.
[11] Fabrice Axisa, Pierre Michael Schmitt, Claudine Gehin, Georges Delhomme, Eric McAdams, and André Dittmar, "Flexible Technologies and Smart Clothing for Citizen Medicine, Home Healthcare, and Disease Prevention", IEEE Transactions on Information Technology in Biomedicine, vol. 9, no. 3, 2005.
[12] Mahesh Subramanian, Ali Shaikh Ali, Omer Rana, Alex Hardisty and Edward C. Conley, "Healthcare@Home: Research Models for Patient-Centred Healthcare Services", IEEE International Symposium on Modern Computing, 2006.
[13] Maryann Napoli, "Preventing medical errors: A call to action", HealthFacts, January 2000; http://www.cnn.com/.
[14] http://www.cnn.com/HEALTH/9911/29/medical.errors/, November 1999.
[15] Zhenjiang Miao, Baozong Yuan, "Discussion on Pervasive Computing Paradigm", 2004.
[16] J. Riekki, T. Salminen, I. Alakarppa, "Requesting Pervasive Services by Touching RFID Tags", IEEE Pervasive Computing, 20(5): 45-52, 2006.
[17] S. L. Garfinkel, A. Juels and R. Pappu, "RFID Privacy: An Overview of Problems and Proposed Solutions", IEEE Security and Privacy, 2005.
[18] Fusheng Wang, Shaorong Liu, Peiya Liu, and Yijian Bai, "Bridging physical and virtual worlds: Complex event processing for RFID data streams", EDBT, 2006.


DETECTION OF SINKING BEHAVIOUR USING SIGNATURE BASED DETECTION TECHNIQUE, SVM & FDA IN WIRELESS ADHOC NETWORKS
A. Vinodh Kumar 1, P. Naveen Kumar 2, S. P. Varun Baalaji 3
1 Senior Lecturer, M.E., (Ph.D.), Department of IT, Anand Institute of Higher Technology, Kazhipatur
2 Final Year, B.Tech IT, Department of IT, Anand Institute of Higher Technology, Kazhipatur

Abstract

As sinking behavior has been a serious problem in wireless ad hoc networks, we propose a hybrid detection technique (signature-based detection combined with an autonomous host-based IDS) for detecting it. The proposed system maximizes detection accuracy by using cross-layer features to define routing behavior. Two machine learning techniques, Support Vector Machines (SVMs) and Fisher Discriminant Analysis (FDA), are utilized for learning and adapting to new attack scenarios and network environments. We associate MAC layer features with features of other layers and thus reduce the feature set without discarding information. The effects of factors such as mobility, traffic density, and the packet drop ratios of the malicious nodes are analyzed. Simulation-based experiments show that the proposed cross-layer approach, aided by a combination of SVM and FDA, performs significantly better than existing approaches.

Index Terms: Cross-layer design, routing attacks, ad hoc networks, intrusion detection, sinking, signature-based detection.

1 INTRODUCTION

An ad hoc network is an infrastructureless network with autonomous nodes; it is distributed, decentralized, and dynamic. Since there is no centralized node and no fixed infrastructure, the nodes have to cooperate for services such as routing and data forwarding, so the functioning of the network depends heavily on the cooperativeness of its nodes. This cooperative nature of the routing and data forwarding mechanism has spawned a unique class of security vulnerabilities called routing attacks, and conventional IDS architectures and algorithms therefore have to be redesigned to suit the ad hoc networking paradigm. Among the several types of attacks, we focus on the sinking attack. Sinking is a malicious behavior in which nodes do not cooperate in the routing and forwarding operations of the network. Such misbehavior is exhibited either to selfishly evade network responsibilities for resource conservation or to disrupt the network by dropping critical packets. Most intrusion detection systems proposed in the literature for ad hoc networks use a single-layer approach, in which statistics collected from the routing protocol


communication are used to detect attack behavior. Furthermore, most existing IDSs use linear classifiers to train the intrusion detection model. Linear classifiers are fast, but they do not yield good detection accuracies, while nonlinear learning algorithms are avoided because of the resource limitations of mobile ad hoc nodes. In this work, we propose a signature-based and autonomous host-based IDS for detecting sinking behavior in an ad hoc network. The proposed detection system uses a cross-layer approach to maximize detection accuracy: by collecting statistics from the protocol communication at the network, MAC, and physical layers, the routing behavior of a node is defined. To further maximize the detection accuracy, a nonlinear machine learning algorithm, namely Support Vector Machines (SVMs), is employed, and the proposed intrusion detection system preprocesses the collected data to contain the computational overhead of SVM.

Fig. 1 Cross-layer Intrusion Detection System for MANET

2 NETWORK THREATS
The threat model for the proposed IDS consists of three entities and their characteristics: the network, the attack, and the attacker.

The characteristics of the network include factors that help camouflage the malicious behavior. These network conditions cause nodes to drop packets benignly, and such dropping resembles malicious sinking; the goal of the IDS is therefore to distinguish packet dropping induced by network conditions from dropping caused by malicious sinking. The factors that can induce benign dropping include node mobility, network/traffic density, and traffic type. When a route breaks, dropping becomes inevitable, since re-establishing a new route takes time; mobility also creates changing channel and fading conditions, and dropping can additionally result from signal loss, interference, etc. The characteristics of the attack and the attacker also challenge the intrusion detection system. They include the duration of the attack and the drop ratio (i.e., the percentage of data dropped). If attacks occur at irregular intervals, detection efficiency decreases, because such sporadic behavior closely resembles benign dropping caused by network conditions. Moreover, a sinker does not necessarily drop all the packets it receives or target all nodes; it may exhibit this behavior only at selected times towards selected nodes.

3 ARCHITECTURE

We use four main modules: data collection, data reduction, learning, and validation. The data collection module gathers network-level information and performs signature testing on the nodes to find malicious nodes. Data reduction is used to reduce the processing overhead incurred by the machine learning algorithms, especially the SVM algorithm. The data reduction module


decreases the number of features and events in the training data. The validation module checks whether the learned classification model is adequate. Since an event is mathematically represented as a vector, the terms event and vector are used interchangeably hereinafter.

3.1 DATA COLLECTION

This module collects data from the routing protocol, MAC, and physical layers. At each layer it monitors events, computes time, traffic, and topology statistics, and records the feature values; out-of-bound events (outliers) are removed by this module itself. A partial list of important cross-layer features is given in Table 1. Signature-based detection is then performed to check whether any malicious node is present.

TABLE 1 Partial List of Cross-Layer Features

3.2 DATA REDUCTION

Three techniques are used for data reduction in the proposed IDS: association, filtering, and sampling. Association is a process in which features from different layers are correlated with MAC layer features to form a reduced feature set. Filtering removes redundant and uninformative cases from the training data. Sampling chooses a subset of the associated and filtered training data.

3.2.1 ASSOCIATION

A large feature set can render machine learning techniques computationally infeasible, and a feature set with many features cannot be used for dynamic training on mobile nodes. On the other hand, reducing the feature set with methods such as feature selection/ranking lowers the detection accuracy. Association reduces the feature set so that the overhead of learning is minimized. This is done by correlating one or more features from different layers with a specific MAC layer feature. The output of this process is a reduced set of derived features that preserves the information content of the correlated features. First, the features are classified based on their dependency on time, traffic, and topology; under each classification, features are correlated using a predefined correlation function on the features.
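As a purely illustrative sketch of the association step (the layer counters and the ratio-style combining functions below are assumptions, not the paper's exact feature definitions):

# Illustrative sketch of cross-layer feature association (assumed counters).
def associate(sample):
    """Collapse per-layer counters into derived, MAC-anchored features."""
    return {
        # Network-layer forwards per MAC transmission (traffic-dependent).
        "fwd_per_mac_tx": sample["net_forwarded"] / max(sample["mac_tx"], 1),
        # Route changes per MAC link failure (topology/mobility-dependent).
        "route_chg_per_link_fail": sample["route_changes"] / max(sample["mac_link_failures"], 1),
        # Network-layer drops per received MAC frame (candidate sinking indicator).
        "drop_per_mac_rx": sample["net_dropped"] / max(sample["mac_rx"], 1),
    }

raw = {"mac_tx": 240, "mac_rx": 310, "mac_link_failures": 3,
       "net_forwarded": 190, "net_dropped": 55, "route_changes": 6}
print(associate(raw))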

3.2.2 FILTERING

Filtering is the removal of uninformative and redundant cases. Uninformative events are events that are inessential for defining or classifying a routing behavior, while redundant events are events that are similar to other events. Removal of uninformative events takes place in two stages. In the first stage, information from higher-layer protocols is used to filter uninformative data traffic: since the proposed detection system is located at the MAC layer, data traffic and routing control traffic have to be separated, so data traffic events are identified using higher-layer protocols and most of them are filtered out.


In the second stage, events are filtered based on their information content. From the perspective of SVM, events that lie close to the decision boundary separating benign and malicious events carry more information than events that lie far from the boundary. This property is therefore used to rank events by information content and to filter out events with negligible information content.
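A minimal sketch of this boundary-distance filtering, assuming scikit-learn is available (the toy data, kernel choice, and 50% retention quantile are illustrative assumptions, not the paper's settings):

import numpy as np
from sklearn.svm import SVC

# Toy training events: rows are derived cross-layer feature vectors,
# labels are 1 = malicious sinking, 0 = benign dropping (synthetic rule).
X = np.random.rand(200, 3)
y = (X[:, 2] > 0.6).astype(int)

svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Distance to the decision boundary: small |distance| = high information content.
dist = np.abs(svm.decision_function(X))
keep = dist < np.quantile(dist, 0.5)     # keep the most informative half

X_filtered, y_filtered = X[keep], y[keep]
print(len(X_filtered), "of", len(X), "events retained")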

3.2.3 SAMPLING

Sampling is the process of selecting a subset of the original training data. Even after association and filtering, the volume of training data can remain large enough to impose a heavy computational overhead on nonlinear machine learning techniques; sampling therefore reduces the number of events substantially.

3.3 LEARNING

In this module we make use of the machine learning techniques of [1], where the learning model is essentially a nonlinear SVM classification model trained by the SVM algorithm on the reduced training data set.

3.3.1 MACHINE LEARNERS

A machine learner is a process in which a set of threshold parameters is trained to classify an unknown behavior. It uses a linear function [1] and segregates nodes as benign or malicious based on its value.

3.4 VALIDATION

The main purpose of this module is to validate the model trained by the nonlinear machine learning algorithm, SVM. The results of the machine learners are compared based on the direction of SVM's margin and FDA's hyperplane, and in the second phase of validation the underfitting of the model is checked. This process continues until the training accuracy of SVM surpasses that of FDA; however, if the difference in SVM training accuracy between trials is low, retraining is stopped and FDA is used for detection.
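To make the interplay between the two learners concrete, the following is a minimal sketch (assuming scikit-learn; the data, kernel, and threshold are illustrative assumptions, not the paper's implementation) of training the nonlinear SVM alongside the linear FDA benchmark and of the probability-disagreement check used as the retraining trigger described in the next subsection:

import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy reduced training set (cross-layer feature vectors and labels).
X = np.random.rand(300, 3)
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

svm = SVC(kernel="rbf", probability=True).fit(X, y)    # nonlinear learner
fda = LinearDiscriminantAnalysis().fit(X, y)           # fast linear benchmark

# Validation: FDA's training accuracy acts as the minimum benchmark for SVM.
print("SVM acc:", svm.score(X, y), "FDA acc:", fda.score(X, y))

# Retraining trigger (run time): a large disagreement between the predicted
# class probabilities suggests the SVM model should be retrained.
def needs_retraining(event, threshold=0.3):
    p_svm = svm.predict_proba([event])[0, 1]
    p_fda = fda.predict_proba([event])[0, 1]
    return abs(p_svm - p_fda) > threshold

print(needs_retraining(X[0]))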

3.5 RETRAINING

The environment in ad hoc networks is highly volatile, so adaptation is a necessity rather than an extension; retraining the model for changing network conditions and new attack scenarios is therefore an essential component. Unfortunately, nonlinear machine learning techniques such as SVM and neural networks require high computational resources to be retrained in real time. Conventional networks do not face this issue, as their machine-learning IDSs are located in centralized, highly powerful nodes, whereas in ad hoc networks the mobile nodes are constrained by limited battery energy, computational capacity, etc. To overcome this issue, the proposed model requires only periodic training. This naturally raises the question of when, and how frequently, training should be triggered at run time. For this purpose FDA is used: it is a less adaptive but computationally fast learner. During real-time detection of a given behavior, both SVM and FDA output their predicted behavior and its probability, and the difference between the decision probabilities of SVM and FDA is used to decide whether to retrain the SVM model. The fast FDA's efficiency acts as the minimum benchmark efficiency for the SVM and indicates when SVM needs to be retrained.

4 RESULT AND ANALYSIS

To validate the proposed model, we set up a simulation with the conditions specified in Table 2 (the ideal simulation conditions); the experiments follow the scenario and network-condition setup of Table 3. In our experiments we vary only three factors: mobility, packet drop ratio, and traffic density. Changing traffic types in effect changes the traffic density, which strongly influences node performance.


TABLE 2 Attack Simulation Setup

TABLE 3 Sinking Attack Scenarios

The results are compared between cross-layer SVM (aided by FDA), cross-layer FDA, and single-layer SVM (aided by FDA). Although single-layer FDA was also studied, its results were too poor to be used for comparison. Cross-layer methods use associative features, which combine MAC layer statistics with statistics from other layers, whereas the single-layer method uses only MAC layer statistics. In most of the cases, resampling was unnecessary because of the effectiveness of filtering and association: the high-entropy events for training are already selected by the filtering and association processes. This avoids the need for resampling, but it should be noted that filtering and association are effective only for cross-layer methodologies, since more information is available there for the semantic filtering process; in single-layer methods, only redundant information can be filtered and association is not applicable. This is one of the primary reasons why single-layer methods perform worse while cross-layer methods excel. The dependency of the IDS on the underlying protocols lies in the feature set used for training the detection model; in this paper, the feature set is defined for the Optimized Link State Routing (OLSR) protocol and the MAC 802.11b protocol. Cross-layer methods, in contrast, experience only a negligible drop in detection efficiency as traffic density increases, primarily because of the background traffic, which hides the malicious


behavior. This also shows that the number of packets a node is sinking does not affect the detection efficiency. Similar to traffic density, the selectivity of packet dropping affects the efficiency of single-layer methods more than that of cross-layer methods; cross-layer methods experience only a small drop in efficiency as the sinker drops a smaller percentage of packets. The efficiency of the cross-layer SVM scheme is typically higher than that of cross-layer FDA, although in some scenarios they perform almost the same. However, SVM's ability to thwart the intelligent behavior of a sinker is a crucial vantage point for the IDS.

5 CONCLUSION

In this work we have presented a fast and adaptive system that detects sinking behavior more efficiently than many existing systems. Although the cross-layer approach produces a larger feature set, and the resulting complexity has previously made it infeasible to combine cross-layer schemes with nonlinear machine learning techniques, the number of features and the size of the training data are reduced here by the processes of association and filtering, respectively. A linear machine learning method, FDA, is used to check whether the chosen training data remain optimal.

REFERENCES

[1] P. Brutch and C. Ko, "Challenges in Intrusion Detection for Wireless Ad-Hoc Networks", Proc. 2003 Symp. Applications and the Internet Workshops, 2003.
[2] C.J.C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[3] A. Mishra, K. Nadkarni, and A. Patcha, "Intrusion Detection in Wireless Ad Hoc Networks", IEEE Wireless Comm., vol. 11, no. 1, pp. 48-60, Feb. 2004.
[4] Y.A. Huang and W. Lee, "Attack Analysis and Detection for Ad Hoc Routing Protocols", Proc. Symp. Recent Advances in Intrusion Detection, pp. 125-145, 2004.
[5] M. Little, "TEALab: A Testbed for Ad Hoc Networking Security Research", Proc. IEEE Military Comm. Conf. (MILCOM '05), 2005.
[6] P. Papadimitratos and Z. Haas, "Secure Routing for Mobile Ad Hoc Networks", Proc. SCS Comm. Networks and Distributed Systems Modeling and Simulation Conf. (CNDS '02), 2002.
[7] G. Thamilarasu et al., "A Cross-Layer Based Intrusion Detection Approach for Wireless Ad Hoc Networks", Proc. IEEE Int'l Conf. Mobile Adhoc and Sensor Systems, 2005.
[8] F. Anjum and P. Mouchtaris, Security for Wireless Ad Hoc Networks, Wiley, 2007.
[9] Y. Liu, Y. Li, and H. Man, "Short Paper: A Distributed Cross-Layer Intrusion Detection System for Ad Hoc Networks", Proc. First Int'l Conf. Security and Privacy for Emerging Areas in Comm. Networks (SecureComm '05), 2005.
[10] H. Deng, Q.-A. Zeng, and D.P. Agrawal, "SVM-Based Intrusion Detection System for Wireless Ad Hoc Networks", Proc. IEEE 58th Vehicular Technology Conf. (VTC '03-Fall), vol. 3, pp. 2147-2151, 2003.


MOBILE AD SENSE
R. BHAVANI 1, T. JERIL VIJI 2
1 Final Year, M.E., Computer and Communication Dept., Saveetha Engineering College, Chennai, India
2 Assistant Professor, ECE Dept., Saveetha Engineering College, Chennai, India

Abstract

Mobile Ad Sense is an AdSense-style service for mobile applications that displays text ads on the mobile device based on the user's location. The idea of this paper is to increase the number of mobile users by providing free, ad-supported access through a mobile advertising framework deployed by the telecommunication system. We describe a framework for delivering appropriate ads that addresses three key issues: how to show the ads, when to show the ads, and which potential ads will be clicked by the subscribers. Advertisers can thus reach mobile users with the right ad at the right time as they seek information on the go, the telecom manager can run the ad agent platform to attract investments from advertisers, and subscribers receive the promotional advertisements on their mobile phones.

Index Terms: Mobile ads, advertising framework, advertiser, telecommunication manager, subscriber.

I INTRODUCTION

In recent years we have seen a growing popularity of mobile devices (e.g., mobile phones, PDAs, vehicle phones, and e-book readers) and mobile commerce applications (e.g., mobile financial applications, mobile inventory management, and mobile entertainment). Mobile Ad Sense is an advertising program on a mobile device that displays text-based ads to mobile users at the right time. According to a report of the International Telecommunication Union (ITU), mobile cellular subscriptions reached an estimated 5.3 billion (over 70 percent of the world population) at the end of 2010. In addition, the number of mobile devices in some countries, e.g., Taiwan, England, the Netherlands, and Italy, is larger than their population. Intuitively, the mobile device now plays an increasingly important role in human life, and it represents a market with special potential. According to the IAB (Interactive Advertising Bureau), mobile advertising is accepted by many consumers compared to online advertising on the World Wide Web because of its high interactivity with users, which is hard to achieve for other media.

Mobile advertising is an alternative Web monetization strategy, especially for telecommunication corporations seeking to expand revenue; it is predicted that its average annual growth will be 72% over the next five years. There are many modes of mobile communication, such as WAP, GPRS, 3G modems, and PHS. However, because of the high charges for Internet access, not all users have mobile Internet access: mobile broadband subscriptions are about 1 billion (roughly 20 percent of mobile subscriptions), and in Taiwan, for example, the ratio of mobile Web users to mobile users is less than 50%. Meanwhile, the uptake of VAS (Value-Added Services) is deeply influenced by price, and about 75% of customers consider price rather than interest when choosing VAS. The lower the VAS prices and Internet access charges, the more customers there will be; thus, a way to reduce the Internet access fee for mobile devices is important [2]. A wide range of tracking systems has been developed for tracking vehicles and displaying their position on a map, but no application has so far been developed that tracks the mobility of a


human being. Nowadays, tracking a person's mobility has become a crucial issue, be it tracking a criminal, supporting a detective working on a case, or any other application. In this paper, we present a cost-effective system for tracking a human being using a GPS- and GPRS-equipped mobile phone rather than a handheld GPS receiver; the aim is to reduce the overall cost of tracking. GPS is a satellite-based service available 24x7 everywhere in the world. It can be used to obtain a location, including latitude, longitude, and altitude values along with timestamp details, and it is free of cost to every individual. To track the movement of the person, we use Google Maps for mapping the location. Data loggers contain a large memory to store the coordinates, while data pushers additionally contain a GSM/GPRS modem to transmit this information to a central computer, either via SMS or via GPRS in the form of IP packets.
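A minimal sketch of the data-pusher step (the JSON-over-HTTP format, field names, and server URL are assumptions made purely for illustration; the system described here transmits GPRS/IP packets to a MySQL-backed server):

import json
import time
import urllib.request

def build_packet(imei, lat, lon, alt):
    """Assemble a tracking packet: IMEI + GPS fix + timestamp."""
    return {
        "imei": imei,                      # unique device identifier
        "latitude": lat,
        "longitude": lon,
        "altitude": alt,
        "timestamp": int(time.time()),     # UNIX time of the GPS fix
    }

def push(packet, url="http://example.com/track"):   # hypothetical endpoint
    """Send the packet to the central server over GPRS/IP."""
    data = json.dumps(packet).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

packet = build_packet("356938035643809", 12.8246, 80.2270, 14.0)
print(packet)        # push(packet) would transmit it to the server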

These packets are sent by the mobile phone. The mobile phone, which fetches the GPS location, communicates with the server using the General Packet Radio Service (GPRS), a low-cost wireless data communication service provided by the operators [5]. Mobile phones equipped with a GPS receiver are now easily available in the market, and this cell phone technology has enabled us to communicate with almost every part of the world. GSM/GPRS is one of the best and cheapest modes of communication available today and in the near future. A GPS tracker essentially contains a GPS module to receive the GPS signal and calculate the coordinates.

II ARCHITECTURAL MODEL

The architecture comprises the Ad Agent Platform with three components: the mobile ad database, the user profile, and the personalized ad matching. The ad allocator matches mobile ads with user profiles based on the subscriber's location information and current velocity. Personalized ad matching can be regarded as an information retrieval problem: we calculate the similarity between the user information and the ad information with a combination of various IR models.

Fig.1 Architectural Model
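A minimal sketch of how such an allocator could combine profile similarity with the subscriber's location (the fields, weights, and the degrees-to-kilometres conversion are assumptions for illustration only, not the paper's implementation):

import math

def cosine(a, b):
    """Simple IR-style cosine similarity between two keyword-weight maps."""
    common = set(a) & set(b)
    num = sum(a[k] * b[k] for k in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def score(ad, user):
    sim = cosine(ad["keywords"], user["interests"])
    dx, dy = ad["loc"][0] - user["loc"][0], ad["loc"][1] - user["loc"][1]
    dist_km = math.hypot(dx, dy) * 111        # rough degrees-to-km conversion
    return sim / (1.0 + dist_km)              # nearer, better-matched ads rank higher

user = {"interests": {"food": 2, "movies": 1}, "loc": (12.82, 80.22)}
ads = [{"id": "A1", "keywords": {"food": 3, "pizza": 1}, "loc": (12.83, 80.23)},
       {"id": "A2", "keywords": {"shoes": 2},            "loc": (13.00, 80.20)}]
print(max(ads, key=lambda ad: score(ad, user))["id"])    # -> 'A1'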

III ADVERTISING KEY FUNCTIONS

In this methodology we address the three key issues for mobile advertising, considering the transmission medium, the transmission time, and the selection mechanism for mobile advertisements. While delivering ads to subscribers, the ad allocator also records the time the user spends on ads and the number of mobile ads that are clicked or closed by the


user in exchange for discounted mobile broadband access.

Fig. 2 Key functions

A. User/Subscriber Profile Manager

In this module, mobile users are advertised to based on personal data collected from the user, taking into account the user's characteristics, religion, and other preferences; the user's demographic status is also considered for this purpose.

B. Advertiser Manager

This component tracks a human's mobility using two technologies, GPRS (General Packet Radio Service) and GPS; the main aim is to reduce cost by using the phone's built-in GPS receiver. It works in two phases.

Tracking: In this phase, the mobile phone's GPS receiver fetches the GPS location; after calculating the exact position, the application creates a GPRS packet that includes, along with the location details, a unique identifier called the International Mobile Equipment Identity (IMEI) number and timestamp details. The application then sends this GPRS packet to the server, which stores the data in a Mobile Object Database (MOD) implemented in MySQL.

Mapping: In this phase, the data is fetched from the database and displayed on a Google map in a web application developed with JavaScript-based Ajax technology for live tracking.

C. Telecommunication Manager

Here the Ad Agent Platform consists of three components: the mobile ad database, the user profile, and the personalized ad matching. When an advertiser registers commercial mobile ads, we obtain the data for the mobile ad database. Similarly, we obtain the subscriber's information as a user profile when the subscriber signs up for the mobile advertising service and downloads the mobile ad allocator, which is developed by the carrier in return for discounted broadband access. The ad allocator matches mobile ads with user profiles based on the subscriber's location information and current velocity.

Fig. 3 Category of people

Fig. 4 Demographic status


IV RESULTS AND DISCUSSION

The exact location of the mobile user can be tracked based on the user's current position and velocity using the GPS service built into the mobile application. The map view shows the exact point, a place called Santha, over which he moves in the town.

Fig. 5 Mobile users tracked using GPS and advertised based on location

CONCLUSION

In this paper we have proposed Mobile Ad Sense, an AdSense-style design for mobile applications. Text-based ads are displayed on the mobile device based on the user's location, obtained through the tracking and mapping phases, so that advertisers reach mobile users with the right ad at the right time. We address three key issues: (1) how to show the mobile ads, (2) when to show the mobile ads, and (3) which potential mobile ads will be clicked by the subscribers. Ad allocation based on all factors (locality, preferences, and the user's demographic status) achieves the best performance. In the future, personalized advertising based on individual history can be explored to increase mobile advertising effectiveness. In addition, click fraud is an important issue that needs to be prevented, just as in online advertising, and the storage and updating of users' information by telecom carriers has to be secured.

Fig. 5 Mobile users tracked using GPS in map view

Fig. 5 Mobile users tracked using GPS in map view with reference to speed and velocity


REFERENCES

[1] James D. Ratliff and Daniel L. Rubinfeld, "Online Advertising: Defining Relevant Markets".
[2] Avi Goldfarb and Catherine, "Online Display Advertising: Targeting and Obtrusiveness", Feb. 2010.
[3] G. Giuffrida, C. Sismeiro, G. Tribulato, "Automatic content targeting on mobile phones".

[4] P. Chatterjee, D. L. Hoffman, and T. P. Novak, "Implications for Web-Based Advertising Efforts", Marketing Science.
[5] P.-T. Chen, H.-P. Hsieh, J. Z. Cheng, Y.-S. Lin, "Broadband Mobile Advertisement: What are the Right Ingredients and Attributes for Mobile Subscribers", 2009.
[6] J. Feng, H. K. Bhargava, and D. Pennock, "Comparison of Allocation Rules for Paid Placement Advertising in Search Engines", 2003.


An Intelligent Instant Messenger Using a Smart Device Based on the Jabber Server: A High-Tech Innovative Communication Tool
Nalini Priya G 1, Maheswari K G 2
1 Associate Professor, Department of Information Technology, KCG College of Technology
2 Asst. Professor, Department of MCA, IRTT, Erode

Abstract

This paper analyses previously existing instant messengers and implements a secure mobile instant messenger for mobile environments targeted at computers. Interface scalability, changing and querying the presence state, and the security of data transmitted between mobile devices and computers of any kind are major issues for an instant messaging system designed for mobile environments. The proposed system addresses these issues through XML security enhancements at both the server and the client side: location awareness, end-to-end encryption, and access control mechanisms work together to provide security throughout the data transmission from client 1 to the server to client 2. Finally, fine-grained mobility and security are obtained when Jabber and XMPP work together.
I. INTRODUCTION

A small thought has grown into the well-planned idea of real-time interaction between individuals using a messenger over the network; this has given birth to the concept of instant messengers. In practice, many messengers already include a mechanism for connecting to another person to initiate a conversation: Google Talk, Yahoo Messenger, AOL, and others provide a way for millions of people around the world to stay in contact with one another, share files, and engage in video conferencing or plain voice chat. In addition, they offer other attractive features such as maintaining a friend list, administrative control over the software settings, and invitation deployment, to name a few. This paper, however, introduces a different approach to the traditional path of messengers: a messenger based on open-source resources. First, it is completely open in nature and available freely to everyone, which provides scalability and accessibility; being open-ended, thousands of people around the globe can help improve it and remove bugs, which will be continuously monitored and eradicated. The software follows the standards and requirements of chat protocols and the Internet domain. With its help, owning a messenger for various fields of life no longer seems too distant a possibility.

A. Extensible Messaging and Presence Protocol (XMPP)

XMPP is an open-standard communications protocol for message-oriented


middleware based on XML (Extensible Markup Language). Unlike most instant messaging protocols, XMPP uses an open-systems approach to development and application, by which anyone may implement an XMPP service and interoperate with other organizations' implementations. XMPP-based software is deployed widely across the Internet and, by 2003, was used by over ten million people worldwide, according to the XMPP Standards Foundation. The XMPP network uses a client-server architecture (clients do not talk directly to one another); however, it is decentralized by design: there is no central authoritative server, as there is with services such as AOL Instant Messenger or Windows Live Messenger. Every user on the network has a unique Jabber ID (usually abbreviated as JID). To avoid requiring a central server to maintain a list of IDs, the JID is structured like an e-mail address, with a username and a domain name for the server where that user resides, separated by an at sign (@), as in username@example.com. Since a user may wish to log in from multiple locations, they may also specify a resource. A resource identifies a particular client belonging to the user (for example home, work, or mobile). JIDs without a username part are also valid and may be used for system messages and control of special features on the server; a resource remains optional for these JIDs as well.
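A small sketch of how a JID of the form username@domain/resource can be decomposed (illustrative only; real XMPP libraries perform stricter validation than this):

# Illustrative JID parser (assumed format user@domain/resource).
def parse_jid(jid):
    resource = None
    if "/" in jid:
        jid, resource = jid.split("/", 1)       # resource part is optional
    username, _, domain = jid.rpartition("@")   # username part is optional too
    return username or None, domain, resource

print(parse_jid("alice@example.com/mobile"))   # ('alice', 'example.com', 'mobile')
print(parse_jid("example.com"))                # (None, 'example.com', None)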

Jargon Instant Messenger is a secure messaging client and server environment targeted at the various computers in an organization. This new instant messenger has features including connectivity to other popular networks such as Google Talk, server-side friend lists and individual profiles, asynchronous messaging, and coverage of all major levels of security. Maintaining a friend list is convenient for the end user, as it gives complete freedom to choose any pal for an instant chat. The system maintains an end-user identity and authenticates users with their unique identity. It can also attach to other messaging protocols, which makes it universal. Security throughout the environment focuses mainly on key management at the server side and end-to-end encryption at the client side. The intended audience is individual users looking for a chat system and organizations interested in a full-featured chat; any person with a valid Jargon account can log in from any part of the world, and the system provides security at all levels.

II. EXISTING SYSTEM

The concept of instant messengers has been present for a long time, and many different messaging systems are readily available, namely Gtalk [14], Yahoo Messenger [15], MSN Messenger [16], etc. The concept is more or less similar in all of them: they include a traditional way of staying in touch, but with accompanying issues and drawbacks, such as frequent server overload, unexpected spam content, loss of data in the database, and, most recently, attacks from malicious bots and malware. The existing systems are widely used around the world and boast features like instant offline messaging, telephony communication, and friend lists. Yet even though they are popular and backed by famous names like Yahoo Corporation, Google Inc., and Microsoft, they can still be improved in areas such as security, user friendliness, adaptability, and upgradability. Existing systems have, in due course, begun to show inconsistencies in performance; after each iteration, new bugs have crept in, such as missing references, BSoDs (Blue Screens of Death), performance issues, incompatibilities, illegal memory corruption, and infrequent updates. Our Jargon instant messenger, in contrast, is totally open-sourced and allows creativity in order to improve unconditionally.


III. SCOPE OF THE PAPER

Users of IM technologies are exposed to a number of security risks. For example, sending attachments through IM fosters a richer experience, but the files are subject to viruses and other malware, and IM attachments often bypass antivirus software. Some IM networks are susceptible to eavesdropping. A problem looming on the horizon is spim, the IM version of spam: it is hard to verify the instant messaging source that sends unwanted messages, bogus advertisements, and solicitations for personal information to IM clients. Sometimes data is lost midway, so data integrity is corrupted and the smooth flow of messages is disrupted. Compatibility issues also arise when a different platform or a specific version of the protocol is used.


Fig. 1 Peer-to-peer connection

The main function of the instant messenger is to connect both clients (who would like to chat) to the server; once they are both connected, a path is established for them, and from the following messages onwards they communicate over a peer-to-peer connection. Security algorithms are included at all levels to prevent security and data breaches.

In the above diagram, the clients connect to each other with the help of authentication from the server. Once the server provides the necessary permission and protocol connection, client 1 and client 2 can exchange data simultaneously.

IV. PROPOSED SYSTEM

Our task is to design active software that provides a chat system for all involved. The proposed system is called the Jargon Instant Messenger. It introduces a new concept of an instant messenger built upon the open-source Jabber server and the NetBeans IDE (Integrated Development Environment). The advantage of this system is that it overcomes the issues and drawbacks of existing messengers and improves upon their established areas. As it is partially developed in Java, it has strong security and a well-established customer base and developer-to-tester ratio. Users can set up their own buddy lists to interact with and send messages to each other; the most sought-after feature is the ability to merge friend lists from different protocols such as Google and Yahoo. The design follows the paper "Mobile Jabber IM: A Wireless-Based Text Chatting System" [2], which defines instant messaging as an Internet-based protocol application that allows one-to-one communication between users employing a variety of devices. Jabber is the most widespread open-source platform, using an XML-encoded protocol especially tailored to provide instant messaging and presence services over the Internet. Following this proposal, the required initiative was taken and the system was constructed: the requirements of the product design were studied and the client and server were built accordingly.

A. Server Design of the Messenger

The server is designed using the Jabber server, following the paper "A


Study of Internet Instant Messaging and Chat Protocols" [4], which articulates that the framework enables mobile devices to abstractly exchange streams of data using an underlying communication infrastructure, and also supports the continued connectivity of mobile devices as they roam into different physical locations. To achieve this, the Jabber protocol was initially ported to mobile devices, and supporting functionality that monitors the roaming behavior of mobile devices was built as part of a framework and supporting library.

B. Client Design of the Messenger

The paper "A Study on Jabber-based Instant Messaging System for Mobile Networks" [1] describes the usual usage scenarios in which wearable computers support users performing a certain task, e.g., aircraft design and military applications, and notes the pros and cons of user-interface restrictions when the user is moving while using such computers.

C. Design Requirements of the Messenger

The five designs represented below are developed in this project, where security is available at all levels of implementation.

As shown in the use-case diagram, the user first logs in to the system and is authenticated for further processing. The next step is the message-read module: after logging in, the user can read any received message. The next module involves typing a new message, where the user can input any piece of information to be sent to the other side. New buddies can be added as the account owner likes, and existing contacts can be removed at will. These are the five steps of the use-case diagram that explain the overall functioning of the system.

V. EXPERIMENTAL SETUP

The architecture specified below is for the new instant messenger, the Jargon Instant Messenger, which focuses on security at all levels. The SIP protocol and the Timber XML database working with the Jabber server allow the messenger to implement security at every level. The architecture involves a number of layers for the smooth functioning of the system; the different components are a Jabber client on a PDA, a Jabber client on a PC, and a Jabber client on a mobile phone, each with a plug-in installed.

Fig. 2 Design Requirements of the Instant Messenger


C. Working Steps of the Jargon Messenger

The working of the Jargon instant messenger is addressed below.

Input: the user enters a user ID and password.
1. Pal 1 makes a request to the server using the browser start-up, which is run with the help of the plug-in program.
2. Pal 1 is presented with a user login and password screen. The login credentials are validated by the Jabber server, where the access-control security mechanism is enforced.
3. The login credentials of the pals are stored in the form of XML streams in the Timber XML database.
4. After passing the security check confirming that he/she is an authorized client, pal 1 types the message in the editing space provided in the browser.
5. The message is sent to a pal in pal 1's contact list; here the end-to-end encryption security mechanism is applied.
6. Users can personalize their contacts, which are maintained in the Timber XML database; this database fetches data quickly and stores information securely.
7. The user can quit the chat stream at any time, upon which the plug-in instantly disconnects the client from the server.


Output: the user is able to chat with other users.

Fig. 4 Steps of the Jargon Messenger

VII. SCREENSHOTS OF THE PROJECT

The different screenshots of the project are presented below. The interface is simple and attractive: everything from the real-time exchange of messages to a well-presented display of online users is handled by the engine of the software.

Fig. 5 Messenger Login

Fig. 6 Chat Main Window

In the above figures, a login form is presented to the user: the user enters a buddy name as well as the server address and server port, which allows logging into the Jargon Messenger. After login, a chat window appears showing all users currently online; the user can type a message and send it to other users to start the chat process.

VIII. COMPARATIVE ANALYSIS

The Jabber XMPP protocol, standardized by the IETF, is an open standard accessed using a unique Jabber ID. Other open-standard instant messaging protocols, such as Gale, IRC, and PSYC, are compared below to highlight the strengths of the Jabber protocol. The Gale protocol supports only transport layer security using public/private keys and group formation. IRC supports asynchronous message relaying, but only via a memo system separate from the main system; it offers transport layer security depending on individual server support, simplistic one-to-many multicast routing, a medium level of spam protection, and support for groups. PSYC supports asynchronous message relaying, transport layer security, an unlimited number of contacts, bulletins to all contacts, customized multicast, and a higher level of spam protection; however, its major drawbacks are that its channels allow non-members to access the instant messenger and that it has no support for audio or webcam/video.

Table 1 Comparison of chat protocols

Protocol | Transport layer security | Groups and non-members           | Spam protection | Audio/Video
Gale     | Yes                      | Yes                              | No              | No
IRC      | Yes                      | Yes                              | Medium level    | No
PSYC     | Yes                      | Yes, but allows non-members      | Higher level    | No
Jabber   | Yes                      | Yes, does not allow non-members  | Higher level    | Yes




The table above compares all the chat protocols with the Jabber protocol used in this project in a tabular form; Jabber offers more features than the alternatives.

Fig. 7 Comparative analysis of chat protocols

It is evident from the above representation that Jabber allows group members (and excludes non-members) from joining the chat, provides spam protection for security, and, not least, offers a facility to engage in audio/video chat with anybody present in the friend list.

IX. CONCLUSION

In this work, we have presented a mobile-computing-based project that helps keep people in contact in a safe and technically sound way. The Jargon Instant Messenger is built on the Jabber server, which is open in nature and provides enhanced security levels with the help of the XMPP protocol, which shares its roots with XML. A powerful and audience-friendly chat system has been prepared, with features such as a buddy list, accurate spam protection, and support for the latest standards. The messenger mainly targets connecting the Jabber server with the Timber XML database and implements the open-source Jabber protocol; this decision was made on the basis of the comparative study of the existing instant messaging protocols. The whole messenger is XML-based, so XML security using Jabber is its main focus. Extensions for file transfer and for video and audio communication are under study. This system will be helpful for military purposes and for profit/non-profit organizations.

X. REFERENCES

[1] Yunxiang Gao, Chuangbai Xiao, Chaoqin Gao, Huiguang Chen, "A Study on Jabber-based Instant Messaging System for Mobile Networks", 2010.
[2] Jerry Gao, Mansi Modak, Satyavathi Dornadula, and Simon Shim, "Mobile Jabber IM: A Wireless-Based Text Chatting System", 2010.
[3] K. M. Goeschka, Helmut Reis, Robert Smeikal, "XML Based Robust Client-Server Communication for a Distributed Telecommunication Management System", 2010.
[4] Raymond B. Jennings III, Erich M. Nahum, David P. Olshefski, "A Study of Internet Instant Messaging and Chat Protocols", 2011.
[5] Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang, "Humans and Bots in Internet Chat: Measurement, Analysis, and Automated Classification", 2012.


[6] Raymond B. Jennings III, Erich M. Nahum, David P. Olshefski, Debanjan Saha, "A Study of Internet Instant Messaging and Chat Protocols", 2006.
[7] Yi Nan Liu, Ze Gui Ying, Jian-Ping Li, "A Novel Method for Remote Monitoring of Instant Chat in LAN", 2009.
[8] Minhong Yun, Seokjin Yoon, Sunja Kim, "Mobile Component Run-time Environment for Mobile Devices", ICACT, 2010.
[9] Liu Rui-xiang, Qian Xu and Zhang Yu-Hong, "Research the Integration of Web Mobile Service and Agent", 2010.

[10] Magnus Skjegstad, Ketil Lund, Espen Skjervold, Frank T. Johnsen, "Distributed Chat in Dynamic Networks", 2011.
[11] Fatna Belqasmi and Roch Glitho, "RESTful Web Services for Service Provisioning in Next-Generation Networks", 2011.
[12] John R. Smith, "Jingle: Jabber Does Multimedia", 2007.
[13] Peter Saint-Andre, "Streaming XML with Jabber/XMPP", 2005.
[14] Google Talk, www.google.com/talk
[15] Yahoo Messenger, in.messenger.yahoo.com
[16] MSN Messenger


Similarity Model with User- And Query-Dependent Ranking Approach for Data Retrieval
R. Raju 1, Thangalatha Legaz 2, M. Pakkirisamy 3, S. Md. Haja Sherif 4, P. Venkadesan 5
1,2,3,4,5 Sri Manakula Vinayagar Engineering College, Puducherry

Abstract

With the emergence of the deep Web, searching Web databases in domains such as vehicles, real estate, and mobiles has become a routine task. One of the problems in this context is ranking the results of a user query. Earlier approaches to this problem have used frequencies of database values, query logs, and user profiles, but they do not consider the functional dependencies in the database, the association between query and user, correlation attributes for queries, and the user profile. In this paper we provide a ranking model based on user similarity, derived from the user profile, and query similarity, derived from functional dependencies and correlation attributes. The use of functional dependencies and correlation attributes improves performance measures, such as response time, to a considerable extent. This approach is more effective than existing approaches, because the use of functional dependencies reduces the replication of data, avoids normalizing the database separately, and also reduces the storage space of the database.

Keywords: Ranking, Web databases, user similarity, query similarity, workload, performance.

I. INTRODUCTION

The emergence of the deep Web [1] has led to the proliferation of a large number of Web databases for a variety of applications (e.g., airline reservations, vehicle search, real estate scouting). These databases are typically searched by formulating query conditions on their schema attributes. When the number of results returned is large, it is time-consuming to browse and choose the most useful answer(s) for further investigation. Currently, Web databases simplify this task by displaying query results sorted on the values of a single attribute (e.g., Price, Mileage, etc.). However, most Web users would prefer an ordering derived using multiple attribute values, which would be closer to their expectation. We decompose the notion of similarity into: 1) query similarity, and 2) user similarity. While the former is estimated using either of the proposed metrics, query-condition or query-result, the latter is calculated by comparing individual ranking

functions over a set of common queries between users. Although each model can be applied independently, we also propose a unified model to determine an improved ranking order. The ranking function used in our framework is a linear weighted-sum function comprising: i) attribute weights denoting the significance of individual attributes, and ii) value weights representing the importance of attribute values. Example: the user (U) moves to Google for an internship and asks a different query (say Q): Make = Nokia AND Model = NSeries. We can presume (since he has procured an internship) that he may be willing to pay a slightly higher price, and hence would prefer mobiles with Condition = NEW AND Price < 100,000 to be ranked higher than others.


This example emphasizes that the same user may display different ranking preferences for the results of different queries. Thus, it is evident that in the context of Web databases, where a large set of queries posed by varied classes of users is involved, the corresponding results should be ranked in a user- and query-dependent manner.
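As an illustration of such a ranking function (not the authors' implementation; the attribute weights, value weights, and tuples below are assumptions made for the example query on mobiles):

# Illustrative linear weighted-sum ranking function (all numbers assumed).
attribute_weights = {"Make": 0.2, "Condition": 0.5, "Price": 0.3}
value_weights = {
    ("Make", "Nokia"): 0.9, ("Make", "Samsung"): 0.6,
    ("Condition", "NEW"): 1.0, ("Condition", "USED"): 0.4,
    ("Price", "<100000"): 0.8, ("Price", ">=100000"): 0.3,
}

def score(tuple_):
    """Score = sum over attributes of attribute weight * value weight."""
    return sum(attribute_weights[a] * value_weights.get((a, v), 0.0)
               for a, v in tuple_.items())

results = [
    {"Make": "Nokia", "Condition": "NEW",  "Price": "<100000"},
    {"Make": "Nokia", "Condition": "USED", "Price": ">=100000"},
]
print(sorted(results, key=score, reverse=True)[0])   # the NEW, cheaper phone ranks first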

II. RELATED WORK


Although there was no notion of ranking in traditional databases, it has existed in the context of information retrieval for quite some time. With the advent of the Web, ranking gained prominence due to the volume of information being searched and browsed; it has become ubiquitous and is used in document retrieval systems, recommender systems, Web search/browsing, and traditional databases as well.

Ranking in Recommendation Systems: Given the notion of user and query similarity, our proposal appears similar to the collaborative and content filtering techniques [3] used in recommendation systems. However, there are important differences (between ranking tuples for database queries and recommending items in a specific order) that distinguish our work. For instance, each cell in the user-item matrix of recommendation systems represents a single scalar value that indicates the rating/preference of a particular user towards a specific item. Similarly, in the context of recommendations for social tagging [2], each cell in the corresponding user-URL/item-tag matrix indicates the presence or absence of a tag provided by a user for a given URL/item. In contrast, each cell in the user-query matrix (used for database ranking) contains an ordered set of tuples (represented by a ranking function). Further, although the

rating/relevance given to each tuple (in the results of a given query) by a user can be considered similar to a rating given for an item in recommendation systems, if the same tuple occurs in the results of distinct queries, it may receive different ratings from the same user. This aspect of the same item receiving varied ratings by the same user in different contexts is not addressed by current recommendation systems. The notion of similarity is another important distinction from recommendation systems. In content filtering, the similarity between items is established either using a domain expert, or user profiles, or by using a feature recognition algorithm [4] over the different features of an item (e.g., author and publisher of a book, director and actor in a movie, etc.). In contrast, since our framework requires establishing similarity between actual SQL queries (instead of simple keyword queries), the direct application of these techniques does not seem to be appropriate. To the best of our knowledge, a model for establishing similarity between database queries (expressed in SQL) has not received attention. In addition, a user profile is unlikely to reveal the kind of queries a user might be interested in. Further, since we assume that the same user may have different preferences for different queries, capturing this information via profiles will not be a suitable alternative. The notion of user similarity used in our ranking framework is identical to the one adopted in collaborative filtering; however, the technique used for determining this similarity is different. In collaborative filtering, users are compared based on the ratings given to individual items (i.e., if two


users have given a positive/negative rating for the same items, then the two users are similar). In our work, user similarity is based on the similarity between the users' respective ranking functions, and hence their ranked orders. Furthermore, this work extends user personalization using context information based on user and query similarity instead of static profiles and data analysis. Ranking in Databases: Although ranking query results for relational and Web databases has received significant attention over the past years, simultaneous support for automated user- and query-dependent ranking has not been addressed in this context. For instance, some approaches address the problem of query-dependent ranking by adapting the vector model from information retrieval, whereas others do the same by adapting the probabilistic model. However, for a given query, these techniques provide the same ordering of tuples across all users.

Employing user personalization by considering the context and profiles of users for user-dependent ranking in databases has been proposed in earlier work. Similarly, other work requires the user to specify an ordering across the database tuples, without posing any specific query, from which a global ordering is obtained for each user. A drawback in all these works is that they do not consider that the same user may have varied ranking preferences for different queries. The closest form of query- and user-dependent ranking in relational databases involves manual specification of the ranking function/preferences as part of SQL queries. However, this technique is unsuitable for Web users who are not proficient with query languages and ranking functions. In contrast, our framework provides an automated query- as well as user-dependent ranking solution without requiring users to possess knowledge about query languages, data models and ranking mechanisms.

Fig 1: Architecture

Fig 2: Workload Generation for Similarity-based Ranking

III. PROBLEM DEFINITION

The existing system is a novel query- and user-dependent approach for ranking query results in Web databases. It presents a ranking model, based on two complementary notions of user and query similarity, to derive a ranking function for a given user query. The problem in the existing system would be to


combine the idea of user similarity proposed in that work with existing user profiles to analyze whether ranking quality can be improved. Accommodating range queries and the usage of functional dependencies and attribute correlations need to be examined. The applicability of this model to other domains and applications also needs to be examined, along with a performance comparison based on the response time and resource utilization of storing, updating and retrieving data from different databases, to determine which database is suitable for a particular web application. We propose a user- and query-dependent approach for ranking the query results of Web databases. We develop a ranking model, based on two complementary measures of query similarity with functional dependencies and user similarity with user profiles, to derive functions from a workload containing ranking functions for several user-query pairs.

ALGORITHM:
INPUT: Ui, Qj, Workload W (M queries, N users), Ts, Te
OUTPUT: Ranking function Fxy to be used for (Ui, Qj), Response time

STEP 1:
Ts = time()  %% start time (current time) %%
for p = 1 to M do
    Calculate Query Condition Similarity(Qj, Qp) with functional dependencies  %% using Equation 1 %%
end for
Sort (Q1, Q2, ..., QM)  %% based on descending order of similarity with Qj %%
Select QKset, i.e., the top-K queries from the sorted set

STEP 2:
for r = 1 to N do
    Calculate User Similarity(Ut, Ur) with user profile over QKset  %% using Equation 2 %%
end for
Sort (U1, U2, ..., UN) to yield Uset  %% based on descending order of similarity with Ut %%

STEP 3:
for each Qs in QKset do
    for each Ut in Uset do
        Rank(Ut, Qs) = Rank(Ut in Uset) + Rank(Qs in QKset)
    end for
end for
Fxy = Get-Ranking-Function()
Te = time()  %% end time (current time) %%
Response time = Te - Ts  %% using Equation 3 %%

Initially, the algorithm searches the workload W for a ranking function Fxy to be used for (Ui, Qj). Ui is one of the users of the web application and Qj is one of the different queries that can be asked by the user. To calculate the query similarity (Qj, Qp) with functional dependencies, the algorithm uses Qj and Qp, where Qj is one query and Qp is


another query from the workload; a similarity function computes the similarity between Qj and Qp. The query similarities are sorted in descending order and the top-K queries are taken from the sorted set. User similarity (Ut, Ur) is then calculated over the QKset, where Ut is one user and Ur is another user; a similarity function computes the similarity between Ut and Ur, and the user similarities are likewise sorted. The ranking function is then selected using the sorted user set and query set, and stored in the workload.
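A compact sketch of Steps 1-3 is given below, assuming the workload is represented as a mapping from (user, query) pairs to ranking functions and that the query-similarity and user-similarity computations of Equations 1 and 2 are supplied as callables; it illustrates the selection logic, not the authors' implementation.

import time

def rank_results(ui, qj, workload, query_similarity, user_similarity, k):
    """Pick the ranking function Fxy whose user-query pair (Ux, Qy) is most
    similar to (Ui, Qj), and report the response time (Equation 3).
    `workload` maps (user, query) pairs to ranking functions."""
    ts = time.time()                                    # start time

    # Step 1: top-K queries most similar to Qj (Equation 1).
    queries = {q for (_, q) in workload}
    qk_set = sorted(queries, key=lambda q: query_similarity(qj, q), reverse=True)[:k]

    # Step 2: users sorted by similarity to Ui over the top-K query set (Equation 2).
    users = {u for (u, _) in workload}
    u_set = sorted(users, key=lambda u: user_similarity(ui, u, qk_set), reverse=True)

    # Step 3: combined rank = position of the user + position of the query;
    # pick the best-ranked pair for which a function exists in the workload.
    best = None
    for u_rank, u in enumerate(u_set):
        for q_rank, q in enumerate(qk_set):
            if (u, q) in workload:
                combined = u_rank + q_rank
                if best is None or combined < best[0]:
                    best = (combined, workload[(u, q)])

    fxy = best[1] if best else None
    return fxy, time.time() - ts                        # Fxy and response time (Te - Ts)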

IV. SIMILARITY MODEL FOR RANKING

The concept of similarity-based ranking is aimed at situations where ranking functions are known for a small (and hopefully representative) set of user-query pairs. When answering a query asked by a user, if no ranking function is available for this user-query pair, the proposed query- and user-similarity models can effectively identify a suitable function to rank the corresponding results.

4.1 Query Similarity with Functional Dependencies

1. For the user U1 from the example, a ranking function does not exist for ranking Q1's results (N1). From the sample workload-A shown in Table 1, ranking functions over the queries Q2, Q5, Q7 (shown in Table 1) have been derived, thus forming U1's workload. It would be useful to analyze whether any of F12, F15, or F17 can be used for ranking Q1's results for U1. However, from the example, we know that a user is likely to have displayed different ranking preferences for different query results with functional dependencies. Consequently, a randomly selected function from U1's workload is not likely to give a desirable ranking order over N1. On the other hand, the ranking functions are likely to be comparable for queries similar to each other. We advance the hypothesis that if Q1 is most similar to query Qy (in U1's workload), U1 would display similar ranking preferences over the results of both queries; thus, the ranking function (F1y) derived for Qy can be used to rank N1. For Web databases, although the workload matrix can be extremely large, it is very sparse, as obtaining preferences for a large number of user-query pairs is practically difficult. We have purposely shown a dense matrix to make our model easily understandable.

Table 1: Sample queries from U1's workload
Make | Location | Price | Mileage | Color
Honda | Dallas | any | any | any
Toyota | Atlanta | any | any | any
Lexus | Basin | any | any | any
any | Littlerock | any | any | Grey

Similar to recommendation systems, our framework can utilize an aggregate function, composed from the functions corresponding to the top-k most similar queries to Q1, to rank N1 based on FD and correlated


attributes. Although the results of our experiments showed that an aggregate function works well for certain individual instances of users asking particular queries, on average, across all users asking a number of queries, using an individual function proved better than an aggregate function. Hence, for the remainder of the section, we only consider the most similar query (to Q1). We translate this proposal of query similarity into a principled approach via two alternative models: i) query-condition similarity, and ii) query-result similarity. The similarity model (shown in Figure 1) forms the core component of our ranking framework. When the user Ui poses the query Qj, the query-similarity model determines the set of queries ({Qj, Q1, Q2, ..., Qp}) most similar to Qj. Likewise, the user-similarity model determines the set of users ({Ui, U1, U2, ..., Ur}) most similar to Ui. Using these two ordered sets of similar queries and users, it searches the workload to identify the function FUxQy such that the combination of Ux and Qy is most similar to Ui and Qj. FUxQy is then used to rank Qj's results for Ui. The workload used in our framework comprises ranking functions for several user-query pairs. Figure 2 shows the high-level view of deriving an individual ranking function for a user-query pair (Ux, Qy). By analyzing Ux's preferences (in terms of a selected set of tuples Rxy) over the results (Ny), an approximate ranking function (FUxQy) can be derived. The query-condition similarity between two queries Q and Q' is aggregated from their per-attribute value similarities:

Similarity(Q, Q') = Σi sim(Q[Ai], Q'[Ai]) ... (1)
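The per-attribute aggregation in Equation (1) can be sketched as follows; the value-similarity function is a placeholder supplied by the caller, and the extra credit the model gives to attributes linked by functional dependencies or correlations is omitted for brevity.

def query_condition_similarity(q1, q2, value_sim):
    """Illustrative form of Equation (1): average the per-attribute value
    similarities over the attributes mentioned in either query.
    value_sim(attr, v1, v2) returns a number in [0, 1] and must tolerate
    a missing (None) condition value."""
    attrs = set(q1) | set(q2)
    if not attrs:
        return 0.0
    total = sum(value_sim(a, q1.get(a), q2.get(a)) for a in attrs)
    return total / len(attrs)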

Table 1 shows instances of the workload (represented in the form of a matrix of users and queries). Cell [x, y] in the workload, if defined, consists of the ranking function Fxy for the user-query pair (Ux, Qy).

4.2 User Similarity with User Profile
Definition: Given two users Ui and Uj with the set of common queries {Q1, Q2, ..., Qr}, for which ranking functions ({Fi1, Fi2, ..., Fir} and {Fj1, Fj2, ..., Fjr}) exist in W, the user similarity between Ui and Uj is expressed as the average similarity between their individual ranking functions for each query Qp:

Similarity(Ui, Uj) = (1/r) Σp=1..r similarity(Fip, Fjp) ... (2)

2. Without loss of generality, we assume {Q1, Q2, ..., Qr} are the common queries for Ui and Uj, although they can be any queries.

Table 2: Drawbacks of Query-Independent User Similarity

The ranking function itself has the linear weighted-sum form F(t) = Σi (wi × vi), where wi represents the attribute-weight of Ai and vi represents the value-weight for Ai's value in tuple t. The workload W is populated using such ranking functions.


In this method for estimating user similarity with a user profile, we have considered all the queries that are common to a given pair of users and their relative keywords. This assumption forms one of our models for user similarity, termed query-independent user similarity. However, it might be useful to estimate user similarity based on only those queries that are similar to the input query Q1. In other words, under this hypothesis, two users who may not be very similar to each other over the entire workload (comprising similar and dissimilar queries) may, in fact, be very similar to each other over a smaller set of similar queries. We formalize this hypothesis using a top-K clustering model for determining user similarity.

4.3 Performance
Traditionally, response time and throughput are the two most important performance metrics for both Web information systems and database systems [7]. We adopt response time as our metric, using Ts and Te, where Ts is the start time of executing a query and Te is the end time of executing the query. The performance is measured as

Response Time = Te - Ts ... (3)

4.4 Top-K User Similarity
For finding the value of K for clustering, the following top-K similarity models can be used.
Strict top-K user similarity: Given an input query Q1 by U1, only the top-K most similar queries to Q1 are selected. However, the model does not check the presence (or absence) of ranking functions in the workload for these K queries. Using K = 3, Q2, Q3 and

Q4 are the three queries most similar to Q1 and hence would be selected by this model. In the case of workload-A, the similarity between U1 and U2, as well as between U1 and U3, will be estimated using Q2. However, in the case of workload-B, similar to the problem in the clustering alternative, there is no query common between U1 and U2 (or U3). Consequently, similarity cannot be established and hence no ranking is possible. (Here workload-A and workload-B are collections of ranking functions, such as F12 and F13, for different sets of users and queries.)
User-based top-K user similarity: In this model, we calculate user similarity for a given query Q1 by U1 by selecting the top-K most similar queries to Q1, each of which has a ranking function for U1. Consequently, for workload-A, using K = 3, the queries Q2, Q5 and Q7 would be selected. Likewise, in the case of workload-B, this measure would select Q3, Q5 and Q6 using the top-3 selection. However, since there exist no functions for users U2 and U3 (in workload-B) for these queries, no similarity can be determined, and consequently no ranking would be possible.
Workload-based top-K user similarity: In order to address the problems in the previous two models, we propose a workload-based top-K model that provides the stability of the query-independent model (in terms of ensuring that ranking is always possible, assuming there is at least one non-empty cell in the workload for that user) and ensures that similarity between users can be computed in a query-dependent manner. Given a query Q1 by U1, the top-K most similar queries to Q1 are selected such that for each of these queries there exists: i) a ranking function for U1 in the workload, and ii) a


ranking function for at least one other user (Ui) in the workload. Considering k = 3, this model will select Q2, Q5 and Q7 in the case of workload-A and the queries Q7 and Q8 for workload-B, and ensure a ranking of results in every case.
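A sketch of this workload-based top-K selection, under the assumption that the workload is a mapping from (user, query) pairs to ranking functions, could look as follows.

def workload_based_top_k(u1, q1, workload, query_similarity, k):
    """Keep only queries that have a ranking function for U1 and for at least
    one other user, then take the K queries most similar to Q1.
    `workload` maps (user, query) -> ranking function."""
    candidates = []
    for q in {q for (_, q) in workload}:
        has_u1 = (u1, q) in workload
        has_other = any(u != u1 for (u, qq) in workload if qq == q)
        if has_u1 and has_other:
            candidates.append(q)
    return sorted(candidates, key=lambda q: query_similarity(q1, q), reverse=True)[:k]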

V. CONCLUSION

In this paper, we proposed a user- and query-dependent solution, together with a performance comparison, for ranking query results of Web databases. In the ranking model, user similarity with user profiles and query similarity with functional dependencies and correlated attributes improve the response-time performance and reduce the storage space of the database.

VI. REFERENCES

[1]. Aditya Telang, Chengkai Li, Sharma Chakravarthy, "One Size Does Not Fit All: Towards User- and Query-Dependent Ranking for Web Databases", Department of Computer Science and Engineering, University of Texas at Arlington, IEEE, 2011.
[2]. S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis, "Automated Ranking of Database Query Results", In CIDR, 2003.
[3]. S.-W. Hwang, "Supporting Ranking for Data Retrieval", PhD thesis, University of Illinois, Urbana-Champaign, 2005.
[4]. A. Telang, S. Chakravarthy, and C. Li, "Establishing a Workload for Ranking in Web Databases", Technical report, UT Arlington, 2010.
[5]. "A Performance Comparison of Various Open Source Databases", Rockdale Magnet School for Science and Technology, 2008.
[7]. Yi Li, Kevin L, "Performance Issues of a Web Database", School of Computing, Information Systems and Mathematics, South Bank University, 103 Borough Road, London, IEEE, 2005.
[8]. Weifeng Su, Jiying Wang, and Frederick H. Lochovsky, "Record Matching over Query Results from Multiple Web Databases", IEEE, April 2010.
[9]. T. Kanungo and D. Mount, "An Efficient k-means Clustering Algorithm: Analysis and Implementation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881-892, 2002.


EVOLUTION AND EMERGING TRENDS ON INTERNET OF THINGS


V. Subashini1, A. Umamaheswari2, P. Subhapriya3, Prof. K. PremKumar4
1, 2, 3 PG Students, CSE, Sri Manakula Vinayagar Engineering College, Puducherry.
4 HOD, Department of CSE, Sri Manakula Vinayagar Engineering College, Puducherry

Abstract

We are standing on the edge of a new ubiquitous computing and communication era, one that will singularly transform our corporate, community, and personal spheres in the next evolution of the Internet. The shape of the future Internet will emerge from the advancement of a series of critical technologies, converging and increasing in parallel, through market, business and societal evolutions. These demands are driven by user control of content and services, which grows the network of "things" and leads to the advent of the Internet of Things (IoT), which offers capabilities to identify and connect worldwide physical objects (otherwise known as physical computing) into a unified system. Therefore, the importance of modeling and processing IoT data has been greatly heightened. IoT data is considerable in quantity, noisy, heterogeneous, incoherent, and arrives at the system in a streaming fashion. Due to these distinctive characteristics of IoT data, the use of IoT data in realistic applications encounters many essential and challenging problems, such as data modeling and processing. This paper discusses the evolution and emerging trends for all aspects of Internet objects existing in the environment in detail.
I. INTRODUCTION

The semantic origin of the expression is composed of two words and concepts: Internet and Thing, where Internet can be defined as "the world-wide network of unified computer networks, based on a standard communication protocol, the Internet suite", while a Thing is "a physical object not precisely identifiable". Therefore, semantically, Internet of Things means "a world-wide network of interconnected objects uniquely addressable, based on standard communication protocols" [1]. New innovative applications will emerge from this interconnected social and technological framework of objects, utilizing the connectivity and accessibility of everything. The Internet of Things is a technological revolution that represents the future of computing and communications, and its development depends on dynamic technical innovation in a number of important fields, from wireless sensors to nanotechnology. Initially, to sort out everyday objects and devices, connect them to large databases and networks, and indeed to the network of networks, i.e., the Internet, a simple, unobtrusive and cost-effective system of item identification is critical. Then, the data about things and objects


can be collected and processed. Radio-frequency identification (RFID) offers this functionality. Next, by using sensor technology, the data that are collected will gain the capability to identify changes in the physical positioning of things. Embedded intelligence in the things themselves can further improve the power of the network by devolving information-processing capability to the edges of the network [2]. Finally, advances in miniaturization and nanotechnology mean that smaller and smaller things will have the capability to interact and connect. A combination of all of these developments will build an Internet of Things that connects the world's objects in both a sensory and an intelligent manner. RFID technologies use radio waves to identify items and are seen as one of the fundamental enablers of the Internet of Things. IoT offers better possibilities for data processing and increases the flexibility of the network. This will also empower things and devices at the edges of the network to take independent decisions. Smart things are difficult to define, but involve a certain processing power and reaction to external stimuli [2]. Advances in smart homes, smart vehicles and personal robotics are some of the leading areas of IoT. Studies on wearable computing, including wearable mobility devices, are swiftly progressing. In this new era, scientists are using their imagination to develop new devices and machines, such as intelligent ovens that can be controlled through phones or the Internet, online refrigerators and networked blinds.

Fig 1: Smart home for smart people

The Internet of Things will draw on the functionality offered by these technologies to realize the vision of a fully interactive and responsive network environment. The IoT is different from sensor networks, which monitor things but do not control them; however, both connect everyday objects, and both build on a common set of technological advances toward miniature, power-efficient sensing, processing, and wireless communication. There are two different modes of communication in the Internet of Things: thing-to-person and thing-to-thing communication [3]. Thing-to-person communication covers a number of technologies and applications in which people interact with things and vice versa, including remote access to objects by humans, and objects (sometimes called blogjects) that continuously report their status, whereabouts, and sensor data [3]. Thing-to-thing communication covers technologies and applications in which everyday objects and infrastructure interact with no human originator, recipient, or intermediary. An object monitors other objects, takes remedial actions, and informs or prompts


humans as required. Machine-to-machine communication is a subset of thing-to-thing communication, but machine-to-machine communication often exists within large-scale IT systems and so includes things that may not meet the criteria of everyday objects [3]. The vision of the Future Internet based on standard communication protocols reflects the merging of computer networks, the Internet of Things (IoT), Internet of People (IoP), Internet of Energy (IoE), Internet of Media (IoM), and Internet of Services (IoS) into a common worldwide IT platform of seamless networks and networked smart things/objects [4].

The Future Internet will move towards advances in a range of vital technologies, uniting and developing in parallel, and towards market, business and societal evolutions. The research in the Future Internet focuses on overcoming the long-standing limitations of current Internet facilities, architecture and protocols, examining how various classes of new requirements limit the predictable evolution of the Internet, and identifying corresponding long-standing solutions [5]. The objective of IoT is the integration and unification of all communication systems that surround us. IoT is complemented by the application of artificial intelligence to learn user behaviour patterns, gain knowledge of the context, define action rules for each scenario in relation to the user's behaviour, and so on. One aim of the Internet of Things is to define services that assist people through ambient intelligence; when dealing with the healthcare of elderly and disabled people in particular, this is known as Ambient Assisted Living (AAL).

Fig 2: Internet of Things: 6A Connectivity

The Internet of Things allows people and things to be connected Anytime, Anywhere, with Anything and Anyone, ideally using Any path/network and Any service. This is stated in the ITU vision of the IoT, according to which: "From anytime, anyplace connectivity for anyone, we will now have connectivity for anything." [2]

Fig 3: The relationship between IoT and other existing networks

IoT evolves from the Internet and short-range communication networks; Fig 3 describes


the relationship between IoT and other existing networks [6]. The Internet of Things has three main characteristics:

Comprehensive sensing: RFID, sensors and two-dimensional codes are used to collect information about objects anytime, anywhere.
Reliable transmission: Accurate, real-time delivery of information about objects is achieved by interconnecting a variety of telecommunication networks and the Internet.
Intelligent processing: Intelligent computing such as cloud computing and fuzzy identification is used to analyze and process vast amounts of data and information through search keywords, for the purpose of information retrieval and of implementing intelligent control of objects.
The Internet of Things is a vision that encompasses several technologies coming together from various fields such as Nanotechnology, Biotechnology, Information Technology and Cognitive Sciences. Over the next 10 to 15 years, the Internet of Things is likely to develop fast and shape a newer "information society" and "knowledge economy", but the direction and pace with which developments will occur are difficult to predict [7].
II. IOT AS A NETWORK OF NETWORKS

At present, IoT is built up of a loose set of dissimilar, purpose-built networks. For example, today's cars have several networks to control engine function, safety features, communications systems, and so on [8]. Residential buildings also have several control systems for heating, ventilation and air-conditioning (HVAC); telephone services; security; and lighting. As IoT develops, these networks, and many others, will be connected with additional security, analytics, and management capabilities. In this way IoT will become more powerful in what it can help people to attain. Through IoT, history is repeating itself, although on a much grander scale.

Fig 4: IoT Can Be Viewed as a Network of Networks

III. WHY IS IOT IMPORTANT?

Prior to seeing the importance of IoT, it is first essential to understand the differences between the Internet and the World Wide Web. The Internet is the physical layer or network made up of switches, routers, and other equipment [8]. Its main function is to transport information from one point to another quickly, reliably, and securely. The web, on the other hand, is an application layer that operates on top of the Internet. Its main role is to provide an interface that makes the information flowing across the Internet usable.

IV. EVOLUTION OF THE WEB VS. THE INTERNET

The web has gone through quite a few distinct evolutionary stages: [8]
Stage 1: First was the research phase, when the web was called the Advanced Research Projects Agency Network (ARPANET).


During this time, the academic world used the web for research purposes.
Stage 2: The second phase of the web can be described as brochureware. Characterized by the domain-name gold rush, this stage focused on the need for almost every company to share information on the Internet so that people could learn about its products and services.
Stage 3: The third evolution moved the web from static data to transactional information, where products and services could be bought and sold, and services could be delivered. During this phase, companies like eBay and Amazon.com burst onto the scene. This phase is also known, disreputably, as the dot-com boom and bust.
Stage 4: The fourth stage, where we are now, is the social or experience web, where companies like Facebook, Twitter, and Groupon have become massively popular and valuable by allowing people to communicate, connect, and share information (text, photos, and video) about themselves with friends, family, and colleagues [3].

Fig 5: The Evolution of IoT


V. CHARACTERISTICS OF IOT

Communication: The communication protocols will be designed for Web-oriented architectures of the Internet of Things platform, where all objects, wireless devices, cameras, PCs, etc. are united to examine location, intent and even emotions over a network [1][7]. New methods of successfully managing power utilization at different levels of the network design are needed, from network routing down to the architecture of individual devices.
Integration: Integration of smart devices into packaging will allow considerable cost savings and boost the eco-friendliness of products [1][7]. System-in-Package (SiP) technology permits flexible and 3D integration of different elements such as antennas, sensors, and active and passive components into packaging, improving performance and reducing the tag cost.
Interoperability: It is a known truth that two different devices might not be interoperable even if they follow the same standard. This is the main showstopper for broad acceptance of IoT technologies [1][7]. Unless global, well-defined standards emerge, future tags must integrate different communication standards and protocols that operate at different frequencies and allow different architectures, centralized or distributed, and must be able to communicate with other networks.
Standards: Open standards are the key to the success of the Internet of Things, as they are for


any kind of machine-to-machine communication. Without clear and recognized standards, such as TCP/IP in the Internet world, the growth of the Internet of Things beyond RFID solutions cannot reach a worldwide scale [1]. The unique addresses follow two standards today, Ubiquitous ID and EPCglobal, and there is a considerable conflict in the frequencies used according to the country and the manufacturer. Globally accepted, privacy-centred and secure communication standards, with identical protocols for the different frequencies, are needed.
Manufacturing: Cost must be reduced to one cent per tag, production must reach tremendously high volumes, and the whole production process must have a very limited impact on the environment [1].
VI. THE BUILDING BLOCKS OF IOT

Advancement in the following technologies will add to the growth of the IoT: [3]
Machine-to-machine interfaces and protocols of electronic communication set the rules of engagement for two or more nodes on a network.
Microcontrollers are computer chips designed to be embedded into objects other than computers.
Wireless communication is well known to most people in the developed world. Several different wireless technologies can play significant roles in the IoT, with short-range and long-range channels, as well as bidirectional and unidirectional channels.
RFID technology is similar to an electronic barcode that a reader device can detect even without line of sight. Some RFID readers can identify multiple objects concurrently [9]. In addition, some RFID tag-reader architectures support security features such as requiring a human operator to input a challenge code before decoding an ID.

VII. THE TECHNOLOGIES OF IOT

RFID Technology: RFID uses electromagnetic induction or electromagnetic propagation for the purpose of non-contact automatic identification of objects or humans [6][9]. In wireless conditions, the associated EEPROM in the electronic label can be written and read by dedicated read-write equipment. RFID resists reproduction and combines encryption and more secure data on tickets, documents, certificates, and items, for use in anti-counterfeiting.
Sensor Network: Wireless sensor networks afford a new platform to sense the world and to process the data. They are widely useful in many fields, such as military defence, industrial control, agricultural control, city management, biomedicine, emergency response, rescue, remote control of danger zones, and manufacturing [6]. Wireless sensor networks consist of a large number of nodes, which are capable of communicating, computing and cooperating in an ad hoc manner.
Smart Technology: Smart technologies are the techniques employed to attain a certain purpose by using a priori knowledge. Objects, which become intelligent after the implantation of smart technologies, can interact with users actively or passively [6]. The content and direction of major research includes artificial intelligence theory, advanced human-to-machine interaction technologies and systems, intelligent control


technology and systems, and intelligent signal processing.
Nanotechnology: Nanotechnology - the study of tiny particles - is being used to develop products across a number of industries, including medicine, energy and transportation. The use of nanotechnology means that the objects that interact and connect with each other can be ever smaller [6].
VIII. USE OF CLOUD-BASED INTERNET OF THINGS

Cloud computing makes it possible for devices, even those with limited computational capabilities, to perform the complex computations required for effective performance of an assigned task [10]. The things need to have only the sensors, the actuators, and the decision-making capability that can be facilitated through the almost infinite computational capabilities of the cloud. However, in their current form, IoT clouds are lacking in managing the service invocation that a sensor triggers when an event happens [13]. Communication between events and services is absent in current IoT clouds. Moreover, authorized access to IoT cloud sensor data and services without compromising user privacy is also a demanding task.

IX. SERVICES IN CLOUD FOR IOT

Substantial numbers of applications are available which can be included as Internet of Things services, and these are classified according to different criteria. Based on their technical features, Internet of Things services are divided into four types [10]: identity-related services, information aggregation services, collaborative-aware services, and ubiquitous services [11]. The Internet of Things will progressively develop from the function of information aggregation to collaborative awareness and ubiquitous convergence. Not all services of the IoT will be able to build up to the stage of ubiquitous convergence. Many applications and services need only information aggregation, and are not intended for ubiquitous convergence, as the information is closed, confidential, and valid only to a small group.
Identity-Related Services: Identity-related services rely on identity technologies such as RFID, two-dimensional codes, and barcodes. Based on the identification mode of the terminal, identity-related services are sub-divided into two categories: active and passive. Served objects (enterprise or individual) can also classify them as personal applications or enterprise services [11].

Fig 6: Applications of identity-related services

Information Aggregation Services: At present, there is no communication between events and services in current IoT networks. Moreover, the information sensed by a certain sensor may not be useful to others unless processed to a widely accepted standard. With information aggregation services, a terminal collects and processes data, and reports it via the communication network to the platform. The platform further processes


the data and implements unified management of the terminals, data, applications, and services as well as third parties. Specific service applications include repeated meter reading, elevator management, monitoring of vital health parameters, logistics and traffic management, hotel management, energy management, etc. Fig 7 illustrates the framework of information aggregation services [12]. In it, the entire system is built of Machine-to-Machine (M2M) terminals, a communication network, platforms, applications, and operation systems. The mobile communication network operates as an information transmission carrier, transmitting data in several ways. In some cases, a fixed network is used as the data transmission channel [10].
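As a rough illustration of this aggregation flow (not an API of any specific platform), the sketch below shows a terminal pre-processing a reading and reporting it to a platform that keeps a unified store per terminal; all class and method names are hypothetical.

class Platform:
    def __init__(self):
        self.readings = {}                                   # unified store: terminal -> readings

    def ingest(self, terminal_id, reading):
        self.readings.setdefault(terminal_id, []).append(reading)

class Terminal:
    def __init__(self, terminal_id, platform):
        self.terminal_id = terminal_id
        self.platform = platform

    def report(self, raw_value, unit):
        reading = {"value": round(raw_value, 2), "unit": unit}   # local pre-processing
        self.platform.ingest(self.terminal_id, reading)          # report via the network

platform = Platform()
meter = Terminal("meter-42", platform)
meter.report(3.14159, "kWh")
print(platform.readings)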

Fig 7: Information Aggregation Services

Collaborative-Aware Services: The growth of the Internet of Things should lead to the delivery of more important and complex services. Such services require thing-to-thing and thing-to-person communication [10]. Moreover, these communication capabilities impose higher requirements on reliability and delay, and require things to be smarter.

Ubiquitous Services: Characterized by omnipresence, all-inclusiveness, and omnipotence, ubiquitous services intend to deliver seamless communication anytime, anywhere, for anybody, and for everything. They are the culmination of the communication services pursued by human society [10]. An obvious way ahead for future ubiquitous services is to use the Internet cloud, which is emerging as a universal bus, as their carrier; this would represent a significant leap in the development of the Internet of Things and an important stage in the development of the Internet. Adding the information of real-world things into the cloud would facilitate the sharing of things, and of the processing capacity of the cloud, by more users through the Internet.

X. CONCLUSION

An Internet application is a form of sharing information based on web resources and services, and the Internet of Things is a special application of the semantic web. Web services and their technologies are provided as a service to the consumer in various forms through the use of the Web Service Description Language. The interaction is mainly between the provider and the consumer, whether human-to-human, human-to-thing, or thing-to-thing. Subsequently, the Internet of Things will be a major traffic generator, and people's lives will increasingly be shaped by it. Therefore, this paper has focused on the emerging Internet of Things phenomenon.

REFERENCES
[1] "Internet of Things in 2020: Road Map for the Future", INFSO D.4 Networked Enterprise & RFID, INFSO G.2 Micro & Nanosystems, in co-operation with the Working Group RFID of the ETP EPoSS, Version 1.1, 27 May 2008.
[2] "The Internet of Things - Executive Summary", ITU Internet Reports 2005, November 2005.
[3] "The Internet of Things: Background", Disruptive Technologies Global Trends 2025, SRI Consulting Business Intelligence.
[4] "Internet of Things Strategic Research Roadmap", Dr. Ovidiu Vermesan, Dr. Peter Friess, Patrick Guillemin, Sergio Gusmeroli, Harald Sundmaeker, Dr. Alessandro Bassi, Ignacio Soler Jubert, Dr. Margaretha Mazura, Dr. Mark Harrison, Dr. Markus Eisenhauer, Dr. Pat Doody.
[5] Keynote speech, Paulo T. de Sousa, Head of Sector, Internet of the Future, Directorate-General Information Society, European Commission.
[6] "Research on the Architecture and Key Technology of Internet of Things (IoT) Applied on Smart Grid", Miao Yun, Bu Yuxin, Wuhan University, Wuhan, Hubei, China, ICAEE 2010.

[7] "Research on the Key Technologies of IoT Applied on Smart Grid", Wu Shu-wen, School of Electrical Engineering, Wuhan University, Wuhan, China, IEEE 2011.
[8] "The Internet of Things: How the Next Evolution of the Internet Is Changing Everything", Dave Evans, Cisco White Paper, April 2011.
[9] "Privacy Implications of RFID: An Assessment of Threats and Opportunities", Marc van Lieshout and Linda Kool, IFIP International Federation for Information Processing, 2008, Volume 262.
[10] "From Internet of Things towards Cloud of Things", Mrs. Pritee Parwekar, International Conference on Computer & Communication Technology (ICCCT), 2011.
[11] China Mobile, "IoT Evolution Blueprint" [R], 2009.
[12] "Platform Technology for the Internet of Things", ZTE Corporation White Paper, 2009.
[13] "Event-driven Sensor Virtualization Approach for Internet of Things Cloud", Sarfraz Alam, Mohammad M. R. Chowdhury, Josef Noll, SenaaS.


DISTRIBUTION BASED GEOGRAPHIC MULTICASTING OVER MANET


Dr.S.P.Balakannan 1, R.Aishwarya 2, Induja.N 3
1 Asst. Professor, Department of IT, Anand Institute of Higher Technology, Chennai
2,3 UG Students, Department of IT, Anand Institute of Higher Technology, Chennai

ABSTRACT
Multicast is an efficient method for implementing group communications. Due to the dynamic topology of a MANET, it is difficult to manage group members, and multicasting is less efficient and scalable. We propose a novel Efficient Group Distribution Multicast Protocol (EGDMP). EGDMP uses a virtual-zone-based structure to implement scalable and efficient group membership management. A network-wide zone-based bidirectional tree is constructed to achieve more efficient membership management and multicast delivery. The position information is used to guide the zone structure building, multicast tree construction, and multicast packet forwarding, which efficiently reduces the overhead for route searching and tree structure maintenance. Several strategies have been proposed to further improve the efficiency of the protocol, for example, introducing the concept of zone depth for building an optimal tree structure and integrating the location search of group members with the hierarchical group membership management. Finally, we design a scheme to handle the empty zone problem faced by most routing protocols that use a zone structure. Compared to Scalable Position-Based Multicast (SPBM), EGDMP has significantly lower control overhead, data transmission overhead, and multicast group joining delay. The performance level never decreases even as the group size and the network size increase.

Keywords: MANET, ODMRP, SPBM, EGDMP

I. INTRODUCTION


Multicast is an efficient method to realize group communications. However, there is a big challenge in enabling efficient multicasting over a MANET whose topology may change constantly. The existing geographic routing protocols generally assume that mobile nodes are aware of their own positions through a certain positioning system (e.g., GPS), and that a source can obtain the destination position through some type of location service. An intermediate node makes its forwarding decisions based on the destination position inserted in the packet header by the source and the positions of its one-hop neighbors learned from the periodic beaconing of the neighbors. By default, the

packets are greedily forwarded to the neighbor that allows the greatest geographic progress towards the destination. To reduce the topology maintenance overhead and support more reliable multicasting, an option is to make use of the position information to guide multicast routing. For example, in unicast geographic routing, the destination position is carried in the packet header to guide the packet forwarding, while in multicast routing, the destination is a group of members. A straightforward way to extend geography-based transmission from unicast to multicast is to put the addresses and positions of all the members into the packet header; however, the header overhead will increase significantly as the group size increases, which constrains the application of geographic multicasting only to


a small group. The existing small-group-based geographic multicast protocols normally address only part of these problems. In this work, we propose an efficient group distribution multicast protocol, EGDMP, which can scale to a large group size and a large network size. The protocol is designed to be comprehensive and self-contained, yet simple and efficient for more reliable operation. The zone structure is formed virtually, and the zone where a node is located can be calculated based on the position of the node and a reference origin.
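To make the virtual-zone idea concrete, a minimal sketch of how a node could map its position to a zone ID is given below; the square zone size and the (column, row) encoding are assumptions of this sketch, not details specified by the protocol description above.

def zone_id(x, y, origin_x, origin_y, zone_size):
    """Return the (column, row) zone ID of a node at position (x, y),
    computed purely from the node position and the reference origin."""
    col = int((x - origin_x) // zone_size)
    row = int((y - origin_y) // zone_size)
    return (col, row)

# Example: with a 200 m zone and origin (0, 0), a node at (350, 120) lies in zone (1, 0).
print(zone_id(350, 120, 0, 0, 200))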

II. PROBLEM STATEMENT

A. System Model
A representative network architecture for a MANET is shown below. Three different network entities can be identified as follows:
SourceHome: SourceHome provides the table containing the leader information of each zone along with its group identification. It keeps track of the count of nodes available under each group.
Leader: The leader keeps track of its own information along with its subnodes' details.
Subnode: Each subnode maintains the details regarding its leader node. Message communication can be done among the nodes.

FIG 1. System Architecture

III. EFFICIENT GROUP DISTRIBUTION MULTICAST PROTOCOL

A. Zone Leader Election and Leader Table Generation
To elect a leader with minimum overhead, each node broadcasts a BEACON signal periodically to distribute its position. EGDMP inserts a flag in the BEACON message, indicating whether the sender is a zone leader or not. To reduce beaconing overhead, adaptive beaconing is used instead of fixed-interval beaconing. A non-leader node sends a beacon every period Intvalmax. The zone leader has to send a beacon every period Intvalmin to announce its leadership role. When receiving a beacon from a neighbor, a node records the node ID, position, and flags. An entry will be removed if it is not refreshed within a period Timeout or if the corresponding neighbor is detected as unreachable by the MAC-layer protocol.

B. Zone-Supported Geographic Forwarding
With a zone structure, the communication process includes an intrazone transmission and an interzone transmission. In our zone structure, as nodes from the same zone are within each other's transmission range and are aware of each other's location, only one transmission is required for intrazone communications. As the source and the destination may be multiple hops away, to ensure reliable transmissions, geographic unicasting is used with the packet forwarding guided by the destination position. In EGDMP, to avoid the overhead of tracking the exact locations of a potentially large number of group members, the location service is integrated with zone-based membership management without the need of an external location server. At the network tier, only the ID of the destination zone is needed. A packet is forwarded toward the center of the destination zone first. After arriving at the destination zone, the packet will be forwarded to a specific receiving node or broadcast depending on the


message type. Generally, the messages related to multicast group membership management and multicast data will be forwarded to the zone leader to process.

C. Multicast Tree Construction
In EGDMP, instead of connecting each group member directly to the tree, the tree is formed in the granularity of zones with the guidance of location information, which significantly reduces the tree management overhead. When a multicast session G is initiated, the first source node S announces the existence of G by flooding a message NEW_SESSION into the whole network. When a node M receives this message and is interested in G, it will join G. A multicast group member keeps a membership table (G, root_zID, isAcked), where G is a group of which the node is a member, root_zID is the root-zone ID, and isAcked is a flag indicating whether the node is on the corresponding multicast tree. A zone leader (zLdr) maintains a multicast table. When a zLdr receives the NEW_SESSION message, it will record the group ID and the root-zone ID in its multicast table. The table contains the group ID, root-zone ID, upstream zone ID, downstream zone list, and downstream node list. To end a session G, S floods a message END_SESSION(G). When receiving this message, the nodes will remove all the information about G from their membership tables and multicast tables. When a node receives a NEW_SESSION message and wants to join the group, it sends a JOIN_REQ(M, Pos, G) message to its zLdr. If the zLdr receives a JOIN_REQ message from a member of the same zone, it adds the member to the downstream node list of its multicast table. If the zLdr receives a JOIN_REQ from a member of another zone, it will compare the depth of the requesting zone and its own zone. The leader sends back a JOIN_REPLY message to the requesting node. When a member wants to leave a group, it sends a LEAVE(M, G) message to its zLdr. On receiving a LEAVE message, the leader removes the node from its downstream node

list (intrazone node) or zone list (interzone node).

D. Multicast Packet Delivery
After the multicast tree is constructed, all the sources of the group can send packets to the tree and the packets will be forwarded along the tree. Sending packets to the root would introduce extra delay, especially when a source is far away from the root. To avoid this delay, EGDMP assumes a bidirectional-tree-based forwarding strategy, with which the multicast packets can flow not only from an upstream node/zone down to its downstream nodes/zones, but also from a downstream node/zone up to its upstream node/zone. When a source S has data to send and it is not a leader, it checks the isAcked flag in its membership table to find out if it is on the tree. If it is on the tree, it sends the multicast packets to its leader. When the leader of an on-tree zone receives multicast packets, it forwards the packets to its upstream zone and all its downstream nodes and zones except the incoming one. When a source node S is not on the multicast tree, for example when it moves to a new zone, the isAcked flag will remain unset until it finishes rejoining G through the leader of the new zone. When a node N has a multicast packet to forward to a list of destinations (D1, D2, ...), it decides the next-hop node towards each destination. An example list is (N1: D1, D3), (N2: D2, D4), where N1 is the next-hop node for destinations D1 and D3, and N2 is the next-hop node for destinations D2 and D4.
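As an illustration of this forwarding step, the short sketch below groups a destination list by next-hop node, as in the (N1: D1, D3), (N2: D2, D4) example; the next-hop selection function itself is assumed to come from the geographic (greedy) forwarding described earlier.

from collections import defaultdict

def group_by_next_hop(destinations, next_hop_for):
    """Return {next_hop: [destinations reached through it]} so that one copy
    of the packet is transmitted per next-hop neighbor."""
    groups = defaultdict(list)
    for d in destinations:
        groups[next_hop_for(d)].append(d)
    return dict(groups)

# Example matching the text: N1 forwards towards D1 and D3, N2 towards D2 and D4.
routing = {"D1": "N1", "D3": "N1", "D2": "N2", "D4": "N2"}
print(group_by_next_hop(["D1", "D2", "D3", "D4"], routing.get))
# {'N1': ['D1', 'D3'], 'N2': ['D2', 'D4']}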
Performance chart: packet delivery ratio of EGDMP versus SPBM


IV. CONCLUSION

There is an increasing demand and a big challenge to design a more scalable and reliable multicast protocol over a dynamic ad hoc network (MANET). In this paper, we propose an efficient and scalable geographic multicast protocol, EGDMP, for MANET. The scalability of EGDMP is achieved through a two-tier virtual-zone-based structure, which takes advantage of the geometric information to greatly simplify the zone management and packet forwarding. A zone-based bidirectional multicast tree is built at the upper tier for more efficient multicast membership management and data delivery, while the intrazone management is performed at the lower tier to realize the local membership management. The use of location information in EGDMP significantly reduces the tree construction and maintenance overhead, and enables quicker tree structure adaptation to network topology changes. We also develop a scheme to handle the empty zone problem, which is challenging for zone-based protocols. Additionally, EGDMP makes use of geographic forwarding for reliable packet transmissions, and efficiently tracks the positions of multicast group members without resorting to an external location server. Compared to the classical protocol ODMRP [3], both geometric multicast protocols SPBM and EGDMP could achieve a much higher delivery ratio in all circumstances, with respect to the variation of mobility, node density, group size, and network range. However, compared to EGDMP, SPBM incurs several times the control overhead, redundant packet transmissions, and multicast group joining delay. Our results indicate that geometric information can be used to more efficiently construct and maintain the multicast structure, and to achieve more scalable and reliable multicast transmissions [6] in the presence of the constant topology change of MANET. Our simulation results demonstrate that EGDMP has a high packet delivery ratio, and low control overhead and multicast group joining delay in all

cases studied, and is scalable to both the group size and the network size.Compared to the geographic multicast protocol SPBM, it has significantly lower control overhead, data transmission overhead, and multicast group joining delay.

V. REFERENCES

[1] X. Xiang, X. Wang, and Y. Yang, "Supporting Efficient and Scalable Multicasting over Mobile Ad Hoc Networks", IEEE.
[2] H. Kopka and P.W. Daly, A Guide to LaTeX, third ed., Addison-Wesley, 1999.
[3] L. Ji and M.S. Corson, "Differential Destination Multicast: A MANET Multicast Routing Protocol for Small Groups", Proc. IEEE INFOCOM, Apr. 2001.


EFFICIENT KEYWORD BASED SEARCH IN RELATIONAL DATABASE


R. Suresh1, K. Saranya2, S. Dhivya3, K. Thilagapriya4
1 Assistant Professor, Department of IT, Sri Manakula Vinayagar Engineering College, Puducherry
2,3,4 Students, Department of IT, Sri Manakula Vinayagar Engineering College, Puducherry

Abstract
Information retrieval (IR) is the process of finding documents relevant to an information need from a large document set. The integration of DB and IR provides flexible ways for users to query information on the same platform. Keyword search is the most popular information retrieval method, as users need to know neither a query language nor the underlying structure of the data. In this paper we focus on keyword query ranking; based on the user's preferences, the top-k results are retrieved from the relational database and returned to the user. This system provides efficient query processing and ranking functions based on scores calculated by different methods.

I. INTRODUCTION
The integration of DB and IR provides flexible ways for users to query information in the same platform. On one hand, the sophisticated DB facilities provided by RDBMSs assist users to query well-structured information using SQL. On the other hand, IR techniques allow users to search unstructured information using keywords based on scoring and ranking, and do not need users to understand any database schemas. Information retrieval (IR) is the process of finding documents relevant to an information need from a large document set. IR typically deals with crawling, parsing and indexing documents, and retrieving documents. Many of the techniques used in data mining come from information retrieval, but data mining goes beyond information retrieval. Metrics often used for the evaluation of IR techniques are precision and recall. Recall measures the quantity of results returned by a search, and precision is a measure of the quality of the results returned. Recall is the ratio of the relevant results returned to all relevant results. Precision is the number of relevant results returned divided by the total number of results returned. An increase in precision can lower the overall recall value, while an increase in the recall value lowers the precision value. The searching can be done as phrase based or keyword based. The searching is done for the data based on the keywords that are given by the user in the query. Then, based on the score calculated for the relevancy, the ranking of the data will be done. Keyword searching is an important part of information retrieval. As the amount of available text data in relational databases is growing rapidly, the need for ordinary users to be able to search such information effectively is increasing dramatically. Keyword search is the most popular information retrieval method as users need to know neither a query language nor the underlying structure of the data. Keyword search in relational databases has recently emerged as an active research topic. We use the running example from [1]; in a traditional system we need to know the schema structure and pose the query in SQL, for example:

SELECT * FROM COMPLAINT C WHERE CONTAINS(C.comments, 'maxtor', 1) > 0 ORDER BY score(1) DESC

This prevented many users from using the database and also from retrieving information from it. Also, the schema of the database must be known in advance in order to retrieve the data from the database. It became a tedious process for novice users to derive the data present in the


database, and searching the database became a time-consuming process.

Fig. 1: The matches to the query "Maxtor Netvista" are underlined.

But by the integration of the database with information retrieval we do not require the SQL language to retrieve the information. We can obtain the results by searching over the various relations of a database, that is, by joining the relations. The relations will be joined by the foreign key. A feature of keyword search over RDBMSs is that search results are obtained by combining relevant tuples from several relations so that they will collectively be relevant to the query. This provides several advantages: (1) Due to normalization, the data might get split and stored in different relations; those data can be easily retrieved, as searching is not limited to a single relation alone. (2) It can help in finding out new or unexpected relationships among the attributes. (3) The user need not have any knowledge about the schema structure or the SQL query language. The rest of the paper is organized as follows. In Section 2, we describe the techniques that were used for searching and ranking in the existing systems. In Section 3, we describe our proposed system. We discuss the various scoring functions we use to rank in Section 4. The ranking function and the architecture of our system are discussed in Section 5, followed by performance evaluation factors discussed in Section 6. Section 7 describes the related works and Section 8 concludes our paper.
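To make this idea concrete, the following is a minimal sketch using Python and sqlite3 with a hypothetical two-relation schema (Products and Complaints joined on a foreign key); the table and column names are illustrative assumptions, not taken from the paper.

import sqlite3

# Hypothetical schema: an answer to a keyword query is a joined tuple combining
# a Complaints row with its referenced Products row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products(prodId INTEGER PRIMARY KEY, manufacturer TEXT, model TEXT);
CREATE TABLE Complaints(tupleId INTEGER PRIMARY KEY,
                        prodId INTEGER REFERENCES Products(prodId), comments TEXT);
INSERT INTO Products VALUES (1, 'Maxtor', 'Netvista');
INSERT INTO Complaints VALUES (10, 1, 'Netvista drive failed after a week');
""")

def keyword_search(keywords):
    # A joined tuple matches when every keyword appears somewhere in it (AND semantics).
    where = " AND ".join(
        "(Products.manufacturer LIKE ? OR Products.model LIKE ? OR Complaints.comments LIKE ?)"
        for _ in keywords)
    params = [f"%{k}%" for k in keywords for _ in range(3)]
    sql = ("SELECT Complaints.tupleId, Products.manufacturer, Products.model, Complaints.comments "
           "FROM Complaints JOIN Products ON Complaints.prodId = Products.prodId "
           "WHERE " + where)
    return conn.execute(sql, params).fetchall()

print(keyword_search(["Maxtor", "Netvista"]))   # the joined tuple is collectively relevant

Note that no single relation needs to contain every keyword; the join assembles the answer, which is exactly the advantage described above.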

II. EXISTING SYSTEM


As the information retrieval (IR) style keyword search on the web has been a success, keyword search in relational databases has recently emerged. Existing IR strategies are inadequate for ranking relational outputs, so a novel IR ranking strategy for effective keyword search was evaluated using a real-world database and a set of keyword queries collected by a major search company, and it performs better than the existing strategies. The various score calculation techniques mainly used are ranked retrieval, scoring documents, term frequency, collection statistics, weighting schemes and vector space scoring. The processing of a keyword query is done by generating all candidate answers, each of which is a tuple tree obtained by joining tuples from multiple tables; a single score is then computed for each answer, and finally the answers are returned. In [4] a new concept called the Compact Steiner Tree (CSTree) is introduced, which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently; a structure-aware index is also proposed, together with an effective ranking mechanism for fast, progressive and accurate retrieval of the top-k highest ranked CSTrees. BANKS models tuples as nodes in a graph, connected by links induced by foreign key and other relationships. Answers to a query are modeled as rooted trees connecting tuples that match


individual keywords in the query. Answers are ranked using a notion of proximity coupled with a notion of prestige of nodes based on in-links, similar to techniques developed for Web search. A survey of the developments on finding structural information among tuples in an RDB for an l-keyword query, by representing the RDB as a data graph GD(V, E), is given in [4]; it is called the keyword-based approach. The existing algorithms use a parameter to control the maximum size of a structure allowed. The problem of multiple-query optimization has also been discussed: a systematic look at the problem is provided, along with the presentation and analysis of algorithms that can be used for multiple-query optimization and a presentation of experimental results. These existing approaches are discussed in [1], [2], [3], [4]. Candidate networks (CNs) are used in many of the keyword-based search systems, and they use breadth-first search to search the candidate network. Three rules are followed in candidate network generation:

Rule 1: Prune duplicate CNs.
Rule 2: Prune non-minimal CNs, i.e., CNs containing at least one leaf node which does not contain a query keyword.
Rule 3: Prune CNs in which one tuple set points through the same foreign key to two occurrences of a relation; the rationale is that any tuple s (which may come from a free or non-free tuple set) which has a foreign key pointing to a tuple in one occurrence must point to the same tuple in the other occurrence.

The query processing in this paper is done by using the tree pipeline algorithm, which is an enhanced version of the block pipeline algorithm and is better at pruning empty results. The candidate network is generated for producing the result based on the keywords given by the user. We can specify whether a keyword must be present in the search query or whether it is optional. If the keyword must be present, then AND semantics is used; otherwise OR semantics is used. This provides flexibility to the user. The parameter p, which ranges from 0 to 1, can be used to denote this: AND semantics has a value of 1 and OR semantics takes a value of 0. A keyword can also use any value between 0 and 1.
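As a rough illustration of this per-keyword weighting (the blending rule below is an assumption made for the example, not the paper's exact scoring):

def keyword_score(result_text, keyword_weights):
    # keyword_weights maps each keyword to p in [0, 1]:
    # p = 1 means the keyword is mandatory (AND semantics),
    # p = 0 means it is purely optional (OR semantics).
    text = result_text.lower()
    score = 0.0
    for kw, p in keyword_weights.items():
        present = kw.lower() in text
        if p == 1.0 and not present:
            return None               # a mandatory keyword is missing: discard the result
        if present:
            score += max(p, 0.1)      # optional matches still contribute a little
    return score

print(keyword_score("maxtor netvista drive", {"maxtor": 1.0, "netvista": 0.5}))  # 1.5
print(keyword_score("netvista drive", {"maxtor": 1.0, "netvista": 0.5}))         # None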

IV. SCORING FUNCTION


The score is calculated for the purpose of ranking, and there are various techniques that can be used for calculating the scores. Usually the score is calculated based on three factors, namely (i) query term weight, (ii) term frequency and (iii) inverse document frequency. The scoring functions used are tf-idf, cosine similarity, the vector space model and Okapi BM25. The user will prefer a document which matches all the keywords in the query completely over documents which match only partially. The IR style of ranking uses the IR ranking score.

III. PROPOSED SYSTEM


In this paper we focus on keyword query ranking based on the relevancy score; based on the user preference, the top-k results will be retrieved from the relational database and returned to the user. This system provides efficient query processing and ranking functions based on the scores calculated by different methods. The score calculation is dealt with in Section 4. The existing systems have used only one scoring function; when many scoring functions are used, the result of the query will be more accurate. In SPARK [2] the completeness factor is used for this purpose.

The size normalization factor is also an important factor. The final score is computed as the product of all three scores.
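The formulas referred to above are not legible in this copy. As an illustrative sketch only (a standard tf-idf style instantiation under our own assumptions, not necessarily the exact SPARK definitions), the combined score of a joined tuple tree T for a query Q can take the form

\mathrm{score}(T,Q)\;=\;\Big(\sum_{w\in Q}\mathrm{tf}(w,T)\,\ln\tfrac{N}{\mathrm{df}(w)}\Big)\cdot\mathrm{score}_{\mathrm{comp}}(T,Q)\cdot\mathrm{score}_{\mathrm{size}}(T)

where tf(w, T) is the frequency of keyword w in the text of the tuple tree T, df(w) is the number of tuples containing w, N is the total number of tuples, the completeness factor rewards trees that cover all query keywords, and the size factor penalizes trees built from many joined tuples.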


The ranking is done based on this final score alone, in order to obtain the top-k relevant results that match the requirements of the user.

V. RANKING FUNCTION
Ranking is a major problem in many applications where information retrieval is performed. When documents are retrieved from an RDBMS based on a user query, ranking plays an important part in delivering the documents in a sorted order based on different parameters. Initially the ranking was based on the size of the CN [5]. Later DISCOVER2 [6] was developed, in which the ranking was based on the scores generated by a state-of-the-art IR scoring function; the documents that are retrieved are ranked purely on the relevance of each document to the given query. In BANKS [7], the ranking is based on the hits generated by spanning multiple databases, where a graph system is used. The relevance of the documents is obtained from a score that is calculated from the relevance of one document to the query and also from pairwise preferences, in which two documents are compared and the more relevant of the two is placed on top of the returned result. The selection of a suitable scoring function is a very important process, as the ranking result varies with the scoring function used. In our paper the scores obtained from different scoring functions are aggregated together to obtain a single value that is used for ranking the final results. Fig. 2 shows the proposed architecture of our system. The user gives the keyword search query in the user interface. The query is sent to the query evaluation engine, where it is processed by generating the candidate network, and then it is sent to the searcher, which searches for the keywords by performing the match. The searched results are sent to the ranker, which ranks the results based on the calculated score and the weights of the terms given by the user. Finally, the ranked result is sent to the user interface for the user to view, and the results that are obtained are stored in a summary database. When we query the database with a search query that was already used, the system initially checks whether similar searches have been performed; if so, the summary database is checked to retrieve the result and integrate it with the updated values in the database. The use of the summary database reduces the number of database probes. Computational sharing is achieved by caching the intermediate results, thereby preventing the search from starting from scratch. Many different ranking schemes that use characteristics of data nodes, such as node weight or term frequency, or inter-tuple characteristics as weights on edges, have been used in existing systems. Several advanced features such as phrase-based ranking have also been proposed [8], [9], [10], [11], [12], [13], [14], [15], [16], [17].
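A minimal sketch of this aggregation step (the min-max normalization and weighted sum used here are assumptions for illustration, not the paper's exact rule):

import heapq

def aggregate_scores(candidates, scorers, weights=None, k=10):
    # candidates: list of (result_id, text); scorers: functions mapping text -> raw score.
    # Each scorer's output is min-max normalized so no single function dominates,
    # then combined as a weighted sum, and the top-k results are returned.
    weights = weights or [1.0] * len(scorers)
    raw = [[score(text) for _, text in candidates] for score in scorers]
    norm = []
    for column in raw:
        lo, hi = min(column), max(column)
        norm.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column])
    combined = {rid: sum(w * norm[j][i] for j, w in enumerate(weights))
                for i, (rid, _) in enumerate(candidates)}
    return heapq.nlargest(k, combined.items(), key=lambda kv: kv[1])

scorers = [lambda text: text.count("maxtor"), len]   # e.g. a crude term-frequency scorer and a length scorer
print(aggregate_scores([(1, "maxtor netvista"), (2, "maxtor maxtor drive")], scorers, k=2))

The cached entries in the summary database could store exactly these aggregated scores, so that a repeated query only has to merge them with newly updated tuples.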

Fig. 2: Proposed architecture (user, user interface, query processor, searcher, ranker, relational database and summary database).

VI. EVALUATING EFFICIENCY


The effectiveness of the returned results can be measured by various factors. The first is precision, which is the number of relevant results returned divided by the total number of results returned.

Precision measures among the retrieved documents, how many are relevant. It does not care


if we do not retrieve all the relevant documents, but penalizes us if we retrieve non-relevant documents. Recall is another factor used to measure effectiveness; it tells how good the system is at finding the relevant documents [3]. It is defined as the ratio of the relevant results returned to all relevant results.
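In standard IR notation (a sketch of the usual definitions; the paper's own formulas are not reproduced here):

\mathrm{Precision}=\frac{|\,\mathrm{relevant}\cap\mathrm{retrieved}\,|}{|\,\mathrm{retrieved}\,|},\qquad \mathrm{Recall}=\frac{|\,\mathrm{relevant}\cap\mathrm{retrieved}\,|}{|\,\mathrm{relevant}\,|}

The discrimination measure mentioned below can analogously be read as the fraction of irrelevant documents that the system correctly rejects.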

Recall measures how much of the relevant set of documents we can retrieve. It does not care if we also retrieve non-relevant documents in the process. Precision and recall are dependent on each other: we cannot increase both at the same time, because on trying to increase the precision, the recall will get reduced, and vice versa. Discrimination is another factor for calculating the performance of the system; it tells how good the system is at rejecting the documents that are irrelevant.

By these various metrics we can find the effectiveness of the system and how well it is able to retrieve the relevant results and eliminate the irrelevant results.

VII. RELATED WORKS

The main aim of keyword search is to find the top-k interconnected tuples that have high relevance to the query given by the user. There are various approaches: modelling the data set as a graph, with the results given in the form of subgraphs, is one of them. Most works use a heuristic approach to achieve the required efficiency, whereas the Steiner tree problem was solved by using an exhaustive search. Various other works include the Q-subtree [11], BANKS [7] and the indexing approaches [12], [13], which were developed for the purpose of efficiency. The other approach concerns relational databases in which the structured data are stored. An earlier developed approach is DBXplorer [18], where query processing is mainly focused. The various techniques of query processing are applied in [5], [6], [7]. Further to the common formulation of keyword search, studies have been conducted on identifying the entities which are relevant to the query in an implicit manner. The rank aggregation method of query processing has also been studied lately, which considers the challenge of retrieving the k results that acquire the highest score.

VIII. CONCLUSION AND FUTURE WORK

In this paper, we have performed keyword-based search and ranked the retrieved results based on the degree of relevance to the user. The ranking was done using the score. The performance of the system using different metrics was also evaluated. We have performed the searching and ranking of certain and structured data in relational databases. In future we would like to extend our searching and ranking to uncertain and probabilistic databases.

REFERENCES

[1] http://www.stanford.edu/~maureenh/quals/html/ml/node130.html
[2] Yi Luo, Xuemin Lin, Wei Wang and Xiaofang Zhou, "SPARK: Top-k Keyword Query in Relational Databases", SIGMOD 07.
[3] S. Agrawal, S. Chaudhuri, and G. Das, "DBXplorer: A System for Keyword-Based Search over Relational Databases", Proc. 18th Intl Conf. Data Eng. (ICDE 02), pp. 5-16, 2002.
[4] Jeffrey Xu Yu, Lu Qin and Lijun Chang, "Keyword Search in Relational Databases: A Survey", IEEE, 2010.


[5] V. Hristidis and Y. Papakonstantinou, "DISCOVER: Keyword Search in Relational Databases", Proc. Intl Conf. Very Large Data Bases (VLDB), pp. 670-681, 2002.
[6] V. Hristidis, L. Gravano, and Y. Papakonstantinou, "Efficient IR-Style Keyword Search over Relational Databases", Proc. 29th Intl Conf. Very Large Data Bases (VLDB), 2003.
[7] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan, "Keyword Searching and Browsing in Databases Using BANKS", Proc. 18th Intl Conf. Data Eng. (ICDE 02), pp. 431-440, 2002.
[8] F. Liu, C.T. Yu, W. Meng, and A. Chowdhury, "Effective Keyword Search in Relational Databases", Proc. ACM SIGMOD Intl Conf. Management of Data, pp. 563-574, 2006.
[9] S. Wang, Z. Peng, J. Zhang, L. Qin, S. Wang, J.X. Yu, and B. Ding, "Nuits: A Novel User Interface for Efficient Keyword Search over Databases", Proc. 32nd Intl Conf. Very Large Data Bases (VLDB), pp. 1143-1146, 2006.
[10] B. Ding, J.X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin, "Finding Top-k Min-Cost Connected Trees in Databases", Proc. IEEE 23rd Intl Conf. Data Eng. (ICDE), 2007.
[11] K. Golenberg, B. Kimelfeld, and Y. Sagiv, "Keyword Proximity Search in Complex Data Graphs", Proc. 28th ACM SIGMOD Intl Conf. Management of Data, 2008.
[12] B.B. Dalvi, M. Kshirsagar, and S. Sudarshan, "Keyword Search on External Memory Data Graphs", Proc. VLDB Endowment, vol. 1, no. 1, pp. 1189-1204, 2008.
[13] H. He, H. Wang, J. Yang, and P.S. Yu, "BLINKS: Ranked Keyword Searches on Graphs", Proc. ACM SIGMOD Intl Conf. Management of Data, pp. 305-316, 2007.
[14] M. Sayyadan, H. LeKhac, A. Doan, and L. Gravano, "Efficient Keyword Search Across Heterogeneous Relational Databases", Proc. 23rd IEEE Intl Conf. Data Eng. (ICDE), 2007.
[15] S. Tata and G.M. Lohman, "SQAK: Doing More with Keywords", Proc. ACM SIGMOD Intl Conf. Management of Data, pp. 889-902, 2008.
[16] Q.H. Vu, B.C. Ooi, D. Papadias, and A.K.H. Tung, "A Graph Method for Keyword-Based Selection of the Top-k Databases", Proc. ACM SIGMOD Intl Conf. Management of Data, 2008.
[17] B. Yu, G. Li, K.R. Sollins, and A.K.H. Tung, "Effective Keyword-Based Selection of Relational Databases", Proc. ACM SIGMOD Intl Conf. Management of Data, pp. 139-150, 2007.
[18] S. Agrawal, S. Chaudhuri, and G. Das, "DBXplorer: A System for Keyword-Based Search over Relational Databases", Proc. 18th Intl Conf. Data Eng. (ICDE 02), pp. 5-16, 2002.


THE USE OF RFID FOR HUMAN IDENTIFICATION (WIRELESS COMMUNICATION)


M. Hakkeem1, T. Ayesha Rumana2, R. Ganesh3
1 Assistant Professor, Department of CSE, United Institute of Technology, Coimbatore-20
2 Assistant Professor, Department of ECE, RVS College of Engg & Technology, Dindigul-5
3 PG Student, Department of CSE, United Institute of Technology, Coimbatore-20

ABSTRACT

Automatic identification technologies like RFID have valuable uses, especially in connection with tracking things for purposes such as inventory management. RFID is particularly useful where it can be embedded within an object, such as a shipping container. There appear to be specific, narrowly defined situations in which RFID is appropriate for human identification. Miners or firefighters might be appropriately identified using RFID because speed of identification is at a premium in dangerous situations and the need to verify the connection between a card and bearer is low. But for other applications related to human beings, RFID appears to offer little benefit when compared to the consequences it brings for privacy and data integrity. Instead, it increases risks to personal privacy and security, with no commensurate benefit for performance or national security. Most difficult and troubling is the situation in which RFID is ostensibly used for tracking objects (medicine containers, for example), but can be in fact used for monitoring human behavior. These types of uses are still being explored and remain difficult to predict. For these reasons, we recommend that RFID be disfavored for identifying and tracking human beings. When DHS does choose to use RFID to identify and track individuals, we recommend the implementation of the specific security and privacy safeguards described herein.

I. INTRODUCTION
The purposes of this paper are to: (1) address the use of Radio Frequency Identification technology (RFID) by the Department of Homeland Security (DHS) to identify and track individuals; (2) outline the potential data privacy and integrity issues implicated by this use of RFID technology; (3) offer guidance to the Secretary of DHS, program managers, and the DHS Privacy Office on deciding whether to deploy RFID technology to track individuals; and (4) offer steps to consider in order to mitigate privacy and data integrity risks when planning to use RFID to identify and track individuals.

II. BACKGROUND

RFID is a leading automatic identification technology. RFID tags communicate information by radio wave through antennae on small computer chips attached to objects so that such objects may be identified, located, and tracked. The fundamental architecture of RFID technology involves a tag, a reader (or scanning device), and a database. A reader scans the tag (or multiple tags simultaneously) and transmits the information on the tag(s) to a database, which stores the information. Transmitting identification data by radio rather than by manual transcription increases the quality, speed, and ease of that information transfer, which is the basis for the technology's appeal. RFID tags can be installed on objects such as products, cases, and pallets. They can also be


embedded in identification documents and even human tissue. Both the private and public sectors are increasingly using RFID to track materiel (such as for inventory management), but RFID is also being considered and adopted by DHS and other government agencies for use in tracking people. While RFID can demonstrably add value to manufacturing, shipping, and object-related tracking, there is an impulse at this time to deploy it for purposes to which it is not well suited. RFIDs comparative low cost, invisibility, and ease of deployment in automated tracking often make it appear more attractive than the alternatives. RFID may also address some logistical or efficiency problems in human identification and tracking, but some current and contemplated uses of RFID for tracking people may be misguided. Attempts to improve speed and efficiency through using RFID to track individuals raise important privacy and information security issues. This paper is not a tutorial on RFID technology itself. Nor does it address the problem of developing international standards to support widespread deployment of RFID technology efficiently. Rather, this paper addresses only the privacy and data integrity issues raised by the use of RFID when explicitly designed and used for tracking people. It does not discuss the use of RFID on general objects, such as clothing or food items purchased from a store that might used to track people without their knowledge or consent. This latter practice raises far greater privacy concerns than explicit tracking and it should be rejected in all cases except when the security mission calls for tracking individuals about whom suspicion has met an appropriate legal threshold.

III. THE LEGAL BASIS FOR RFID USE IN HUMAN IDENTIFICATION

We know of no statutory requirement that DHS use RFID technology, specifically, to track people. The major laws, executive orders, and programs under which RFID is being considered or used are either permissive as to technology or not legally binding on the U.S. government. In this analysis of RFID as a generic technology, we cannot address all the rights, statutes, and regulations that may limit the use of RFID for human tracking, limit the use of information collected via RFID, or grant individuals rights pertaining to data collected via RFID. When RFID is used for human tracking, the data collected will undoubtedly comprise a system of records under the Privacy Act of 1974. People should have at least the rights accorded them by that law when they are identified using RFID. Systems using RFID technology are, of course, also subject to the E-Government Act's Privacy Impact Assessment requirements.

IV. RFID FOR HUMAN IDENTIFICATION: CLARIFYING INCORRECT ASSUMPTIONS

A number of DHS programs are premised on the identification of human subjects. At the border in the US-VISIT program, at airports in the CAPPS I program, and at entrances to secure facilities of all kinds, checking identification cards is a routinely used security measure. Behind many of the current ideas for using RFID in human identification is a commonly held misperception that RFID improves the speed of identification. RFID is a rapid way to read data, but RFID does not identify individuals. If RFID is tied to a biometric authentication factor, it can reliably identify human beings; but tying RFID to a biometric authentication negates the speed benefit.

A. Controlling Access, Controlling Borders, and Interdicting Suspects

Checking identification is intended to achieve a number of different goals. Facilities managers use identification to control access to


sensitive infrastructures that may be damaged or used to harm Americans. They use it to control access to facilities where sensitive information about other infrastructure may be kept, or where security planning or operations are carried out. The government uses identification administratively to track the border crossings of international travelers. At borders and checkpoints, identification can help detect and interdict undesirable entrants to the country and known or suspected terrorists. These identification processes are intended to protect a wide variety of institutions, infrastructures, processes, and persons from a wide variety of threats, each having a different risk profile. At base, checking identification seeks to interdict potential attackers on our institutions, infrastructure, and people. We make no effort here to determine how well the practice of identifying people achieves this mission, how well identification systems are secured against corruption and fraud, or whether the protection provided by identification-based security outweighs its costs to privacy and other interests. We only address here the difference between those identification processes using RFID and those not using RFID. We are aware of two reasons to use RFID in identification processes: to increase the speed and efficiency of identification processes and to hinder forgery and tampering with identification documents. An RFID-chipped identification card can quickly communicate information from the card to a reader from a distance, without a line of sight or physical contact between a card and reader. With the proper use of encryption, information on an RFID chip can be rendered very difficult, if not impossible, to forge or alter.

B. RFID Can Reduce Delay at Entrances and Checkpoints

It takes some time to check a traditional identification document. The process typically includes handing the document to a verifier, who must review the information on the card and authorize the bearer to pass, record the bearer's passing, or, if appropriate, detain the bearer. The verifier must also compare the identifiers on the card with the bearer to ensure that the bearer is the person identified by the card. The use of RFID could dispense with one of these steps by eliminating the hand-over of the card. The other two steps are not affected by RFID. The verifier must still review authorizing information and compare the identifiers on the card with the bearer. These are distinct processes. The identification information communicated by an RFID-chipped identification card can be used to determine the bearer's authorization, but it is not authorization itself. (An RFID-chipped card, just like any card, could have a separate data element indicating authorization, of course, provided it was secure against forgery and tampering.) In order for any document or device to accurately identify someone, it must be linked to the person in some way. This is almost always through some form of biometric: a picture, description, fingerprints, or iris scan, for example. A document that is not linked to a person using a biometric is not a reliable identification document, just as someone holding a key to a house cannot be identified as the owner of the house based upon possession of that key alone. The RFID-chipped I-94 Form, for example, is not directly linked to individuals by a reliable biometric. The RFID chip in the form is useful for tracking the location of the form and correlating the form with a specific entry in a visitor database, but the form and the chip are easily transferred from one person to another. If the RFID-chipped I-94 Form were relied upon to indicate the location of a person without separate verification of identity, it would easily be used to defeat the regulation of border crossings.

C. Use of RFID Creates Risks to Individuals

While improving identification-based security by small margins, if any, the use of RFID for human identification may create a number of risks that are not found in conventional and non-


radio identification processes. Individuals will likely be subject to greater surveillance in RFID identification. They will be less aware of being identified and what information is transferred during identification, concerns that necessitate transparency in the design of RFID identification systems. And, finally, the use of RFID creates security risks that are not found in non-radio identification systems.

V. EFFECTS OF RFID FOR HUMAN TRACKING ON PRIVACY AND RELATED INTERESTS


Identification-based security programs create many concerns relating to privacy and related interests. We confine our analysis here to the incremental concerns created by the use of radio to communicate identity information from a card or token to a reader.

VI. RECOMMENDATION: RFID SHOULD BE DISFAVORED FOR HUMAN TRACKING

VII. CONCLUSION

The case for using RFID to track materiel has been made fairly well. The Department of Defense, for example, has produced a significant study showing the benefits of using RFID to tame the substantial logistical challenges it faces. We are not aware of a similarly strong case for using RFID to track humans. RFID can reduce the delay when people pass through chokepoints that require identification. However, transmission of information from cards to verifiers is not a significant cause of the delay in such transactions compared to the authorization and verification steps. The Government Accountability Office (GAO), which addressed the use of RFID technology in a May 2005 report titled Information Security, states that RFID systems should be designed to:

- Ensure that only authorized readers can read the tags, and that only authorized personnel have access to the readers;
- Maintain the integrity of the data on the chip and stored in the database;
- Ensure that the critical data is fully available when necessary;
- Mitigate the risk of various attacks, such as counterfeiting or cloning (when an attacker produces an unauthorized copy of a legitimate tag), replay (when a valid transmission is repeated, either by the originator or an unauthorized person who intercepts it and retransmits it), and eavesdropping;
- Avoid electronic collisions when multiple tags and/or readers are present; and
- Mitigate the likelihood that unauthorized components may interfere with or imitate legitimate system components.
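For instance, the anti-cloning and anti-replay requirements above are commonly met with a keyed challenge-response exchange between reader and tag. The following is a minimal illustrative sketch (not taken from the GAO report, and far simpler than real tag protocols):

import hmac, hashlib, secrets

# Hypothetical per-tag key shared with the back-end database; in a real system
# the key never leaves the tag's chip or the secured database.
TAG_KEYS = {"TAG-001": b"per-tag-secret-key"}

def reader_challenge():
    return secrets.token_bytes(16)            # a fresh nonce makes replayed responses useless

def tag_response(tag_id, challenge):
    return hmac.new(TAG_KEYS[tag_id], challenge, hashlib.sha256).digest()

def reader_verify(tag_id, challenge, response):
    expected = hmac.new(TAG_KEYS[tag_id], challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

c = reader_challenge()
r = tag_response("TAG-001", c)
print(reader_verify("TAG-001", c, r))         # True; replaying r against a later challenge fails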

RFID technology may have a small benefit in terms of speeding identification processes, but it is no more resistant to forgery or tampering than any other digital technology. The use of RFID would predispose identification systems to surveillance uses. Use of RFID in identification would tend to deprive individuals of the ability to control when they are identified and what information identification processes transfer. Finally, RFID exposes identification processes to security weaknesses that non-radio-frequencybased processes do not share. The Department of Homeland Security should consider carefully whether to use RFID to identify and track individuals, given the variety of technologies that may serve the same goals with less risk to privacy and related interests. Should DHS go forward with RFID to identify and track individuals, a number of practices and recommendations exist to guide program managers. More analysis would be needed of specific RFID-based identification programs, particularly as to collection, maintenance, and use of information collected via RFID.

REFERENCES

Information Security: Radio Frequency Identification Technology in the Federal Government, GAO-05-551 (May 2005), available at http://www.gao.gov/new.items/d05551.pdf

Radio Frequency Identification: Opportunities and Challenges in Implementation, Department of Commerce (April 2005), available at http://www.technology.gov/reports/2005/RFID_April.doc

Final Regulatory Flexibility Analysis of Passive Radio Frequency Identification (RFID), prepared by the Office of the Under Secretary of Defense for Acquisition Technology & Logistics, available at http://www.acq.osd.mil/log/rfid/EA_08_02_05_UnHighlighted_Changes.pdf

Radio Frequency Identification: Applications and Implications for Consumers, a Workshop Report from the Staff of the Federal Trade Commission (March 2005), available at http://www.ftc.gov/os/2005/03/050308rfidrpt.pdf

RFID: Applications, Security, and Privacy (Simson Garfinkel and Beth Rosenberg, Editors), 2006.

Article 29 Working Party Working Document on Data Protection Issues Related to RFID Technology, 10107/05/EN, WP 105 (January 19, 2005), available at http://europa.eu.int/comm/justice_home/fsj/privacy/docs/wpdocs/2005/wp105_en.pdf

CDT Working Group Set of Best Practices for the Commercial Use of RFID, May 1, 2006, available at http://www.cdt.org/privacy/20060501rfid-bestpractices.php


IMAGE BASED STEGANOGRAPHY USING NEW SYMMETRIC KEY ALGORITHM AND FILE HYBRIDIZATION (NSAFH)
P. Savaridassan1, P. Angaiyarkanni2, R. Kavitha3, R. Vallarmadi4, R. Ilakkiya5
1 Assistant Professor, Sri Manakula Vinayagar Engineering College, Madagadipet
2,3,4,5 Final Year B.Tech (IT Dept), Sri Manakula Vinayagar Engineering College, Madagadipet

Abstract

With the advancement of information technology, managing the confidentiality of information has become a challenging issue. Cryptography and Steganography do this work of transferring the imperative statistics in a sheltered manner. We can use Cryptography and Steganography as an integrated part to augment security, in which Cryptography uses the AES algorithm for encryption and the encrypted statistics are transferred through an image. In this paper, we propose a new technique of integration where Cryptography uses a new symmetric key algorithm and a part of the encrypted statistics is concealed in the DCT of an image; the remaining encrypted statistics are used as keys and then hybridized into another image file (i.e., the file hybridization of Steganography). Hence, our proposed technique will provide multilevel security and partial cipher concealing in the image, thereby making the system more secure.

Keywords: Container File, New Symmetric Key Algorithm, Secret Key, DCT Coefficient, Supporting File

1. Introduction

The rise of the internet has been one of the most important factors of information technology and communication. The Internet provides essential communication between tens of millions of people and is being increasingly used as a tool for commerce, so security becomes a tremendously important issue to deal with. There are many aspects to security and many applications, ranging from secure commerce and payments to private communications and protecting passwords. Information security is to protect information and its elements, including the systems and hardware that use, store and transmit that information. Cryptography is the science of using mathematics to encrypt and decrypt data. Steganography is the art and science of communicating in a way which hides the existence of the communication. Cryptography scrambles a message so it cannot be understood; Steganography hides the message so it cannot be seen. We can develop one system which uses both cryptography and Steganography for better confidentiality and security [1]. In cryptography, we use a new symmetric key algorithm which provides a high level of security with short duration and smooth encryption of data. In Steganography, a transform domain technique called DCT is used to conceal messages in considerable areas of the cover image. The concept of hybridization may be used in the field of Steganography, where more than one file is merged and a new hybrid file is consequently generated. This hybrid file basically consists of two files, namely the container file and the supporting file.

Container File: As the name suggests, this is the file where we store the secret data. The basic property of this container file is that even if we change the intensity of any pixel it should look like the original image.

Supporting File: To make the image common, we need a supporting image file so that the new hybrid file looks like the original one. The selection of the supporting file will depend on the features of the container file to ensure the above characteristics. There will be two options in


this process: either we can put the container file into the supporting file, or vice versa. Initially, the text which has to be transmitted is encrypted using the new symmetric key algorithm, and the encrypted text is embedded into many container files and then into the supporting files using the steganographic technique (DCT).

2. Related works

Designing an Embedded Algorithm for Data Hiding using Steganography Technique by File Hybridization: Proposed a new steganographic technique based on file hybridization, where more than one image is used for embedding the data, which gives a more effective picture and also provides high-level security of data [1].

Proposed System for Data Hiding Using Cryptography and Steganography: Developed a system in which cryptography, steganography and a security module are used as an integrated part. Here, the text is encrypted using the AES algorithm, part of the encrypted text is concealed in the DCT of the host image, and the remaining part is used for generating keys to reconstruct the concealed message at the receiver side [2].

A New Horizon in Data Security by Cryptography & Steganography: Introduced a new compression module; here, before encryption, the text is first compressed using Huffman compression and then encrypted using AES, and then a part of the encrypted message is embedded in the DCT of an image, while the remaining encrypted text is used as a key, thereby increasing the payload capacity [5].

Steganography in Tie with Compression and Cryptography: Introduced a new technique with an integrated module which combines cryptography and steganography along with a security module. At first, the input message is compressed using a random access compression technique and encrypted using the new symmetric key cryptographic algorithm, and then hidden into an image, thereby improving the data hiding capacity [10].

A Novel Approach for Enciphering Data of Smaller Bytes: Presented an enhanced, cost-effective new algorithm (i.e., the new symmetric key algorithm) that takes less time when executing on small amounts of data; this encryption also produces a high level of security for small amounts of data [3].

New Symmetric Key Cryptographic Algorithm using Combined Bit Manipulation and MSA Encryption Algorithm (NJJSAA): Presented a new advanced symmetric key cryptographic method called NJJSAA which uses a bit manipulation method for encrypting and decrypting messages. This method is suitable for encrypting large and small files [11].

3. NSAFH System

The main goal of our proposed system is to provide high embedding capacity and multilevel security by combining cryptography and steganography techniques, and also to protect the data from any changes made to the stego image, which does not destroy the hidden information, by maintaining a larger number of cover images. This paper proposes the new symmetric key algorithm for security and the hybridization technique for robustness. The new symmetric key algorithm provides higher security than the AES algorithm, since there are two reverse operations present in this algorithm which make it more secure, and CRC checking at the receiving end is effortless. Further, by hybridizing the file, the pixels of the supporting image file will be replaced by the corresponding pixels of the container file. After replacing all pixel values of the supporting image by those of the container image, we get the resultant hybrid image or mixed image.


So this approach can withstand the data even if some changes are made to the stego image.

4. Proposed Architecture

Figure 1: NSAFH System Architecture

In this architecture, the sender sends the plain text, which is encrypted into cipher text using the new symmetric key algorithm. Then the cipher text is embedded into an image using the DCT, and this image is hidden into another image. At the receiver side, first we have to separate the hidden images. Then, from the hidden image, the cipher text is extracted using the inverse DCT. This cipher text is decrypted into plain text using the reverse process of the new symmetric key algorithm.

5. Working model

The new symmetric key algorithm is initially used for encryption.

Encryption Algorithm
Step 1: Generate the ASCII value of the letter.
Step 2: Generate the corresponding binary value of it. [The binary value should be 8 digits; no matter how short it is, we represent it in 8 digits (2^8 = 256), e.g. for decimal 32 the binary number should be 00100000 (the leading zeros are required).]
Step 3: Reverse the 8-digit binary number.
Step 4: Take a 4-digit divisor (>= 1000) as the key.
Step 5: Divide the reversed number by the divisor.
Step 6: Store the remainder in the first 3 digits and the quotient in the next 5 digits (the remainder and quotient will not be more than 3 and 5 digits long respectively; if either is shorter, add the required number of 0s on the left-hand side). This is the ciphertext, i.e. the encrypted text. [Since the algorithm works character by character, spaces, commas and every other character are treated as single characters, and the above steps are applied to every character.]
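A minimal runnable sketch of these six steps (assumptions: the reversed 8-digit binary string is read as a decimal number, and the key is 1000 so that the remainder always fits in the 3-digit field):

KEY = 1000  # the 4-digit secret divisor (>= 1000)

def encrypt_char(ch, key=KEY):
    bits = format(ord(ch), '08b')              # steps 1-2: ASCII value as an 8-digit binary string
    reversed_bits = bits[::-1]                 # step 3: reverse the 8 digits
    value = int(reversed_bits)                 # the reversed digits read as a decimal number
    quotient, remainder = divmod(value, key)   # step 5: divide by the key
    return f"{remainder:03d}{quotient:05d}"    # step 6: 3-digit remainder + 5-digit quotient

ciphertext = ''.join(encrypt_char(c) for c in "Hi")   # each character becomes 8 cipher digits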


Concealing Process: In this concealing process, a part of the encrypted message is inserted into the DCT domain of the image. The hidden message is a stream of 1s and 0s giving a total of 56 bits. The transform is applied to the image as multiple 8 x 8 blocks. The DCT can separate the image into high, mid and low frequency components. The next step of the technique after the DCT is to select the 56 largest positive coefficients in the low and mid frequency range. The selected coefficients ci are ordered by magnitude and then modified by the corresponding bit in the message stream. The DCT coefficient matrix values are computed as

D(i,j) = \frac{1}{\sqrt{2N}}\,C(i)\,C(j)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} p(x,y)\,\cos\!\Big[\frac{(2x+1)i\pi}{2N}\Big]\cos\!\Big[\frac{(2y+1)j\pi}{2N}\Big]   ... (1)

where C(u) = 1/\sqrt{2} for u = 0 and C(u) = 1 for u > 0. To obtain the matrix form of the above equation, the DCT matrix T is used:

T(i,j) = 1/\sqrt{N} for i = 0, and T(i,j) = \sqrt{2/N}\,\cos\!\big[(2j+1)i\pi/2N\big] for i > 0.   ... (2)

The quantity factor D is then D = T M T', where T is the DCT matrix, T' is the transposed DCT matrix and M is the matrix form of the 8 x 8 block of original image values. Here, the original 8 x 8 block of image values is levelled off to produce the matrix M; the row transformation is performed by the product T M and the column transformation by multiplying with T'. The block matrix D consists of 64 DCT coefficient values. If the i-th message bit S(i) to be embedded is 1, a quantity ΔD is added to the coefficient; this ΔD quantity represents the persistence factor. If the message bit is 0, the same quantity is subtracted from the coefficient. Thus the replaced DCT coefficients are:

For S(i) = 1: DCT(new) = DCT + 1*ΔD
For S(i) = 0: DCT(new) = DCT - 1*ΔD

Hybrid process: The next step after hiding is the hybrid phase. In this phase, the text-embedded image (i.e., the container image file) is hybridized into another image file, which is called the supporting image file. The steps for creating the hybrid file are as follows:
Step 1: First, the supporting image file is selected.
Step 2: Then choose the container image file.
Step 3: Before hybridizing the above two files, first check: if the size of the supporting image is M1 x N1, then place the container file inside the supporting file in the region between A(x, y) and B(x + r, y + s) for suitable values of r and s, where 0 <= r <= M1 and 0 <= s <= N1.
Step 4: Replace the pixels of the supporting image file by the corresponding pixels of the container image file.
Step 5: The result is the hybrid image file.

Separation process: At the receiver side, first the container image file is separated from the hybrid image file using the reverse process of hybrid file creation.

Revealing process: After separating the container image file, the concealed message is revealed using the steps given below:
Step 1: The text-embedded image is again split into 8 x 8 blocks.
Step 2: The DCT is performed for each block.
Step 3: The original and text-embedded DCT coefficient values are compared for all the blocks.
Step 4: The message (cipher text) is revealed.

Decryption Algorithm
Step 1: Multiply the last 5 digits of the ciphertext by the key.
Step 2: Add the first 3 digits of the ciphertext to the result produced in the previous step.


Step 3: If the result produced in the previous step (i.e. step 2) is not an 8-digit number, pad it to 8 digits.
Step 4: Reverse the number to get the original binary value and hence the plain text.
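Continuing the earlier sketch under the same assumptions (key = 1000, one character per 8 cipher digits), the decryption side inverts those steps:

KEY = 1000                                     # the same 4-digit divisor as in the encryption sketch

def decrypt_char(cipher8, key=KEY):
    remainder, quotient = int(cipher8[:3]), int(cipher8[3:])  # split into 3 + 5 digits
    value = quotient * key + remainder         # decryption steps 1-2: undo the division
    reversed_bits = f"{value:08d}"             # step 3: pad back to 8 digits
    bits = reversed_bits[::-1]                 # step 4: reverse to recover the original bits
    return chr(int(bits, 2))

cipher = "0100001011010010"                    # "Hi" as produced by the encryption sketch above
print(''.join(decrypt_char(cipher[i:i+8]) for i in range(0, len(cipher), 8)))   # -> Hi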

6. Result

In this section, we compare the capability of our proposal with that of the existing system and analyse the performance of the running prototype we developed.

Parameters         Existing System    NSAFH System
Decryption time    More               Less
Key size           128 bit            4 digits
Robustness         More data loss     Less data loss
Payload            Medium             High

Table 1: Parameter Analysis

The decryption time of our NSAFH system is less than that of the existing system. The key size of the existing system is 128 bits, whereas ours is only 4 digits. Robustness means that if any modification is made to the text-embedded image, there is not much data loss; in our proposed architecture the robustness factor results in less data loss than the existing system, i.e., even after image manipulation the data within the image is not affected. The payload factor is the amount of data that can be embedded into the image; our architecture uses more cover image files, thereby providing a greater area in which to embed data.

7. Conclusion

Our proposed system aims to provide more security, since we have used two reverse operations in the cryptographic algorithm (the new symmetric key algorithm). The key size is small, and hence key management at the receiver side will not be complex; it is also more secure, since we are using the reverse process. In steganography, any modification to the image such as compression, filtering or cropping will not affect the concealed text, because the file hybridization technique is used.

8. References

[1] G. Sahoo and R. K. Tiwari, "Designing an Embedded Algorithm for Data Hiding using Steganographic Technique by File Hybridization", IJCSNS International Journal of Computer Science and Network Security, 2008.
[2] Dipti Kapoor Sarmah, Neha Bajpai, "Proposed System for Data Hiding Using Cryptography and Steganography", International Journal of Computer Applications, 2010.
[3] R. Satheesh Kumar, E. Pradeep, K. Naveen and R. Gunasekaran, "A Novel Approach for Enciphering Data of Smaller Bytes", International Journal of Computer Theory and Engineering, 2010.
[4] Ayushi, "A Symmetric Key Cryptographic Algorithm", International Journal of Computer Applications, 2010.
[5] Dipti Kapoor Sarmah, Neha Bajpai, "A New Horizon in Data Security by Cryptography & Steganography", International Journal of Computer Science and Information Technologies (IJCSIT), 2010.
[6] Arvind Kumar and Km. Pooja, "Steganography: A Data Hiding Technique", International Journal of Computer Applications, 2010.
[7] Kallam Ravindra Babu, Dr. S. Udaya Kumar, Dr. A. Vinaya Babu, "A Survey on Cryptography and Steganography Methods for Information Security", International Journal of Computer Applications, 2010.
[8] R. Amirtharajan, R. Akila, P. Deepika Chowdavarapu, "A Comparative Analysis of Image Steganography", International Journal of Computer Applications, 2010.
[9] Deng Qian-lan, "The Blind Detection of Information Hiding in Color Image", IEEE, 2010.
[10] Chandra Jyotsna, V. S. Venugopal, "Steganography in Tie with Compression and Cryptography", International Journal of Communication Engineering Applications (IJCEA), 2011.
[11] Neeraj Khanna, Joel James, Sayantan Chakraborty, Joyshree Nath, Amlan Chakrabarti, Asoke Nath, "New Symmetric Key Cryptographic Algorithm using Combined Bit Manipulation and MSA Encryption Algorithm: NJJSAA Symmetric Key Algorithm", International Conference on Communication Systems and Network Technologies, 2011.
[12] R. Lokeeshwara Reddy, Dr. A. Subramanyam, "Implementation of LSB Steganography and its Evaluation for Various File Formats", IJANA, 2011.
[13] Madhumita Sengupta, J. K. Mandal, "Self Authentication of Color Images through Discrete Cosine Transformation (SADCT)", ICRTIT, 2011.


Blacklist Based Anonymous User Blocking


Sruthi Franco1, C. Pradeesh Kumar2
1 Final Year ME CSE Student, KCG College of Technology, Chennai
2 Assistant Professor, KCG College of Technology, Chennai

Abstract
This paper presents a framework to blacklist, track and block anonymous users in an IP network. Anonymous users are users who are not valid or are dishonest. An IP network is a network of computers using the Internet Protocol for their communication. An anonymizing network is a type of IP network in which the identity of the user is hidden by using pseudonyms; the true identity of the user is not revealed, i.e. the user remains anonymous. This anonymity is provided by using a series of routers to hide the user's IP address. Some users misbehave in this network while remaining anonymous, and the web server is not able to identify the real misbehaving users, leading to the banning of the anonymizing network. In this framework the misbehaving user is traced and blacklisted, and if they misbehave again they are blocked by the web server. Therefore, to block the misbehaving user and to give honest users anonymity, trusted third parties are introduced. These help in blocking the misbehaving user while preserving their anonymity. In this model the misbehaviour can be defined by the web server. Therefore the anonymity and privacy of the blacklisted users are maintained even if they are banned from using the server again.

Keywords: IP Network, Anonymizing Network, Anonymity, Blacklist, Privacy, Pseudonym Manager, Blacklist Manager

I. INTRODUCTION

The base technology of the internet does not require users to identify themselves. Service providers usually enforce the identification of users for the purpose of billing and managing abuse. Users communicate or execute web transactions in anonymizing networks like Tor [18][21]. The user accesses the network and sends a request to the web server. This request is sent to the web server through a random number of intermediate nodes: the user passes the request to a random node of the anonymizing network, and the request reaches the web server through a series of random intermediate nodes. Therefore the web server is unable to identify the true initiator. This is both an advantage and a disadvantage. The disadvantage is that an honest user may be incorrectly suspected of originating the request and in some cases can even be banned from accessing the web server. The advantage is that the user's identity will not be revealed and they can express their views openly. The basic properties of anonymous communication are sender anonymity, receiver anonymity and unlinkability of receiver and sender. The unlinkability of sender and receiver means that the path between the sender and receiver cannot be tracked or linked; therefore it is not possible to find the real sender and receiver. However, anonymous communication can become a licence to misbehave. In case of user misbehaviour it is difficult to identify the real culprit, which is the main disadvantage of anonymous communication. Due to the misbehaviour, the performance of the network will be degraded. Therefore the web server does not usually favour the anonymizing network, and tends to ban user access through the anonymizing network completely; even the honest users will not be able to access through these types of networks. It is necessary to block the misbehaving user and allow only the honest users to access the anonymizing network. There are many methods to block the misbehaving users and also preserve their anonymity.


Pseudonym credential systems [10][14][28] allow the user to access web pages using pseudonyms, and misbehaving users are blocked based on these pseudonyms. However, since all users must use pseudonyms, this weakens the anonymity of the anonymizing network.
Group signature schemes [1][2][7] allow a group member to sign anonymously on behalf of the group, providing anonymity to the signer; the main applications are voting and bidding. In case of conflict or misbehaviour, the group manager opens the group signature and the identity of the user is revealed. However, the server must query the group manager for every authentication, so the approach lacks scalability, and it does not provide anonymity to misbehaving users.
Traceable signatures [8][13][26] apply a basic data-mining technique in which the signature values of selected misbehaving users are traced. The user should be capable of claiming ownership of the signature values, which leads to the self-traceability property. However, this does not provide backward unlinkability; backward unlinkability means that the user's accesses to the network before the complaint should remain anonymous.
Dynamic accumulators [11] allow one to dynamically add and delete inputs, with the cost of adding and deleting independent of the number of accumulated values. Based on the RSA algorithm, they provide efficient membership revocation in the anonymous setting, but the public parameters of the group must be checked and existing users' credentials must always be updated, which is impractical.
II. SYSTEM ARCHITECTURE

A system having properties like anonymous authentication, backward unlinkability, subjective blacklisting, fast authentication speeds, rate-limited anonymous connections and revocation auditability is introduced. These properties can be implemented by introducing two trusted third parties, namely the Pseudonym Manager and the Blacklist Manager. The system architecture consists of the user, the service provider and the trusted third parties, the Blacklist Manager and the Pseudonym Manager. If the user wants to execute a web transaction, the user first registers with the Pseudonym Manager, which issues the user a pseudonym based on the IP address provided. The service provider registers with the Blacklist Manager, which issues it a set of unique tokens. The user, under its pseudonym, accesses the service provider through the anonymizing network. The service provider transfers the pseudonym to the Blacklist Manager. The Blacklist Manager maintains a blacklist table with attributes such as pseudonym, unique token and blacklist value. Before the service provider grants access, it checks the blacklist table: if the pseudonym is present in the blacklist table, the user is denied access to the service provider; otherwise the user can access freely through the network and the service provider. All connections made by the user before a complaint is lodged remain unlinked, i.e. the user's earlier accesses cannot be traced back, whereas connections made after the complaint are linked. This property helps in maintaining the anonymity of the user even after being blacklisted.
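As a rough illustration of the access decision described above, the sketch below shows a service provider consulting the blacklist table before granting access. The table fields follow the description in the text (pseudonym, unique token, blacklist value); all variable and function names are illustrative assumptions, not part of the paper.

# Hypothetical sketch of the Service Provider's access check against the blacklist table.
blacklist_table = {
    # pseudonym      : (unique_token, blacklisted?)
    "nym-7f3a9c01": ("tok-001", False),
    "nym-b2d4e877": ("tok-002", True),
}

def grant_access(pseudonym: str) -> bool:
    """Deny access if the pseudonym is blacklisted, allow it otherwise."""
    entry = blacklist_table.get(pseudonym)
    if entry is None:            # unknown pseudonym: the user never registered
        return False
    _token, blacklisted = entry
    return not blacklisted

print(grant_access("nym-7f3a9c01"))   # True  -> the user connects anonymously
print(grant_access("nym-b2d4e877"))   # False -> future connections are refused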

Fig. 1: System architecture showing the interaction between the nodes (User, Pseudonym Manager, Blacklist Manager, Service Provider, Anonymizing Network)


III. METHODOLOGY

The main entities of this framework are the Pseudonym Manager, the Blacklist Manager, the user and the Service Provider. The Pseudonym Manager handles the registration of new users, authentication and verification. The Blacklist Manager issues the unique tokens, maintains the blacklist table, and links or unlinks connections based on the user's access. The Service Provider registers with the Blacklist Manager, which maintains the blacklist table.

A. Blacklist Manager and Pseudonym Manager
The Blacklist Manager controls the entire process of the whole architecture. Both the Service Provider and the Pseudonym Manager can only be accessed through the Blacklist Manager. It contains the blacklist table with the attributes pseudonym, tokens and blacklist status. The Pseudonym Manager controls the user activities. Users can access the anonymizing network only if they register with the Pseudonym Manager. It has knowledge of the routers in the anonymizing network. Pseudonyms are chosen based on controlled resources such that no two users can have the same pseudonym. The user's connections remain anonymous to the Pseudonym Manager. It is created to reduce the load on the Blacklist Manager and also acts as a second server in the system.
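A minimal sketch of how the Pseudonym Manager might derive a stable pseudonym from a controlled resource such as the user's static IP address. The keyed-hash construction and the names used here are assumptions for illustration only; the paper states only that pseudonyms are derived from controlled resources (the IP address).

import hmac, hashlib

PM_SECRET = b"pseudonym-manager-secret"   # assumed long-term secret held only by the Pseudonym Manager

def derive_pseudonym(ip_address: str) -> str:
    """Deterministically map a controlled resource (here an IP address) to a pseudonym.

    The mapping is keyed and deterministic, so the same IP always yields the same
    pseudonym, different IPs map to different pseudonyms (up to hash collision),
    and the service provider never learns the IP itself."""
    digest = hmac.new(PM_SECRET, ip_address.encode(), hashlib.sha256).hexdigest()
    return "nym-" + digest[:16]

print(derive_pseudonym("203.0.113.7"))   # the same input gives the same pseudonym on every call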
B. Time
Tokens generated by the Blacklist Manager are bound to specific time periods within a linkability window; the linkability window is divided into small time intervals (time periods). A user's accesses within a time period are tied to a single token generated by the Blacklist Manager. The use of different tokens across time periods grants the user anonymity between time periods: smaller time periods provide users with higher rates of anonymous authentication, while longer time periods rate-limit the number of misbehaviors from a particular user before he or she is blocked.
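A small sketch of how a linkability window can be divided into time periods and how an access is tied to the period in which it occurs. The 5-minute period and one-day window are assumptions used only to make the arithmetic concrete.

import time

PERIOD_SECONDS = 5 * 60            # assumed: one time period = 5 minutes
WINDOW_PERIODS = 24 * 60 // 5      # assumed: linkability window = one day -> L = 288 periods

def current_period(timestamp: float) -> int:
    """Index (0 .. L-1) of the time period inside the current one-day linkability window."""
    seconds_into_day = int(timestamp) % (24 * 3600)
    return seconds_into_day // PERIOD_SECONDS

# All connections made during one period are tied to a single token,
# so a user can be rate-limited to one anonymous connection per period.
print(current_period(time.time()), "of", WINDOW_PERIODS - 1)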


The linkability window has two purposes: it allows dynamism, since resources like IP addresses can get reassigned among different users, so it is difficult to blacklist a resource for a long time; and it ensures forgiveness some time after the misbehaviour.

Fig. 2: Linkability window

C. Permission Control and Blacklist
All the user details are maintained both at the Pseudonym Manager and the Blacklist Manager. Authorized and unblocked users can access the anonymizing network. A blocked user cannot access the network, but with the permission of the Blacklist Manager it can access it again. If the user misbehaves, the service provider will link the user's future connections within the current linkability window. Even though a misbehaving user can be blocked from making any other connection, the user's past connections remain unlinkable. The service providers can subjectively judge users for any reason, since the privacy of users is maintained.

D. Notification of Blacklist Status
Users expect their connections to be anonymous while using the anonymizing network. In case of misbehaviour, a user's future connections will be linked. Therefore the user should be able to view its blacklist status while trying to connect to the service provider. The user is able to download the blacklist table and check whether it is on the list; if present, the user disconnects immediately.

E. User Details and User Access Control
All the user details and the access history are maintained in the databases of both the Pseudonym Manager and the Service Provider. If the user misbehaves, this history is updated in both databases.

IV. SECURITY MODEL
This system provides four security goals. These goals help to resist collusion attacks.

Goals and Threats
A user is honest if it obeys the system specification. An honest user becomes corrupt when it tries to infer knowledge that it is denied and compromises with the attacker. Corrupt users reveal all their information and deviate from the system specification.

A. Blacklistability
Blacklistability ensures that any honest server can block misbehaving users: if an honest server complains about a misbehaving user in the current linkability window, then that user will not be able to reconnect.

B. Rate Limiting
No user can successfully connect to the service provider more than once within a single time period.

C. Anonymity
A legitimate user is one who has not been blacklisted by the service provider and has not exceeded the rate limit for re-establishing connections. Anonymity protects the anonymity of all users: the service provider learns only whether the user is legitimate or not.

D. Non-Frameability
Honest users who are legitimate can always connect to the service provider, preventing the attacker from framing a legitimate user.


V. ANALYSIS AND RESULT
Fig. 3 shows the size of the entities used in this framework. The x-axis shows the number of entries, i.e. the complaints in the blacklist update request, the tokens generated by the Blacklist Manager, and the seeds in the blacklist update response. Let L be the number of time periods in the linkability window; a credential is the collection of tokens. If the linkability window is one day and each time period is 5 minutes, then L = 24 x 60 / 5 = 288. Each entity grows as the number of entries grows. The credential and the blacklist update request grow at the same rate because the credential is the same size as the complaint list sent when the blacklist table is to be updated.

Fig. 3: Size of entities vs. number of entries

Fig. 4 shows the amount of time the Blacklist Manager takes to perform its operations. It takes about 9 ms to create a credential when L = 288; this is acceptable for a protocol that occurs only once every linkability window for each user wanting to connect to the service provider. For blacklist updates, the initial jump in the graph corresponds to the fixed overhead associated with updating the blacklist. If there is no complaint, a blacklist update takes less than a millisecond.

Fig. 4: Performance vs. number of entries

VI. CONCLUSION AND FUTURE WORK
A framework is introduced which helps in the efficient and correct use of anonymizing networks. A service provider can blacklist misbehaving users while maintaining their privacy and anonymity. The framework allows practical, efficient and sensible use for both the user and the service provider. It also increases the acceptance of anonymizing networks, which have been banned by various service providers because of the misbehaving tendency of users under anonymity. This is application-oriented software which simulates the blocking of misbehaving users in an anonymizing network. The resource used for generating pseudonyms is the static IP address. The model can be implemented in a larger network. The framework can also be extended so that the service provider can find repeat misbehaving users and block them for a longer period of time; finding repeat offenders requires a provision to link between different linkability windows.

REFERENCES
[1] G. Ateniese, J. Camenisch, M. Joye, and G. Tsudik, A Practical and Provably Secure Coalition-Resistant Group Signature Scheme, Proc. Ann. Intl Cryptology Conf. (CRYPTO), Springer, pp. 255-270, 2000.
[2] G. Ateniese, D.X. Song, and G. Tsudik, Quasi-Efficient Revocation in Group Signatures, Proc. Conf.


Financial Cryptography, Springer, pp. 183-197, 2002. [3] M. Bellare, R. Canetti, and H. Krawczyk, Keying Hash Functionsfor Message Authentication, Proc. Ann. Intl Cryptology Conf. (CRYPTO), Springer, pp. 1-15, 1996. [4] M. Bellare, A. Desai, E. Jokipii, and P. Rogaway, A Concrete Security Treatment of Symmetric Encryption, Proc. Ann. Symp. Foundations in Computer Science (FOCS), pp. 394403, 1997. [5] M. Bellare and P. Rogaway, Random Oracles Are Practical: A Paradigm for Designing Efficient Protocols, Proc. First ACM Conf. Computer and Comm. Security, pp. 62-73, 1993. [6] M. Bellare, H. Shi, and C. Zhang, Foundations of Group Signatures: The Case of Dynamic Groups, Proc. Cryptographers Track at RSA Conf. (CT-RSA), Springer, pp. 136-153, 2005. [7] D. Boneh and H. Shacham, Group Signatures with Verifier-Local Revocation, Proc. ACM Conf. Computer and Comm. Security,pp. 168-177, 2004. [8] S. Brands, Untraceable Off-Line Cash in Wallets with Observers(Extended Abstract), Proc. Ann. Intl Cryptology Conf. (CRYPTO),Springer, pp. 302318, 1993. [9] E. Bresson and J. Stern, Efficient Revocation in Group Signatures, Proc. Conf. Public Key Cryptography, Springer, pp. 190-206, 2001. [10] J. Camenisch and A. Lysyanskaya, An Efficient System for NonTransferable Anonymous Credentials with Optional Anonymity Revocation, Proc. Intl Conf. Theory and Application of Cryptographic Techniques (EUROCRYPT), Springer, pp. 93-118, 2001. [11] J. Camenisch and A. Lysyanskaya, Dynamic Accumulators and Application to Efficient Revocation of Anonymous

Credentials, Proc. Ann. Intl Cryptology Conf. (CRYPTO), Springer, pp. 61-76, 2002. [12] J. Camenisch and A. Lysyanskaya, Signature Schemes and Anonymous Credentials from Bilinear Maps, Proc. Ann. Intl Cryptology Conf. (CRYPTO), Springer, pp. 56-72, 2004. [13] D. Chaum, Blind Signatures for Untraceable Payments, Proc. Ann. Intl Cryptology Conf. (CRYPTO), pp. 199-203, 1982. [14] D. Chaum, Showing Credentials without Identification Transfeering Signatures between Unconditionally Unlinkable Pseudonyms, Proc. Intl Conf. Cryptology (AUSCRYPT), Springer, pp. 246-264, 1990. [15] D. Chaum and E. van Heyst, Group Signatures, Proc. Intl Conf. Theory and Application of Cryptographic Techniques (EUROCRYPT), pp. 257-265, 1991. [16] C. Cornelius, A. Kapadia, P.P. Tsang, and S.W. Smith, Nymble: Blocking Misbehaving Users in Anonymizing Networks, Technical Report TR2008-637, Dartmouth College, Computer Science, Dec. 2008. [17] I. Damgard, Payment Systems and Credential Mechanisms with Provable Security Against Abuse by Individuals, Proc. Ann. Intl Cryptology Conf. (CRYPTO), Springer, pp. 328-335, 1988. [18] R. Dingledine, N. Mathewson, and P. Syverson, Tor: The Second- Generation Onion Router, Proc. Usenix Security Symp., pp. 303- 320, Aug. 2004. [19] J.R. Douceur, The Sybil Attack, Proc. Intl Workshop on Peerto- Peer Systems (IPTPS), Springer, pp. 251-260, 2002. [20] S. Even, O. Goldreich, and S. Micali, On-Line/Off-Line Digital Schemes, Proc. Ann. Intl Cryptology


Conf. (CRYPTO), Springer, pp. 263275, 1989. [21] J. Feigenbaum, A. Johnson, and P.F. Syverson, A Model of Onion Routing with Provable Anonymity, Proc. Conf. Financial Cryptography, Springer, pp. 57-71, 2007. [22] S. Goldwasser, S. Micali, and R.L. Rivest, A Digital SignatureScheme Secure Against Adaptive Chosen-Message Attacks, SIAM J. Computing, vol. 17, no. 2, pp. 281-308, 1988 [23] J.E. Holt and K.E. Seamons, Nym: Practical Pseudonymity for Anonymous Networks, Internet Security Research Lab Technical Report 2006-4, Brigham Young Univ., June 2006. [24] P.C. Johnson, A. Kapadia, P.P. Tsang, and S.W. Smith, Nymble: Anonymous IP-Address Blocking, Proc. Conf. Privacy Enhancing Technologies, Springer, pp. 113-133, 2007. [25] A. Juels and J.G. Brainard, Client Puzzles: A Cryptographic Countermeasure Against Connection Depletion Attacks, Proc. Network and Distributed System Security Symp. (NDSS), 1999.

[26] A. Kiayias, Y. Tsiounis, and M. Yung, Traceable Signatures, Proc. Intl Conf. Theory and Application of Cryptographic Techniques (EUROCRYPT), Springer, pp. 571-589, 2004. [27] B.N. Levine, C. Shields, and N.B. Margolin, A Survey of Solutions to the Sybil Attack, Technical Report 2006-052, Univ. of Massachusetts, Oct. 2006. [28] A. Lysyanskaya, R.L. Rivest, A. Sahai, and S. Wolf, Pseudonym Systems, Proc. Conf. Selected Areas in Cryptography, Springer, pp. 184199, 1999. [29] S. Micali, NOVOMODO: Scalable Certificate Validation and Simplified PKI Management, Proc. First Ann. PKI Research Workshop, Apr. 2002. [30] T. Nakanishi and N. Funabiki, Verifier-Local Revocation Group Signature Schemes with Backward Unlinkability from Bilinear Maps, Proc. Intl Conf. Theory and Application of Cryptology and Information Security (ASIACRYPT), Springer, pp. 533-548, 2005.


HIGH SPEED SoC BY MEANS OF ENHANCING PLC USING VLSI TECHNOLOGY


Mr. S. Nagakumararaj 1
1 Asst. Prof., RVS College of Engineering and Technology, Dindigul

Abstract
Programmable logic cores (PLCs) offer a means of providing post-fabrication reconfigurability to a SoC design. This ability has the potential to significantly enhance the SoC design process by enabling post-silicon debugging, design error correction and post-fabrication feature enhancement. However, circuits implemented in general-purpose programmable logic will inevitably have lower timing performance than fixed-function circuits. This fundamental mismatch makes it difficult to use the PLC effectively. We address this problem by proposing changes to the structure of the PLC itself; these architectural enhancements enable circuit implementations with high-performance interfaces. In previous work we addressed system bus interfaces; in this work we address direct synchronous interfaces. Our results show significant improvement in PLC interface timing, such that interaction with full-speed fixed-function SoC logic is possible. Our enhanced PLCs are able to implement direct synchronous interfaces running at, on average, 662 MHz (compared to 249 MHz in regular programmable logic). We are able to do this without compromising the basic structure or routability of the programmable fabric.

Today's embedded systems consist of a number of heterogeneous processing, communication and sensing components. These embedded components are increasingly being integrated into systems-on-chip (SoC). Such a SoC contains a general-purpose RISC processor, a digital signal processor, communication processors (HDLC, etc.) and various memory interfaces and I/O controllers. Programmable logic cores (PLCs) are small regions of an FPGA fabric integrated into a fixed-function SoC. PLCs address a number of important SoC design challenges including post-silicon debugging, the correction of design errors, and the addition of new features to an existing fixed-function design. Because PLCs are electrically reconfigurable, it is possible to quickly and cheaply create new circuits in an existing SoC. A programmable logic core (PLC) is a flexible logic fabric that can be customized to implement any digital circuit after fabrication.

PROGRAMMABLE LOGIC CORES:
Before fabrication, the designer embeds a programmable fabric, consisting of many uncommitted gates and programmable interconnects between the gates, onto the chip. After fabrication, the designer can then program these gates and the connections between them to serve different applications or implement design changes. These configurable logic blocks and connections have also been commonly referred to as embedded FPGAs (field programmable gate arrays), as opposed to stand-alone FPGAs that have been available for two decades. The effective integration of a PLC into a SoC remains a challenge. A key issue is the mismatch in timing performance between programmable and fixed logic [11]. The inherent difference in achievable clock frequency between the two technologies forces the SoC integrator to either reduce the clock rate of the fixed-function logic, or to design rate-adaptive circuits to manage the interface between the two types of logic. In most cases reducing the clock rate of the fixed-function logic is not desirable. However, the design of rate-adaptive interface circuits for this scenario is also a challenge. In many cases the circuits


to be implemented in the programmable logic are not known at the time of SoC integration, and therefore the interface requirements are also not known. Implementing these rate-adaptive interfaces in the regular programmable fabric would provide the required flexibility; however, as we will show, the timing performance would not be sufficient. To address this problem, we propose modifying the PLC to enable high-speed, rate-adaptive interfaces. Rather than designing circuitry for a specific interface, we create new programmable structures integrated in the clustered logic blocks (CLBs) of the programmable fabric. We are careful to maintain the basic structure and flexibility of the fabric in order to leverage existing work in FPGA CAD tools and programmable architectures. Our flexible interface structures complement the functionality of the basic clustered logic block (CLB) structures, and integrate into the standard routing architecture.

SoC FRAMEWORK:
SoC interfaces can be classified in two groups: 1) system bus interfaces, and 2) direct synchronous interfaces. Most IP blocks interface to one or more system busses. An example of an industry-standard bus is the AMBA AHB. In general, however, SoC system busses do not handle all inter-block communications. Instead, tightly coupled blocks, and especially datapath-oriented blocks, use application-specific synchronous interfaces to communicate. We refer to these interfaces as direct synchronous interfaces. These interfaces are often ad hoc or proprietary, but an example of an industry-standard interface of this type is the PIPE standard, which defines the interface between the PCS and PHY layers in PCI Express. Our goal is to ensure that the PLC is able to connect to any SoC block. This will maximize the flexibility of the SoC by allowing the role of the programmable logic to be defined post-fabrication. All of our new programmable

structures were integrated at the clustered logic block (CLB) level. Each modified CLB contained the regular CLB logic, in addition to the new circuitry. Using the shadow cluster approach from [16], when the new circuits are not required they can be disabled, and the CLB can be used as normal. To maintain the regular tile structure of the PLC, we assume all modified CLBs are aligned in continuous columns in the fabric, as shown in Fig. 2. This allowed the columns with modified CLBs to increase in width to accommodate new logic, while ensuring that the standard CLB tiles maintained the normal size and dimensions. The I/O of the new logic in the modified CLB is connected directly to the regular programmable routing fabric of the PLC. This adds important flexibility to our proposal since the SoC interfaces can be connected directly to the regular I/O of the PLC and then routed to the inputs and outputs of the modified CLBs as needed.

STRUCTURE OF PLC


The design flow is as follows:
1. The integrated circuit designer partitions the design into functions that will be implemented using fixed logic and programmable logic, and describes the fixed functions using a hardware description language.
2. The designer obtains an RTL description of the behavior of a programmable logic core.
3. The designer merges the behavioral description of the fixed part of the integrated circuit and the behavioral description of the programmable logic core, creating a behavioral description of the block.
4. Standard ASIC synthesis, place and route tools are then used to implement the PLC behavioral description from the above step.
5. The integrated circuit is fabricated.
6. The user configures the programmable logic core for the target application.
Consider the simplified view of a 3-input lookup table (3-LUT) used in an FPGA. The standard fabric uses SRAM cells to store configuration bits and pass transistors to implement the 3-LUT shown in Fig. (a). In the soft PLC case shown in Fig. (b), a standard-cell library is used to implement the same 3-LUT. In fact, all desired functions of the soft PLC are constructed from NANDs, NORs, inverters, flip-flops (FF) and multiplexers from the standard cell library. The same holds true for the programmable interconnect in the FPGA and soft PLC.
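To make the lookup-table behaviour concrete, the following sketch models a 3-LUT purely behaviourally: eight configuration bits are selected by the three logic inputs. It is only an illustrative model, not the SRAM/pass-transistor or standard-cell implementation discussed above.

def lut3(config_bits, a, b, c):
    """Behavioural model of a 3-input lookup table.

    config_bits: list of 8 zeros/ones written at configuration time.
    a, b, c:     the three logic inputs; together they form the address into the table."""
    assert len(config_bits) == 8
    address = (a << 2) | (b << 1) | c
    return config_bits[address]

# Example: configure the LUT as a 3-input AND gate (only address 0b111 returns 1).
and3 = [0, 0, 0, 0, 0, 0, 0, 1]
print(lut3(and3, 1, 1, 1))   # 1
print(lut3(and3, 1, 0, 1))   # 0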

Operations are performed based only on operators. Storage capacity is increased because operators are used instead of opcodes. The design contains both 64 kb ROM and 128 bytes RAM in a single IC. The processor is based on the Harvard architecture.

COMPARISON:

Instruction    MC of existing system    MC of proposed system
ADD B                   4                        1
INR M                  10                        1
MOV r,m                 7                        1
STA ADDR               13                        1
ORA REG                 4                        1
TOTAL                  38                        5


This peripheral controller consists of a RISC microprocessor connected by way of a processor bus to programmable read-only memory (PROM) and random access memory (RAM). The processor bus is also connected by way of bus-to-bus interface logic to an I/O bus. The I/O bus is connected to a number of interface logic circuits which are in turn connected to respective interfaces (interface A and interface B).

DIRECT SYNCHRONOUS INTERFACES
Tightly coupled design blocks are often connected using design-specific interfaces rather than standard system busses. We refer to these interfaces as direct synchronous interfaces. In many SoCs, the design blocks on the datapath will communicate this way, while control and configuration operations will occur over a system bus. These direct interfaces often use simpler protocols and operate at higher speeds than system bus interfaces. Rather than using explicit addresses for each data transaction, the identification of the transfer data is often done implicitly, based on the order in which the data is received, or by using in-band signalling such as packet headers. The timing requirements for direct synchronous interfaces in PLCs are that they

are able to sample or generate data at the full clock rate of the fixed-function logic while managing the rate adaptation to the lower clock rate of the programmable logic core. In many cases the circuits implemented in programmable logic could be designed to maintain the required data throughput by parallelizing the data processing and using a lower clock rate. However, the incoming data must still be sampled using the high-speed clock and a rate adaptation performed to move the data to the lower-frequency clock domain. Our direct synchronous interfaces must provide the flexibility to enable many different data formats and many different clock ratios. For instance, the interface may use 32- or 8-bit wide data. The ratio of the clock rates of the fixed-function logic and programmable logic must also be flexible. Based on studies comparing the achievable clock frequencies of ASICs and FPGAs, we chose the following target clock ratios for this work: 2:1, 3:1, 4:1, 6:1, and 8:1.

MODIFIED CLB IMPLEMENTATION
To create our modified CLB structures, we target the data sampling, data generation and rate adaptation portions of the interface logic. By hardening these aspects of the interface we can move all the logic that must operate on the fast clock into application-specific, configurable logic inside the CLB. In our implementation, one set of modified CLBs implements the interface and rate adaptation logic for incoming data, and the other set targets outgoing data. These two CLB types are interleaved on alternating rows of a given column. At the chip level, the arbitrary synchronous interfaces can be connected to the regular I/O of the PLC. These signals are then routed to the inputs and outputs of the modified CLBs as needed. This allows us to support arbitrary I/O data widths. On the other side of the modified CLBs, the signals can be


routed to the regular CLBs to implement the normal programmable logic circuits. Each of the incoming-type CLBs contains a configurable FIFO. The FIFO structure requires two clocks: a fast clock and a slow clock. Each of these clocks is routed to the modified CLBs using standard FPGA clock routing. The fast clock is sourced from the same clock that is generating the incoming data; this fast clock is usually generated by a phase-locked loop (PLL) on the SoC. The slow clock is a divided version of the fast clock. A common clock divider is used for the entire PLC, and the slow clock is also used as the main clock for the programmable logic in the PLC. The slow clock must be frequency-locked to the fast clock; this is guaranteed if the fast clock is used as the reference in the clock divider circuit.

Each of the outgoing-type CLBs also contains a configurable FIFO. The outgoing FIFO requires the same clock structure as the incoming FIFO and supports the equivalent rate adaptation ratios of 1:2, 1:3, 1:4, 1:6, and 1:8. To understand this structure, consider the 8:1 case: the counter will count from 000 to 111. Each of the 8 input values will be multiplexed to the out0 output. Each bit in the count is used as the select input for a level of multiplexers, so each input (in0-in7) is selected, in order, exactly once. When a smaller clock ratio is used, the counter rollover value is adjusted and a level of multiplexers is bypassed. For instance, for the 4:1 case the counter rollover is 011; out0 and out1 select in0-in3 and in4-in7, respectively.

A 3-bit counter with a programmable rollover value is required to manage data transfers for both the incoming and outgoing FIFOs. The value of this counter coordinates the data transfers to ensure that the data ends up in the correct flip-flops depending on the clock ratio selected. In addition to coordinating the data transfers, the counter also controls the start-up condition of the FIFO to avoid metastability when data is transferred between clock domains.
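The counter-and-multiplexer behaviour described above can be sketched in software as follows; the goal is only to show how a programmable rollover value steers consecutive fast-clock samples into the slots of a slow-clock word. The function name and the software framing are assumptions; the real logic is hardware inside the modified CLB.

def rate_adapt(samples, ratio):
    """Group consecutive fast-clock samples into slow-clock words.

    samples: values captured on the fast clock, one per fast-clock cycle.
    ratio:   clock ratio N for an N:1 interface (e.g. 8, 4, 2). A 3-bit counter
             with rollover value ratio-1 decides which slot (flip-flop position)
             receives each incoming sample."""
    words, current = [], []
    counter = 0
    for s in samples:
        current.append(s)                  # the counter value picks the slot, here in order
        counter += 1
        if counter == ratio:               # programmable rollover reached:
            words.append(tuple(current))   # hand the assembled word to the slow clock domain
            current, counter = [], 0
    return words

print(rate_adapt([0, 1, 2, 3, 4, 5, 6, 7], 8))  # one 8-sample word  -> 8:1 adaptation
print(rate_adapt([0, 1, 2, 3, 4, 5, 6, 7], 4))  # two 4-sample words -> 4:1 adaptation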

SIMULATION RESULTS
Processor output:

Network controller output:

CONCLUSION
In this paper we have demonstrated modifications to programmable logic cores (PLCs) that enable the implementation of circuits with high-speed interfaces. These changes have minimal area overhead in the PLC, and for circuits that make use of these modifications there is a significant decrease in overall CLB usage and required routing resources. These changes integrate directly into the regular programmable fabric, thus enabling the reuse of existing FPGA CAD tools.


REFERENCES [1] S. J. E. Wilton et al., Design considerations for soft embedded programmable logic cores, IEEE J. Solid-State Circuits, vol. 40, no. 2, pp. 485-497, Feb. 2005. [2] S. Phillips and S. Hauck, Automatic layout of domain-specific reconfigurable subsystems for systems-on-a-chip, in Proc. ACM Int. Symp. Field-Programmable Gate Arrays, Feb. 2002, pp. 165-176. [3] P. Zuchowski et al., A hybrid ASIC and FPGA architecture, in Proc. IEEE/ACM Int. Conf. on Computer-Aided Design, 2002, pp. 187-194. [4] S. J. E. Wilton and R. Saleh, Programmable logic IP cores in SoC design: Opportunities and challenges, presented at the IEEE Custom IC Conf. (CICC), San Diego, CA, 2001. [5] B. R. Quinton and S. J. E. Wilton, Post-Silicon debug using programmable logic cores, in Proc. IEEE Int. Conf. Field-Programmable Technology, Dec. 2005, pp. 241-247. [6] M. Abramovici, A reconfigurable design-for-debug infrastructure for SoCs, in Proc. Design Automation Conf., Jul. 2006, pp. 7-12. [7] S. Sarangi et al., Patching processor design errors with programmable hardware, IEEE Micro, vol. 27, no. 1, pp. 12-25, Jan.-Feb. 2007. [8] F. Lertora and M. Borgatti, Handling different computational granularity by a

reconfigurable IC featuring embedded FPGAs and network-on-chip, in Proc. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2005, pp. 4554. [9] P. Magarshack and P. Paulin, System-onchip beyond the nanometer wall, in Proc. Design Automation Conf., 2003, pp. 419424. [10] I. Kuon and J. Rose, Measuring the gap between FPGAs and ASICs, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 26, pp. 203215, Feb. 2007. [11] B. R. Quinton and S. J. E. Wilton, Embedded programmable logic core enhancements for system bus interfaces, in Proc. Int. Conf. Field- Programmable Logic and Applications, Amsterdam, Aug. 2007, pp.202209.


A SURVEY ON TYPES OF MOBILE AUTHENTICATIONS USED IN MOBILE BANKING SECTORS


S. Kalaichezhian 1, K. Purushothaman 2, G. Murugavel 3, V. Krishna Kumar 4, A. Meiappane 5
1,2,3,4 PG Student (Networking), Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry 605 107.
5 Associate Professor, Department of IT, Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry 605 107.

Abstract:
Mobile banking is attractive because it allows people to do banking anytime, anywhere. One of the requirements of performing a mobile banking transaction is that users are required to log in before use. The current mobile banking login method is PIN authentication; however, results from other research studies have found that there are usability concerns with using PINs [1]. To overcome some of these concerns, researchers have suggested the use of graphical passwords. In this research, we argue that another alternative input technique can be utilized. We explore a novel password input approach called gesture passwords. This paper presents a survey of what is currently happening in mobile banking.

INTRODUCTION:
India has about 207 MM (September 2007 TRAI data) mobile phone subscribers, a number that is larger than the number of bank accounts or Internet users. Given the mobile tele-density of about 20% and the development of secure mobile technology solutions, banks are well positioned to bridge the digital divide and introduce the unbanked sector to the financial mainstream. However, to start with, we must understand who the various stakeholders are and what their expectations are. The stakeholders are as follows:
a) Consumers
b) Merchants
c) Mobile network operators
d) Mobile device manufacturers
e) Financial institutions and banks
f) Software and technology providers
g) Government

Role of Banks
Any money exchange, i.e. payments, P2P, remittance, etc., should be executed through banking instruments and infrastructure. This is to ensure compliance with all financial controls and regulation. Payments can be made using the following:
a. Savings Bank Account/Debit Card
b. Credit Card Account
c. Pre-paid Cards
d. Virtual Cards (Credit & Debit Cards)
The banks' role should be to provide normal transactional services to customers using the full range of services including cash, savings account, credit card, debit card and prepaid card services. Transactions should be maintained within the banking network, and all the stakeholders in transaction processing should be subject to the same level of scrutiny and regulation as other bank accounts. Transaction settlement should ride on the existing infrastructure for efficient settlement and payment systems.

Role of Telco
Telcos should provide the KYC and customer history for banks to offer the services to the customer, and take full responsibility


for fraud management at their outlets as per TRAI guidelines. In order to ensure that mobile payments reach the critical customer mass, the KYC documents required to offer financial products should be made similar to the telcos' KYC guidelines. The distribution network of telcos should be used to provide mobile payment services in the maximum possible number of locations across the country. External low-cost hosting at the telco should be explored; banks will not have to reinvent the technology platform and billing systems for such an offering.

Mobile Banking:
Mobile banking (also known as M-Banking, mBanking or SMS banking) is a term used for performing balance checks, account transactions, payments, credit applications and other banking transactions through a mobile device such as a mobile phone or personal digital assistant (PDA). The earliest mobile banking services were offered over SMS. With the introduction of the first primitive smartphones with WAP support enabling the use of the mobile web in 1999, the first European banks started to offer mobile banking on this platform to their customers. Mobile banking is also used to perform banking transactions or access financial services via a mobile device such as a mobile phone. It has revolutionized the banking industry with new business models that offer convenient self-service banking options to customers. With mobile banking, a client may be sitting in the most remote location, but as long as the client has a mobile phone with network connectivity, the client can access his/her account anytime, anywhere. Mobile banking refers to the provision and availment of banking and financial services with the help of mobile telecommunication devices. The scope of offered services may include facilities to conduct bank and stock market transactions, to administer accounts and to access customised information.

According to this model, mobile banking can be said to consist of three inter-related concepts:
1. Mobile Accounting
2. Mobile Brokerage
3. Mobile Financial Information Services

Types of Mobile Banking Authentication:
In general, a bank account can be accessed via more than one route. Besides mobile banking, bank clients can access their account through an ATM (requires a bank card and an ATM PIN) and internet banking (requires a user identification, a PIN or password, and in some implementations a one-time password). Yet all these channels do not apply the same technique for authentication. Further, some banks require their clients to remember multiple passwords (or PINs) for the same account, each password for a different channel. Banks adopted this approach for safety reasons: if the password for one channel is compromised, then at least the other channels will still be safe because they use different passwords. Although this increases system security, it only benefits the administrator at the bank; from a user's perspective, remembering multiple passwords for the same account is chaotic. Three graphical authentication techniques are considered:
1. Locimetric
2. Drawmetric
3. Cognometric

Locimetric:
Locimetric (or location-based) authentication is a technique where the system provides an image as a memory cue and relies on precise position recall to authenticate [1]. This technique requires a user to recall a password by identifying a series of predefined points on a background image. One of the earliest locimetric authentication techniques, called graphical password, is discussed by Blonder (1996) (Fig. 1). The technique is based on image-cued recall; it requires a user to touch a set of predetermined areas on an image in a


sequence to authenticate. An example of Blonder's graphical password is illustrated in Fig. 1. A major problem of this scheme is identified by

Wiedenbeck et al. (2005a); they identified that the number of predefined click regions is relatively small, so the password must be long to be considered secure. However, usability decreases as the length of a password increases. Wiedenbeck et al. (2005b) propose a solution called PassPoints which overcomes this problem. The advantages of their solution are: (1) it allows users to submit their preferred images as the visual cue, (2) users may choose any click points as a password, and (3) the tolerance size of the clickable region around the password click points can be varied. Locimetric authentication relies on the user's ability to remember, to identify, and to click on specific positions on the screen with some level of precision (Renaud & De Angeli, 2004), i.e. the accuracy of finding and pointing at locations on the screen. However, this precision varies amongst users, which is why a tolerance region around the click point is needed. The size of the tolerance region affects user input: a smaller region causes the system to reject more false negatives and, conversely, a bigger region causes the system to accept more false positives. This is further verified by Wiedenbeck et al. (2005b); they reported that tolerance regions smaller than 10x10 pixels seriously impair users' memory and increase password input time.

Fig. 1: An example of Blonder's graphical password (Blonder, 1996)
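A minimal sketch of the locimetric click-point check with a tolerance region. The 10x10-pixel square follows the discussion above; the ordered-click policy and the names used are illustrative assumptions.

def locimetric_match(enrolled_points, entered_points, tolerance=10):
    """Accept the entry only if every click falls inside the tolerance square
    centred on the corresponding enrolled point, in the enrolled order."""
    if len(enrolled_points) != len(entered_points):
        return False
    for (ex, ey), (cx, cy) in zip(enrolled_points, entered_points):
        if abs(ex - cx) > tolerance // 2 or abs(ey - cy) > tolerance // 2:
            return False
    return True

secret = [(120, 45), (300, 210), (80, 160)]
print(locimetric_match(secret, [(122, 44), (298, 213), (81, 158)]))  # True: all clicks within tolerance
print(locimetric_match(secret, [(122, 44), (298, 213), (95, 175)]))  # False: last click too far away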

Drawmetric
Drawmetric authentication involves the user drawing a simple outline of a password on a grid during enrolment, and authentication consists of reconstructing the enrolled sketch; an example is verifying the user's hand-written signature. A well-known drawmetric authentication mechanism for mobile devices is Draw-A-Secret (DAS) (Jermyn, Mayer, Monrose, Reiter, & Rubin, 1999) (Fig. 2). DAS is intended for devices with stylus input, such as PDAs. The idea behind DAS is that the user draws a sketch (the password) on a grid, and the system verifies the drawing by checking the drawn strokes. To verify the sketch, the system converts the user's input strokes into a sequence of coordinates, and then validates the password by examining the position and the order of the coordinates. For example, the sketch in Fig. 2 is translated to (2, 2), (3, 2), (3, 3), (2, 3), (2, 2), (2, 1), (5, 5), where the last coordinate, (5, 5), denotes a Pen Up event. Two secrets are considered equivalent if their encoding coordinates are the same, not the drawings themselves (Dunphy & Yan, 2007).
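The DAS encoding just described can be sketched as follows: raw stylus points are converted to grid-cell coordinates, a pen-up marker is appended, and two secrets match only if the encoded sequences are identical. The cell size and all names are illustrative; the distinguished pen-up coordinate (5, 5) follows the example in the text.

PEN_UP = (5, 5)   # distinguished coordinate marking the end of a stroke, as in the example above

def encode_stroke(points, cell_size=40):
    """Convert raw stylus points of one stroke into the sequence of 1-indexed grid cells
    crossed, dropping consecutive duplicates, then append the pen-up marker."""
    cells = []
    for x, y in points:
        cell = (x // cell_size + 1, y // cell_size + 1)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells + [PEN_UP]

def das_equal(secret_a, secret_b):
    """Two DAS secrets are equivalent iff their encoded coordinate sequences match."""
    return secret_a == secret_b

enrolled = [(2, 2), (3, 2), (3, 3), (2, 3), (2, 2), (2, 1), PEN_UP]   # the example from the text
attempt = encode_stroke([(70, 65), (95, 70), (100, 100), (60, 95), (62, 66), (58, 30)])
print(das_equal(enrolled, attempt))   # True: the drawn stroke crosses the same cells in the same order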

Fig. 2: Input of a DAS password on a 4x4 grid (Jermyn et al., 1999)

Cognometric:
Cognometric authentication is by far the most researched area in graphical authentication, and it is widely recognized for its simplicity of design and implementation. The process requires a user to identify a series of recognized password images amongst a larger set of decoy images; if the set of correct images is identified, the user is authenticated [1]. Furthermore, cognometric authentication does not require an external interface for user input; it only requires


standard button input and a display for output to show images. The strength of cognometric authentication is based on recognition rather than recall, exploiting the picture superiority effect of the human mind. A variety of cognometric authentication mechanisms have been developed and many of them use different image types as mnemonics. Passfaces (TM) by Real User Corporation (2005) (Fig. 3) is a commercialized cognometric authentication mechanism which exploits people's proficient ability to recognize human faces.

Fig. 3: A screenshot of the Passfaces demo application (Real User Corporation, 2005)

To authenticate, a user selects a recognised face from a grid of nine faces (see Fig. 3), and the procedure repeats for several rounds with different faces each round. If the user correctly identifies all the password faces, then the user is authenticated.

Graphical authentication:
From psychology, we have learnt that recognition of a previously seen item is easier than unaided recall. Following the same logic, graphical recognition-based authentication is predicted to provide better memorability than textual recall-based authentication [1]. The design of our graphical authentication follows

the cognometric verification model. It follows the same design paradigm as Passfaces (Real User Corporation, 2005), Déjà Vu (Dhamija & Perrig, 2000), and VIP (De Angeli et al., 2005), where users identify recognised images amongst decoys to log in.

Gesture authentication:
In recent years, accelerometers have been increasingly integrated into mobile phones (such as Apple's iPhone, Nokia's N95, etc.). A built-in accelerometer allows the mobile device to sense the user's movements and, as a result, it provides a new modality for user input [1]. At the moment (2008), there are a small number of mobile applications which make use of this: for example, Williamson et al. (2007) introduced Shoogle, an interface for sensing data within a mobile device as the device is shaken. As more uses of accelerometers are discovered, we confidently predict that more mobile phones will be equipped with built-in accelerometers in future.

Gesture password elements:
The gestures used for this study are discrete gestures. A discrete gesture can be distinguished as a movement from a starting position to a stopping position. Here, a discrete gesture is defined as a distinctive singular movement that can be perceived individually and is not connected to, or part of, another motion. In other words, a motion that cannot be further decomposed into units of action can be classified as a discrete gesture. Since it has the property of a discrete structure, multiple singular gesture units can be combined to form a string of discrete gestures. A stroke-based authentication system by De Luca et al. (2007) introduced a new concept of using directional strokes in a two-dimensional plane as passwords. They defined a stroke-based password as made up of many horizontal, vertical and diagonal strokes; the strokes can form specific shapes, and the shapes are suggested as mnemonics. In this study, we apply a similar concept which uses gesture strokes in a three-dimensional space as


passwords. Ten discrete gestures are defined as password elements. The elements were designed based on the spatial directions of a mobile phone, with the intention that each gesture has a symmetrical gesture in the mirror direction. The forward gesture, for example, has a symmetrical backward element in the mirror direction. The symmetry is intended to ease the process of learning the gestures: as users learn a gesture element they can apply the reverse movement to learn the mirror gesture, thus simplifying the learning process for the users.

User Interaction:
A gesture entry is entered as a movement that starts and stops in the same position. A complete gesture is defined as the change of state from a motionless state to a moving state and then back to a motionless state, which identifies the end of the gesture. Consequently, for the system to register a motionless state, the user needs to pause between gesture elements during password entry.

Security:
A string is an ordered list of elements in which the same element can appear multiple times at different positions. In this study, a gesture password is defined as a string of multiple gesture elements. To authenticate, a user is required to produce a string of gesture password elements in the correct order, which means gesture passwords are permutation based. Since the system supports ten different gestures and uses permutations, gesture passwords have the same password space as standard 10-digit PINs; thus, in theory, both have the same security. However, gesture passwords have a weakness against shoulder-surfing: as a user enters a gesture password, his or her action can be recorded easily. For this reason, we recommend that our gesture authentication be used in a secure environment.
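A small sketch of permutation-based gesture password verification and its theoretical password space. The ten element labels are assumptions standing in for the ten discrete gestures described above; the comparison with 10-digit PINs follows the text.

GESTURE_ELEMENTS = ["forward", "backward", "left", "right", "up",
                    "down", "tilt-left", "tilt-right", "roll-left", "roll-right"]
# assumed labels for the ten discrete, mirror-paired gesture elements

def verify_gesture_password(enrolled, entered):
    """A gesture password is an ordered string of elements; elements may repeat,
    and the entry is accepted only if the sequence matches exactly."""
    return list(enrolled) == list(entered)

def password_space(length, alphabet=10):
    """Number of possible passwords of a given length: with ten elements this equals
    the space of an equally long numeric PIN."""
    return alphabet ** length

print(verify_gesture_password(["up", "forward", "tilt-left", "up"],
                              ["up", "forward", "tilt-left", "up"]))   # True
print(password_space(4))   # 10000, identical to a 4-digit PIN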

Evolution:
Banks are constantly searching for solutions which will help reduce their cost of operations and improve customer experience. In this continuous journey, the banking industry has delivered several innovations. Innovation in banking delivery channels dates back to the introduction of ATMs as a self-service delivery channel. ATMs heralded a new era of banking, as the concept of self-service was introduced for the first time. ATMs also marked the entry of anytime banking, as customers could now access money from their bank accounts at a time of their convenience. The wave of self-service continued, and the advent of internet banking introduced the concept of anywhere banking, as customers could now access their bank accounts from the comfort of their home or office [4].

SMS alerts. One of the key concerns banks faced was that customers performed several inquiry transactions on ATMs, which added to the burden on the ATM infrastructure. This traffic was particularly heavy on salary days. Banks adopted a solution of proactively communicating account balances and important transactional activity to customers through a simple SMS. Customers stopped queuing up in front of ATMs for inquiry transactions.

Account inquiries. The SMS technology proved simple enough for banks to adopt as a self-service channel. This model of operation involves customers sending an SMS to a published number of the bank with a keyword and identification information. The customer

Research Opportunities:
Secured Authenticated Mobile Agent Based Mobile Banking System: A mobile agent is a self-contained software element responsible for executing a programmatic process, which is capable of autonomously migrating through a network. Mobile banking systems are applications that allow users to complete banking transactions on a mobile device. Mobile banking represents a more cost-efficient channel for the banks,


allowing them to charge less for transactions, and permitting the consumer to have immediate access to information related to their bank accounts [2]. The major issues in a mobile banking system are security and authentication. The challenge is to ensure maximum mobility in mobile phone banking while at the same time providing security for users in a way that will be acceptable to both consumers and banks. In a client-server architecture, if the connection is lost during the transaction, the user has to send the request once again to get the results. The proposed mobile agent based system overcomes that problem, and the system is secured and authenticated.

Usably Secure, Low-Cost Authentication for Mobile Banking: This work considers user authentication schemes for banking systems implemented over mobile phone networks in the developing world. We analyze an authentication scheme currently deployed by an Indian mobile banking service provider named Eko, which uses a combination of PINs and printed codebooks for authenticating users. As a first step, we report security weaknesses in that scheme and show that it is susceptible to easy and efficient PIN recovery attacks [3]. We then propose a new scheme, jointly developed with Eko, which offers better secrecy of PINs, while still maintaining the simplicity and scalability advantages of the original scheme. Finally, we investigate the usability of the two schemes with a sample of 34 current and potential customers of the banking system. Our findings suggest that the new scheme is more efficient, less susceptible to human error and better preferred by the target consumers.

Conclusion:
Mobile banking is a recent technique which has been emerging in India over the past few years. In this paper we have surveyed the various techniques used in mobile banking and the mobile authentication methods used for mobile banking. Most people in India own mobile phones and use mobile banking, so to protect their

bank transactions and account checking, mobile authentication is used. In the early days, banks issued 4-5 digit PINs as passwords for authentication, but these are easy to hack and crack. Many authors have therefore proposed various authentication methods, among which graphical authentication and gesture authentication are the main ones. These use graphical properties like face detection, eye-ball detection, and point detection on the screen, so it is not easy to crack a password in graphical form and steal the account. These are the approaches we have surveyed in this paper.

REFERENCES
[1] Ming Ki Chong, Usable Authentication for Mobile Banking, Thesis presented for the degree of Master of Science, Department of Computer Science, University of Cape Town, January 2009.
[2] R. Punithavathi, K. Duraiswamy, Secured Authenticated Mobile Agent Based Mobile Banking System, European Journal of Scientific Research, ISSN 1450-216X, Vol. 57, No. 3 (2011), pp. 494-501, EuroJournals Publishing, Inc., 2011.
[3] Saurabh Panjwani, Edward Cutrell, Usably Secure, Low-Cost Authentication for Mobile Banking, ISBN: 978-1-4503-0264-7, http://portal.acm.org/citation.cfm?id=1837110.1837116, ACM, 2010.
[4] http://www.infosys.com/finacle/solutions/thought-papers/Documents/evolution-mobilebanking.pdf


Designing Micropump Mechanically for Transdermal Drug Delivery


Sandhana Nirmala Kaviya J 1, Praveenkumar S 2
1 PG Student, Saveetha Engineering College
2 Assistant Professor, Saveetha Engineering College

Abstract
The main objective of this project is to model an efficient micropump for a transdermal drug delivery system. Micro cantilevers are used to cause vibrations in order to dispense the drug from the drug reservoir. The approach avoids presystemic metabolism, and therefore only a minimum dosage is required. This results in painless drug delivery and allows effective use of drugs with a short biological half-life.
Keywords: Transdermal, Reservoir, Micro cantilevers, Metabolism, Drug delivery, Dispense

1. INTRODUCTION:

Micro pumps are designed for transdermal drug delivery. Micro cantilevers are the most simplified MEMS based devices. Micropumps are actuated using Microcantilevers. They are of significant interest as they have potential applications in every field of science ranging from physical and chemical sensing to biological disease diagnosis. Here Micro cantilever replaces the osmotic pumps .Since osmotic pumps cannot react to all drugs; it does not find a wider application. Micro cantilever can work using various types of drugs depending upon their properties .They are placed in drug reservoir which causes vibrations and let the drugs dispense out of the reservoir through the micro needle into the transdermal region. This paper presents the new design of transdermal drug delivery system. Design, analysis, fabrication and characterization of piezoelectric valve less micro pump and hollow out-of-plane silicon micro needles are presented in this study. The analysis and performance of micro pump has been characterized in terms of actuator deflection and flow rate at different operational parameters. Structural analysis of micro needle

Organized by: Department of Computer Science and Engineering, Anand Institute of Higher Technology, Chennai. www.iconic12.in E-mail: iconic.aiht@gmail.com [Type text] Page 310

DRDO Sponsored International Conference on Intelligence Computing (ICONIC12)

micro

pump

is

also

given.

[3]

Fig.1. 3D view of micropump

MEMS are exposed to a variety of liquid environments in applications such as chemical and biological sensors and microfluidic devices. Environmental interactions between liquids and micro-scale structures can lead to unpredictable performance of MEMS in liquid environments. In this paper, the mechanical performance of micro cantilevers in liquid environments was investigated through a series of experiments. The mechanical performance of the micro cantilevers was evaluated by periodically monitoring changes in resonant frequency [5]. Micropumps are of special interest in microfluidic research and have become available for industrial product integration in recent years. Their miniaturized overall size, potential cost and improved dosing accuracy compared to existing miniature pumps fuel the growing interest for this innovative kind of pump. Micropumps can be grouped into mechanical and non-mechanical devices. Mechanical systems contain moving parts, which are usually actuation and valve membranes or flaps. The driving force can be generated by utilizing piezoelectric, electrostatic, thermo-pneumatic, pneumatic or magnetic effects. Non-mechanical pumps function with electro-hydrodynamic or electro-osmotic principles.

Fig.2. Time-frame images of micropump in direct contact with skin

This paper describes the design, fabrication and experimental results of a new, low-cost, high-performance silicon micropump developed for a disposable drug delivery system. The pump chip demonstrates linear and accurate pumping characteristics. The stroke volume of 160 ml is maintained constant by the implementation of a double limiter acting on the pumping membrane. The actuator is dissociated from the pump chip. The technology is based on the use of SOI technology. The result is a small-size chip, suitable for cost-effective manufacturing in high volume. The micropump chip is integrated into the industrial development of a miniature insulin pump for diabetes care. [7]


3. FABRICATION AND WORKING OF MICROPUMP

Fig.4. Drug particles spread inside the micropump using micro cantilevers

Fig.3. Micropump view

This paper presents the fabrication and test of a thermopneumatic micropump without a membrane for transdermal drug delivery systems (DDS). The micropump consists of two air chambers, a micro channel and a stop valve. The meniscus motion in the micro channel is observed with a high-speed camera, and the discharge volume of the micropump is calculated through frame analysis of the recorded video data [8].

Fig.5.Grid view of micropump

Fig.6. Working of micro cantilevers

Microelectromechanical Systems (MEMS) have come into existence only in the last decade. Micro cantilevers are the most simplified MEMS-based devices. They are of significant interest as they have potential applications in every field of science, ranging from physical and chemical sensing to biological disease diagnosis. The major advantages of employing micro cantilevers as sensing mechanisms over conventional sensors include their high sensitivity, low cost, low analyte requirement, non-hazardous procedure with fewer processing steps, quick response time and low power requirement. Most important is the fact that an array of micro cantilevers can be employed for the diagnosis of large numbers of analytes, such as various biomarkers of a single disease, in a single run, thus providing tremendously high-throughput analysis capabilities. Their mechanical properties and areas of application, together with their use in disease detection, make them an area of enormous potential. On a basic level, these flexible sensors, analogous to tiny diving boards, work by measuring the change in deflection or vibrational frequency of the micro cantilever. Their extreme sensitivity, on the order of parts per billion or parts per trillion, allows them to detect materials at trace levels. Micro cantilever sensors can be used to detect such things as humidity, temperature, herbicides, metal ions, viscosity, missile condition, radiation, biotoxins and explosives with a minimum of fuss, expertise and cost. In the medical field they can detect diseases such as cancer, coronary heart disease and myoglobin, as well as making significant contributions to genomics, DNA analysis and blood glucose monitoring.
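Since these sensors are read out through deflection or resonant-frequency shifts, a small numerical illustration may help. The sketch below is only a back-of-the-envelope model assuming a rectangular silicon cantilever and the textbook beam formulas; the material constants and dimensions are assumed values, not figures from this paper.

import math

# Illustrative constants for a rectangular silicon cantilever (assumed values).
E_SI = 169e9      # Young's modulus of silicon, Pa
RHO_SI = 2330.0   # density of silicon, kg/m^3

def spring_constant(length, width, thickness, youngs_modulus=E_SI):
    """Static spring constant of a rectangular cantilever, k = E*w*t^3 / (4*L^3)."""
    return youngs_modulus * width * thickness**3 / (4.0 * length**3)

def resonant_frequency(length, width, thickness, density=RHO_SI):
    """Fundamental resonant frequency f = (1/(2*pi)) * sqrt(k / m_eff),
    with the effective mass of the fundamental mode m_eff ~= 0.24 * m_beam."""
    k = spring_constant(length, width, thickness)
    m_eff = 0.24 * density * length * width * thickness
    return math.sqrt(k / m_eff) / (2.0 * math.pi)

def added_mass(f0, f1, k):
    """Mass adsorbed on the free end, estimated from the frequency shift:
    delta_m = (k / (4*pi^2)) * (1/f1^2 - 1/f0^2)."""
    return k / (4.0 * math.pi**2) * (1.0 / f1**2 - 1.0 / f0**2)

if __name__ == "__main__":
    L, w, t = 200e-6, 40e-6, 1e-6          # 200 um x 40 um x 1 um cantilever
    k = spring_constant(L, w, t)
    f0 = resonant_frequency(L, w, t)
    f1 = 0.999 * f0                        # a 0.1 % downward shift after drug loading
    print(f"k = {k:.3e} N/m, f0 = {f0/1e3:.1f} kHz")
    print(f"added mass ~ {added_mass(f0, f1, k)*1e15:.2f} pg")

Even a 0.1 % frequency shift corresponds to picogram-scale added mass for a cantilever of these dimensions, which is why resonant readout is attractive for dosing and sensing.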

4. RESULTS AND DISCUSSIONS:

Fig.7. Flow rate of liquids in the micropump at different frequency levels

Fig.8. Inlet and outlet flow rate graph

The graphs above describe the flow rate of the drug particles in the micropump at various frequency levels. Micropumps are designed for transdermal drug delivery. Micro cantilevers are the most simplified MEMS-based devices, and the micropump is actuated using micro cantilevers. They are of significant interest as they have potential applications in every field of science, ranging from physical and chemical sensing to biological disease diagnosis. Here the micro cantilever replaces the osmotic pump; since osmotic pumps cannot react to all drugs, they do not find wider application.


A micro cantilever can work with various types of drugs depending upon their properties. The cantilevers are placed in the drug reservoir, where their vibrations let the drugs dispense out of the reservoir through the micro needle into the transdermal region.

5. CONCLUSION:

A skin-contact-actuated micropump has been developed and tested. The micropump does not require a voltage power source, since the vibration created by the micro cantilevers actuates the flow of the drug particles. The characteristics of the device, such as its flow rate (28.8 uL/min) and high back pressure (approx. 28.9 kPa), illustrate its utility as a drug dispenser or pump for transdermal drug delivery. In addition, its low fabrication cost and batteryless operation make it an ideal single-use/disposable transdermal drug dispenser.

6. REFERENCES:

[1.] Muhammad Waseem Ashraf, Shahzadi Tayyaba, Asim Nisar and Nitin Afzulpurkar, School of Engineering and Technology, "MEMS Based System for Drug Delivery".
[2.] Charilaos Mousoulis, Manuel Ochoa, Demetrios Papageorgiou and Babak Ziaie, Senior Member, IEEE, "A Skin-Contact-Actuated Micropump for Transdermal Drug Delivery".

[3.] Abbas Z. Kouzani, Mile Ivankovic, Michael Fielding, Akif Kaynak, Chunhui Yang, Wei Duan and Eric J. Hu, "Design and Construction of a Micropump for Drug Delivery Applications".
[4.] Guoguang Su and Ramana M. Pidaparti, Member, IEEE, Fellow, ASME, "Drug Particle Delivery Investigation Through a Valveless Micropump".
[5.] Shaikh Mubassar Ali, Susan C. Mantell and Ellen K. Longmire, "Mechanical Performance of Microcantilevers in Liquids".
[6.] Lijie Li, Member, IEEE, "Recent Development of Micromachined Biosensors".
[7.] Didier Maillefer, Stephan Gamper, Beatrice Frehner, Patrick Balmer (Department of Microsystems, DEBIOTECH SA, Lausanne, Switzerland), Harald van Lintel and Philippe Renaud (Institute of Microsystems, EPFL, Lausanne), "A High-Performance Silicon Micropump for Disposable Drug Delivery Systems".
[8.] Sung Rae Hwang, Woo Young Sim, Do Han Jeon, Geun Young Kim, Sang Sik Yang and James Jungho Pak, School of Electronics Eng., Ajou University, Suwon, Korea, "Fabrication and Test of a Submicroliter-level Thermopneumatic Micropump for Transdermal Drug Delivery".


Home Security using Artificial Intelligence in Air conditioners


G.P. Abinaya 1

1 B.Tech (IT), Anand Institute of Higher Technology, Chennai.

Abstract:
This paper puts forth the implementation of home security by air conditioners. The security systems available today provide door-level security through fingerprint technology or password protection. Although such systems provide considerable security, once the password is cracked or the security device is destroyed, a break-in may follow. I propose a model in which, even when there is a break-in, security can be provided through air conditioners by embedding gene identification technology in them.

INTRODUCTION:

Artificial Intelligence: The phrase "Artificial Intelligence" (AI), which was coined by John McCarthy three decades ago, evades a concise and formal definition to date. One representative definition is pivoted around the comparison of the intelligence of computing machines with human beings. Another definition is concerned with the performance of machines which "historically have been judged to lie within the domain of intelligence". None of these definitions or the like has been universally accepted, perhaps because of their references to the word "intelligence", which at present is an abstract and immeasurable quantity. A better definition of artificial intelligence therefore calls for formalization of the term "intelligence". Psychologists and cognitive theorists are of the opinion that intelligence helps in identifying the right piece of knowledge at the appropriate instances of decision making. The phrase "artificial intelligence" can thus be defined as the simulation of human intelligence on a machine, so as to make the machine efficient in identifying and using the right piece of knowledge at a given step of solving a problem. A system capable of planning and executing the right task at the right time is generally called rational. Thus, AI may alternatively be stated as a subject dealing with computational models that can think and act rationally. A common question then naturally arises: does rational thinking and acting include all possible characteristics of an intelligent system? If so, how does it represent behavioral intelligence such as machine learning, perception and planning? A little thinking, however, reveals that a system that can reason well must be a successful planner, as planning in many circumstances is part of a reasoning process. Further, a system can act rationally only after acquiring adequate knowledge from the real world. So perception, which stands for building up knowledge from real-world information, is a prerequisite feature for rational actions. One step further, a machine without learning capability cannot possess perception. The rational action of an agent (actor) thus calls for possession of all the elementary characteristics of intelligence.


Relating artificial intelligence to computational models capable of thinking and acting rationally therefore has pragmatic significance.

Gene Identification: GENSCAN [1] is a general-purpose gene identification program which analyzes genomic DNA sequences from a variety of organisms including human, other vertebrates, invertebrates and plants. It identifies complete gene structures, i.e. the exon/intron structures of genes, in genomic DNA. GENSCAN has been shown to have higher accuracy than existing methods when tested on standardized sets of human and vertebrate genes, with 75 to 80% of exons identified exactly. The program is also capable of indicating fairly accurately the reliability of each predicted exon. Consistently high levels of accuracy are observed for sequences of differing CG content and for distinct groups of vertebrates. Changes have been made to improve the accuracy of the program for plant sequences. For maize, these changes resulted in slightly lower nucleotide-level accuracy relative to the vertebrate parameter set, but this is more than compensated by the greatly improved exon-level accuracy; for Arabidopsis the improvement is more even, with higher accuracy seen in almost every category (PubMed: 9149143). Softberry offers a number of gene-finding tools for Eukaryota and Bacteria. GeneID predicts genes, exons, splice sites and other signals along a DNA sequence, with gene prediction based on human, fruit fly, puffer fish, slime mold or wheat, and may provide enhancements over GENSCAN; a European GENSCAN service offers gene modeling. Gene Builder is a gene structure prediction system supporting the organisms Homo sapiens, Mus musculus, Fugu, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana and Aspergillus niger, with gene/exon modes, direct/complement strand options and sequencing-error handling. GenView provides tools for prediction and analysis of protein-coding gene structure: an integrated computing system for protein-coding gene prediction, repeated-elements mapping, CpG island prediction, splicing-signal prediction and Hamming clustering.

Existing Home Security Systems:
Wireless security systems
Electromagnetic motion sensors
Photoelectric and infrared sensors
Wireless acoustic sensors
X10 digital security systems
Personal portable alarms

TECHNOLOGIES USED:
EHFA (Defence Standard 00-250): gene identification
SQL trigger: sending SMS
Bank alarm: signalling the police
Inverter: supplying current to the AC in case of power failure
SQL: database
Door sensors: sensing movement of the door
Java: coding the operations
Shockers: inducing current at the door step
GSM technology: networking system for sending SMS

EXISTING SYSTEM:
A home security system [2] based on a sensor network (HSSN) is configured with sensor nodes including radio frequency (RF), ultrasonic, temperature, light and sound sensors. The system can acknowledge security alarm events that are acquired by sensor nodes and relayed in hop-by-hop transmission. The system comprises a sensor network, a home security mobile robot (HSMR) and a home server. Each device communicates using RF signals and generates context-aware and path-planning information by fusing the RF signal with specific sensing information. The home server transmits an event log message to a user interface device (UID), PDA or cellular phone when it receives an RF data packet from the node that detected an alarm event. At the same time, the mobile robot moves to the alarm zone on the action command of the home server. Finally, the mobile robot captures the current context-aware image and transmits the image information to the user through the home server. In the experimental results, the authors showed that the system has enhanced responsiveness to emergency contexts and faster, more accurate path planning to the target position for reaching an alarm zone and acquiring context-aware information.

In March 2009, QinetiQ conducted an Early Human Factors Analysis (EHFA) [1] on behalf of the UK Department for Transport (DfT) for the implementation of body scanning technology in the airport security screening context. EHFA is a well-established technique, used extensively as part of UK Ministry of Defence (MoD) [3] acquisition. This was the first time EHFA was used in the security domain. The aim of EHFA is to provide an early indication of where key Human Factors (HF) issues and risks associated with body scanners lie, so that mitigation strategies can be developed. EHFA is structured around seven Human Factors domains, ranging from Manpower and Training to System Safety and Human Factors Engineering.

PROPOSED SYSTEM:
The main objective of this paper is home security. The operation involves a security door with door sensors, a port to trigger the AC, a database of the genes of the people who live in the house, and a shocker attached to the doors. When a person tries to enter the room through the door, a trigger is sent to the AC by the movement of the door. The trigger is supplied by an inverter, or by mains electricity, to the supply port of the air conditioner. Once the AC is triggered, it starts the gene test.


It tests the gene of that person by scanning, and if the result matches a gene in the database then no action takes place. I use the EHFA technology for gene identification (Defence Standard 00-250, Human Factors Integration). If the gene result does not match the genes in the database, it is understood that an unauthorized person has entered the house, and the system sends an SMS to the owner and a signal to the police. For sending the SMS I suggest the SQL triggering method or running Java code, using GSM technology to deliver the message. To signal the police, I adopt the signalling system used in banks to alert the police in case of a bank robbery. These actions are triggered when the gene test fails. In case the intruders try to escape before the police can arrive, I provide a shocker near the door: if the gene test fails, the system sends the SMS and activates the shocker, which supplies current from one end of the door to the other. The current is invisible, so people cannot see it, and if they try to escape before the police turn up it keeps them in, preventing their escape. In case of system failure, or if the gene is read wrongly and the shocker is activated even for a member of the house, they obviously cannot see the current, so I place a small light near the shocker to indicate whether it is on or off. I use Java code to implement the above procedure and to connect each operation.
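A minimal sketch of this control flow is given below. Every function name (scan_gene, send_sms, signal_police, set_shocker) and the owner's number are hypothetical placeholders standing in for the hardware, database and GSM interfaces described above; this is an illustration of the decision logic, not the paper's implementation.

# Hypothetical sketch of the door-sensor / gene-test alert flow described above.
AUTHORIZED_GENE_PROFILES = {"resident-1-profile", "resident-2-profile"}

def scan_gene():
    """Stand-in for the AC-embedded gene scanner; returns a profile string."""
    return "unknown-profile"

def send_sms(number, text):
    print(f"[GSM] SMS to {number}: {text}")

def signal_police():
    print("[ALARM] silent signal sent to police")

def set_shocker(active):
    # The indicator light mirrors the shocker state so residents can see it.
    print(f"[SHOCKER] {'ON' if active else 'OFF'} (indicator light {'lit' if active else 'dark'})")

def on_door_opened(owner_number="+91-0000000000"):
    """Triggered by the door sensor: run the gene test and react to the result."""
    profile = scan_gene()
    if profile in AUTHORIZED_GENE_PROFILES:
        return                              # resident detected: no action
    send_sms(owner_number, "Unauthorized entry detected at the front door")
    signal_police()
    set_shocker(True)                       # keep the intruder from escaping

if __name__ == "__main__":
    on_door_opened()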

CONCLUSION: I would like to conclude by saying that this type of security is essential and better than home security using robotic cameras, since the detection process involves gene identification and serves not only to detect, identify and notify the crime but also to capture the thief.

REFERENCE:
1. "Body Scanning Technology: An Early Human Factors Analysis", Security Technology, 43rd Annual 2009 International Carnahan Conference, Celine Lilliane Aurore Jacques, 2009.
2. "Home Security Robot based on Sensor Network", SICE-ICASE International Joint Conference, YoonGu Kim, 2006.
3. Ministry of Defence, Defence Standard 00-250 (Issue 1, published 23 May 2008).
4. http://www.siccolo.com/Articles/SQL Scripts/how-to-create-trigger-to-sms
5. http://www.hmtech.info/security/


Evolution of Enterprise Cloud Computing


D. Kanagalatchumy 1, S. Nandhini 2, D. Punitha 3, A. Ramalingam 4

1,2,3 PG Students, Sri Manakula Vinayagar Engineering College, Pondicherry.
4 Professor, Sri Manakula Vinayagar Engineering College, Pondicherry.

Abstract

The new emerging computing paradigm is cloud computing. Cloud computing is a versatile technology that supports a broad spectrum of applications. Benefits of cloud computing such as low cost, availability and scalability encourage small companies to deploy the cloud. However, the cloud does not yet adequately address the needs of complex enterprise products. In this paper we discuss the risks of cloud computing and the evolution of enterprise cloud computing, along with the key challenges of enterprise cloud computing and their solutions.

INTRODUCTION

Cloud computing is on-demand network access to shared resources with minimal management. The benefits of cloud computing include zero installation, automatic configuration, cost savings and immediate access in scalable data centers. The three services provided by the cloud are SaaS (Software as a Service), which provides applications in the cloud; PaaS (Platform as a Service), which provides higher-level APIs; and IaaS (Infrastructure as a Service), which offers computer infrastructure as a service. Though the cloud has many advantages, it has many risk factors such as security and reliability, and cloud computing services may not provide the support required by a large enterprise. In this paper we discuss the various challenges of cloud computing and the evolution of the enterprise cloud to overcome those challenges.

CLOUD BENEFITS AND RISKS

Benefits

Agility, Adaptability and Flexibility: A key benefit discussed in connection with cloud computing is how it enables agility. The term is used to describe two different kinds of benefit; both are real and powerful. We can define agility in business terms as the ability of a business to adapt rapidly and cost-efficiently in response to changes in the business environment. A business group that wants to implement a new service can purchase cloud services and use them, instead of buying servers and installing and deploying the application on new servers. Cloud computing is adaptable in that purchasers of IaaS capacity can run any applications on a variety of virtual machines. Service providers have built their own services that make development and deployment easier and faster, and SaaS offerings are increasingly easy to match to requirements. Cloud computing is also flexible enough to quickly add computing capacity to handle temporary rises and falls in requirements.

Cost savings: Assessments indicate that cloud computing can reduce cost, which has been shown clearly for small to medium-sized businesses.


Cost savings with some SaaS deployments can be a factor propelling enterprise cloud adoption. Transient requirements such as unanticipated workload spikes benefit from the relatively low upfront cost of IaaS and SaaS services. An additional advantage is that businesses pay only for the resources they use.

Risks

The features that make cloud computing so attractive also lead to many risks.

Security and Privacy: The advantages of cloud computing, such as flexibility, also introduce the concern that people may use cloud computing in a way that puts their information and intellectual property at risk. In cloud computing, data is stored and delivered across the Internet, and the owner of the data does not control, or even know, the location of the data. There is a very real possibility that the owner's data could reside on the same resources as a competitor's application and data. In a multi-tenant environment, it may be difficult for a service provider to assure that data is visible to only a single customer. Sometimes it would be difficult or even impossible to use a public cloud for applications that handle controlled technologies, due to the risk of potential compromise and concerns about compliance. Standards for maintaining security and managing service-level agreements (SLAs) that could be used to help ensure compliance with government regulations and Intel standards are lacking. In many countries the use and storage of personal data is heavily regulated; without the needed safeguards, this data could be illegally exposed with an external cloud. Applications running in an internal environment are easy to verify, whereas applications outside the environment are difficult to verify. As a result, we might not be able to tell whether we have up-to-date information held in a secure way.

Enterprise Support and Service Maturity: Cloud computing services may not provide the reliability, manageability and support required by large enterprises, and many services are aimed primarily at SMBs and at consumers rather than at large enterprises. Service-level agreements offered by some providers may be inadequate for some enterprise applications and are not always clearly defined. Cloud implementations of some services may fall short of expectations; problems can be encountered when synchronizing e-mail, calendars and address lists with existing enterprise applications, and security is also an issue.

Return on Investment Concerns: In principle, external cloud computing can reduce costs for large enterprises as well as SMBs, but the saving for a large enterprise may not be as clear as for a small or medium business, since large enterprises already benefit from significant economies of scale in their own internal IT operations. While cloud computing initially appears less expensive in terms of upfront costs, the comparison of total cost of ownership (TCO), including recurring costs and potential risks, may be much closer. Movement to an external cloud may require significant changes or additions to the enterprise network in order to provide acceptable performance to corporate users in regions with limited bandwidth. Future demand may need increased bandwidth to the cloud, and in many countries bandwidth is still expensive.

EVOLUTION OF ENTERPRISE CLOUD COMPUTING

Enterprise cloud computing is used to perform common business-related functions, eliminating the need for on-site servers and hardware. A company otherwise needs highly skilled professionals to maintain the network, which is quite expensive; with cloud computing these network tasks become less expensive, and the third-party company pays only for what it uses.


Cloud computing applications are based on the internet, which makes them easy to access from anywhere. Enterprise cloud computing is, additionally, a valuable resource for companies of all sizes. The applications associated with the cloud can be dramatically scaled up or down depending on the company's need at any time. Employees have access to current information wherever they are in the world. The cost, convenience and reliability of enterprise cloud computing make it popular with all kinds of businesses.

Fig 2: Enterprise cloud computing

A. Migration to External Cloud Computing

Composite Applications: Composite applications built from multiple services from external suppliers and internal IT sources must be supported by cloud computing. The application might be accessed from several sources: a directly accessed Internet-located service; an Internet-located service accessed indirectly through an address managed in an internal cloud; or a directly accessed intranet-located service. The service may have a chain of sub-operations, but the cloud makes this transparent to users, who work from a single invocation address. It is the responsibility of the IT organization to identify the right location for computing; it may be the internal or the external cloud.

Standards: Each cloud is provided with standards to enable the cloud to work as a single entity. This introduces the risk of getting locked into specific clouds in an enterprise. To identify the cloud that is masked, cloud computing standards are required for identity, authentication, federation and encryption.

Fig 3: Switching the enterprise between public and private clouds

External and Internal Clouds: As external clouds have many technical and legal issues, it is better to start with internal clouds before migrating to an external cloud. IT organizations need to balance three broad areas of computing during the transition to the external cloud: current, conventional computing; the internal cloud; and the external cloud.

Conventional computing: Gradual migration of applications to internal and external clouds is conventional computing, which provides continuous capabilities for many years.


Internal cloud: An internal cloud also has the characteristics of an external cloud. It uses the same technologies to provide a dynamic infrastructure that responds to demand. It also lets the IT organization validate whether an application can be migrated to the external cloud. The standards for applications are developed for both internal and external clouds, which is useful when migrating from the internal to the external cloud; the migration should not affect the users at any level. The enterprise infrastructure might be supported by a single internal cloud with multiple data centers, and the internal cloud can also be subdivided for business continuity.

Fig 4: External and internal clouds

External cloud: Conventional enterprise compute needs are increasingly delivered by external clouds.

ENTERPRISE CLOUD COMPUTING SOLUTIONS

Lock-in danger: Lock-in refers to having trouble getting data back from a third-party provider. The main computing is controlled by the third party, and we no longer have the freedom to innovate with software or platforms. On-premises and public clouds come with different SLAs, different kinds of integration and different cost structures, and standardized APIs are recommended to mitigate this danger. A standardized cloud helps software innovation at enterprise scale.

Data security: A common challenge in cloud computing is security. Although it does not matter where the hardware that serves the application is located, it is important to configure the architecture properly to mitigate security risk. To architect the hosting properly, the enterprise may prefer to provide a list of permitted IP addresses. Another approach to making cloud-stored data more secure is to encrypt the data before it is placed in the cloud, rather than keeping it unencrypted as in a local data center. Finally, a hybrid on-premises approach also mitigates the security risk.
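As one concrete way to realise the encrypt-before-upload idea, the sketch below uses the Fernet recipe from the Python cryptography package. The record contents and the commented-out upload call are placeholders, not part of the paper; the point is simply that only ciphertext leaves the enterprise, while the key stays on-premises.

from cryptography.fernet import Fernet

# Key management stays on-premises; only ciphertext ever leaves the data center.
key = Fernet.generate_key()                  # keep this in the local key vault
cipher = Fernet(key)

record = b"employee-id,salary\n1047,52000\n"      # hypothetical sensitive record
ciphertext = cipher.encrypt(record)               # this is what gets uploaded
# upload_to_cloud("records.enc", ciphertext)      # placeholder for the provider's API

# After downloading the object back on-premises, the key holder can recover it:
assert cipher.decrypt(ciphertext) == record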

Data transfer bottlenecks: The next challenge is migrating large amounts of data, which is more complicated. One alternative to network data transfer is to ship disks, which is considered the cheapest and fastest way to move large amounts of data.
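A quick back-of-the-envelope comparison makes the point. The dataset size, link speed and shipping time below are assumed figures for illustration, not numbers from the paper.

# Rough comparison of wide-area transfer vs. shipping disks (illustrative assumptions).
data_tb = 50                          # dataset size in terabytes
link_mbps = 100                       # sustained WAN throughput in megabits/s
shipping_hours = 48                   # courier time for a box of disks

bits = data_tb * 1e12 * 8
network_hours = bits / (link_mbps * 1e6) / 3600
print(f"network transfer: {network_hours:,.0f} h (~{network_hours/24:.1f} days)")
print(f"shipped disks:    {shipping_hours} h")

At these assumed rates the network transfer takes over a month, which is why physically shipping disks remains a serious option for bulk migration.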

CONCLUSION

Cloud computing is a growing trend, but it still has challenges, which can be addressed with a hybrid approach to enterprise cloud computing that offers on-demand computing through a combination of on-premises and external systems. Cloud computing has significant benefits, but there are security, privacy and other barriers which prevent adoption of enterprise cloud computing, and the cost benefits of enterprise cloud computing have not yet been clearly demonstrated.


SELECTIVE HARMONICS ELIMINATION (SHE) IN SINGLE PHASE PULSE WIDTH MODULATION (PWM) RECTIFIER
Mrs. S. Pandeeswari 1
1 Asst. Prof., RVS College of Engineering and Technology, Dindigul

Abstract

This paper deals with GA-based Selective Harmonics Elimination (SHE) in a single-phase Pulse Width Modulation (PWM) rectifier. Since locating the switching instants in selective harmonics elimination involves the solution of a set of nonlinear transcendental equations, the problem is redrafted as an optimization problem and a Genetic Algorithm (GA) is used to solve that set of equations. These equations are formulated to eliminate or reduce the third and fifth order harmonics. The single-phase rectifier circuit simulation is done in MATLAB. The harmonic spectrum analysis of the source current waveform using MATLAB confirms the reduction of the third and fifth order harmonics along with the desired DC output voltage.

Index Terms: SHE - Selective Harmonics Elimination, PWM - Pulse Width Modulation, GA - Genetic Algorithm

I. INTRODUCTION

Several techniques have been proposed to mitigate the lower-order harmonics exhibited in the output voltage of a pulse width modulation (PWM) rectifier. If the application is sensitive to switching losses but can tolerate harmonics to a certain extent, then low-switching-frequency PWM is preferred. If the application is harmonic sensitive and can tolerate switching losses to a certain extent, then high-switching-frequency PWM is preferred. The traditional carrier-based sinusoidal PWM (SPWM) and the space vector PWM (SVPWM) are examples of the high-switching-frequency category. Selective Harmonic Elimination PWM (SHEPWM), Optimal Minimization of the Total Harmonic Distortion (OMTHD) and the Optimized Harmonic Stepped Waveform (OHSW) are examples of the low-switching-frequency category, in which the switching frequency is typically around the fundamental frequency of the output voltage of the rectifier. The SHEPWM technique has been applied to rectifiers whose output voltage waveform has quarter-wave symmetry, and lower-order harmonics were eliminated by implementing transitions at controllable points on the time axis. The higher harmonics can be eliminated by using passive filters or by the inherent low-pass nature of electrical machines. The switching instants on the time axis have to be obtained as the solution of a set of nonlinear transcendental equations, which can be obtained by two proven approaches. In the first approach, numerical iterative techniques such as the Newton-Raphson method were proposed; besides requiring an educated initial guess, such methods pose the problem of possible divergence. While the MATLAB fsolve function can be used to find all the roots, it calls for additional hardware resources for real-time control. The mathematical theory of resultants is more complicated and time consuming and has to be repeated for every voltage level in the operating range. Further, the homotopy algorithm has been used for the purpose. None of the above methods provides a generalized and optimized solution when the modulation indices are to be varied. The second approach views the situation as an optimization problem rather than an analytical problem. This approach involves changing the analytical equations into an objective function to be minimized, and various optimization techniques have been adopted for minimizing the objective function. The optimization approach provides solutions for cases where it is possible to completely eliminate the low-order harmonics, and also supplies an optimum solution where no feasible exact solution can be found otherwise.
Evolutionary search algorithms like the GA, a hybrid technique of the GA with Levenberg-Marquardt, and Particle Swarm Optimization (PSO) have been suggested for this purpose. The second approach thus aims to eliminate the lower-order harmonics or at least to minimize them, so it can be considered a generalized form of harmonic elimination.


In this paper the applicability of the Genetic Algorithm (GA) for minimizing selected lower-order harmonics, along with achieving the desired source current waveform of the single-phase rectifier, is presented. The GA stands out as a promising solution that can perform as well as, and under certain conditions even better than, the above-mentioned techniques in terms of versatility over all ranges of reference voltage, speed and precision, and it is also simple to use. Moreover, by combining it with simulated annealing algorithms, the GA can be used for solving this kind of problem and the global optimum solution may be obtained.

II. GENETIC ALGORITHM

A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through application of the principles of evolutionary biology. Genetic algorithms use biologically inspired techniques such as genetic inheritance, natural selection, mutation and sexual reproduction (recombination, or crossover). Along with genetic programming (GP), they are one of the main classes of genetic and evolutionary computation (GEC) methodologies. Genetic algorithms are typically implemented using computer simulations in which an optimization problem is specified. For this problem, members of a space of candidate solutions, called individuals, are represented using abstract representations called chromosomes. The GA consists of an iterative process that evolves a working set of individuals, called a population, toward an objective function or fitness function. Traditionally, solutions are represented using fixed-length strings, especially binary strings, but alternative encodings have been developed. The evolutionary process of a GA is a highly simplified and stylized simulation of the biological version. It starts from a population of individuals randomly generated according to some probability distribution, usually uniform, and updates this population in steps called generations. In each generation, multiple individuals are randomly selected from the current population based upon some application of fitness, bred using crossover, and modified through mutation to form a new population.

Crossover: exchange of genetic material (substrings) denoting rules, structural components or features of a machine learning, search or optimization problem.

Figure.1 Flowchart of Genetic Algorithm

Selection: the application of the fitness criterion to choose which individuals from a population will go on to reproduce.
Replication: the propagation of individuals from one generation to the next.
Mutation: the modification of chromosomes for single individuals.

This section begins with a survey of GA variants: the simple genetic algorithm, evolutionary algorithms, and extensions to variable-length individuals. It then discusses GA applications to data mining problems, such as supervised inductive learning, clustering, and feature selection and extraction. It concludes with a discussion of current issues in GA systems, particularly alternative search techniques and the role of building-block (schema) theory.

III. APPLICATION OF GA TO SHEPWM

A standard single-phase rectifier, shown in Figure 2, is used to generate the source current waveform shown in Figure 3 and the output voltage waveform shown in Figure 4.


Substituting equations (3) and (4) into equation (2) gives the desired equation.

Figure.2 Single phase full bridge rectifier.

Given a desired fundamental source current and the requirement of eliminating the 3rd and 5th order harmonics, the problem here is to determine the switching angles theta1, theta2 and theta3 such that

Figure.3 Source current waveform

Figure.4 Output voltage waveform

There are four switches in the circuit; hence there are 2^4 = 16 different possible switching combinations. Only four of these combinations are useful for obtaining an alternating waveform across the load; thus there are only three possible states for the load voltage Va-b. A number of periodic waveforms can be generated using these three states. The Fourier series expansion of the output voltage waveform shown in Figure 4 is given as

To eliminate the 3rd and 5th order harmonics, i.e. to make i3 = 0 and i5 = 0, the following equations must be satisfied.

The Fourier series expansion of the source current waveform shown in figure.3 is given as

Due to the symmetry of the input current waveform there are no even harmonics, Idc should be zero, and the coefficients are

Equations (6), (9) and (10) are used to write the MATLAB m-file code, which is run in the GA optimization toolbox. In the optimization toolbox the parameter values are given as: population size = 1400; fitness scaling: proportional; selection: roulette; reproduction: elite count = 400; crossover rate = 0.8; mutation: adaptive feasible with a mutation rate of 0.01; crossover type: heuristic. The GA optimization toolbox result is displayed, along with bar charts of the current best individual and the fitness values, and the corresponding values are shown in the final point of the optimization toolbox. The above process is repeated for reference voltages from 0 V up to 21.65 V, and the final points are tabulated.
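Because equations (6), (9) and (10) are not reproduced above, the sketch below only illustrates the general idea of searching switching angles with a small genetic algorithm. It assumes a textbook quarter-wave SHE model in which the n-th harmonic is proportional to cos(n*t1) - cos(n*t2) + cos(n*t3); the cost function, operator choices and numeric settings are illustrative and are not the authors' MATLAB tool-box configuration.

import numpy as np

# Assumed quarter-wave SHE model (illustrative, not the paper's equations (6), (9), (10)).
def harmonic(theta, n):
    t1, t2, t3 = theta
    return np.cos(n * t1) - np.cos(n * t2) + np.cos(n * t3)

def cost(theta, m_target):
    """Penalise error in the fundamental plus any residual 3rd and 5th harmonics."""
    if not (0.0 < theta[0] < theta[1] < theta[2] < np.pi / 2):
        return 1e6                                   # infeasible angle ordering
    return ((harmonic(theta, 1) - m_target) ** 2
            + harmonic(theta, 3) ** 2
            + harmonic(theta, 5) ** 2)

def ga_she(m_target, pop_size=200, generations=300, elite=20,
           crossover_rate=0.8, mutation_rate=0.01, rng=np.random.default_rng(0)):
    pop = np.sort(rng.uniform(0, np.pi / 2, size=(pop_size, 3)), axis=1)
    for _ in range(generations):
        costs = np.array([cost(ind, m_target) for ind in pop])
        pop = pop[np.argsort(costs)]                     # best individuals first
        new_pop = [pop[i].copy() for i in range(elite)]  # elitism
        while len(new_pop) < pop_size:
            a = rng.integers(0, pop_size, 2)             # two-way tournaments
            b = rng.integers(0, pop_size, 2)
            p1, p2 = pop[a.min()], pop[b.min()]
            child = p1.copy()
            if rng.random() < crossover_rate:            # arithmetic crossover
                w = rng.random()
                child = w * p1 + (1 - w) * p2
            if rng.random() < mutation_rate:             # Gaussian mutation
                child = child + rng.normal(0, 0.05, 3)
            new_pop.append(np.sort(np.clip(child, 1e-3, np.pi / 2 - 1e-3)))
        pop = np.array(new_pop)
    best = min(pop, key=lambda ind: cost(ind, m_target))
    return best, cost(best, m_target)

if __name__ == "__main__":
    angles, err = ga_she(m_target=0.8)
    print("switching angles (rad):", np.round(angles, 5), " residual cost:", err)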


IV. EXPERIMENTAL VERIFICATION

An experimental verification setup was constructed to validate the proposal. The purposes were to verify the results obtained, using the actual rectifier constructed; to drive the rectifier switches with timings derived from the genetic algorithm optimization scheme; and to check the harmonic content of the source current of the rectifier. The single-phase rectifier was constructed using four IGBT switches with an input AC voltage of 230 V at 50 Hz, and a resistive load is connected. MATLAB software is used to simulate the single-phase rectifier circuit. Here the case Vdcref = 8 V is simulated.

Figure.5 source voltage, switching pulses and output voltage waveforms.

Figure.4 Source voltage and source current waveforms.

Figure.6 Harmonic spectrum of source current waveform.

Table.1 List of switching angles

SI.No | Vdcref (V) | theta1 (GA) | theta2 (GA) | theta3 (GA) | I3 (MATLAB) | I5 (MATLAB) | Vdc (V) | Error (Vdcref - Vdc)
1  | 2  | 0.71999 | 0.81247 | 1.55165 | 1.6017e-005  | -1.0431e-005 | 2.0002  | -0.0002
2  | 4  | 0.65432 | 0.83972 | 0.86733 | -1.4106e-005 | 1.8408e-006  | 4.0001  | -0.0001
3  | 6  | 0.58815 | 0.92552 | 0.92665 | -3.1991e-006 | 4.5663e-007  | 5.9998  | 0.0002
4  | 8  | 0.51525 | 0.73194 | 0.50256 | -1.1514e-005 | -1.5205e-005 | 8.0000  | 0
5  | 10 | 0.45352 | 1.03659 | 0.52703 | -9.2666e-006 | -1.5777e-005 | 10.0003 | -0.0003
6  | 12 | 0.38112 | 0.53649 | 1.53248 | 6.9964e-006  | 3.7978e-006  | 12.0000 | 0
7  | 14 | 0.33427 | 1.5132  | 1.52919 | 2.5142e-005  | 0.0127       | 14.0000 | 0
8  | 16 | 0.24985 | 1.47593 | 1.2337  | -2.5163e-006 | -4.2469e-006 | 15.9944 | 0.0056
9  | 18 | 0.20077 | 0.93295 | 1.42221 | -6.8734e-006 | 0.0107       | 17.9999 | 0.0001
10 | 20 | 0.08078 | 0.8378  | 0.81395 | 5.5654e-006  | 4.2648e-005  | 20.0001 | -0.0001

V. CONCLUSIONS


In this project, GA-based Selective Harmonics Elimination (SHE) in a single-phase Pulse Width Modulation (PWM) rectifier circuit has been implemented. Since locating the switching instants in selective harmonics elimination involves the solution of a set of nonlinear transcendental equations, the problem has been redrafted as an optimization problem and a Genetic Algorithm (GA) is used to solve that set of equations. These nonlinear transcendental equations are formulated to eliminate the third and fifth order harmonics. The single-phase rectifier circuit has been simulated using MATLAB, and the harmonic spectrum analysis of the source current waveform confirms the elimination of the third and fifth order harmonics. This work can be extended to the elimination of a larger number of harmonics.


A Hybrid model using Adaboost and GMM-EM for solving Outliers


Raju R [1], Assistant Professor, B.Tech Information Technology, Sri Manakula Vinayagar Engineering College, Pondicherry, India
Kayalvizhi Devakumaran [2], Aishwarya Balasubramanian [3], Gayathri Balan [4], B.Tech Information Technology, Sri Manakula Vinayagar Engineering College

Abstract
The emerging concepts of Artificial Neural Networks are used to recognize patterns, manage data and learn. ANNs have their own significance in the field of medicine. This paper provides support in diagnosing and categorizing the different stages of renal disorder. We put forth the idea of defining a hybrid model which is a combination of the Gaussian Mixture Model with the Expectation-Maximization algorithm and the Adaboost technique. The GMM is used for clustering similar data and the EM algorithm is used for allocating the weights based on the input parameters. The boosting technique Adaboost is used to boost the classifier into a stronger one. The drawback of handling noisy data, i.e. outliers, is resolved by using this hybrid model, which is well suited to handling the problem of outliers. The system is trained with 75% of the renal samples and the remaining samples are used for testing. As a result, the system provides a probabilistic function which fits a sample into one of the classified GMMs.

Keywords: ANN - Artificial Neural Network, GMM - Gaussian Mixture Model, EM - Expectation Maximization, Adaboost, Outliers.
XII. INTRODUCTION

Artificial neural networks are inspired by the functioning of the brain. They can be used to recognize patterns, manage data and learn [1]. An ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. The greatest advantage of ANNs is their ability to learn from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential. It requires the proper choice of model, which depends upon the data being used and the target application, and a learning algorithm best suited to that application [1]. The robustness depends on both the appropriate selection of the model and the learning algorithm. In practice, ANNs have a wide diversity of applications such as gaming and decision making, pattern recognition, medical diagnosis and financial applications. In the medical field they find their significance in diagnostic systems, biomedical analysis, image analysis and drug development. In this paper we focus on diagnosis of renal disorders.

The primary role of the kidney is to remove metabolic waste and to balance the water and electrolyte levels in the blood [9]. The kidney also plays a major role in regulating levels of various minerals such as calcium, sodium and potassium in the blood [8]. Renal disorder can be classified into three stages. Stage I is the start of the renal disorder, i.e. slightly diminished kidney function with abnormalities in blood or urine. Stage II is chronic renal failure, where the function decreases gradually over time. Stage III is end-stage renal failure, where the patient must undergo dialysis or plan for a transplant to survive. The system involves classification based on these stages of renal failure, for which the Gaussian Mixture Model is utilized. This model is used to cluster similar data into a single entity. The three stages of the renal disorder [9] are represented as GMM 1, GMM 2 and GMM 3. Even though the system has been categorized into three GMMs, it still remains a weak classifier, because the system can also be exposed to inaccurate data.


This problem can be resolved by boosting the system in order to make it a strong classifier. Adaboost is one of the most prominent boosting techniques in ensemble learning for converting a weak classifier into a strong classifier [2]. The greatest advantage of the Adaboost technique is that it can be combined with any classifier [6]. Adaboost is termed adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers. Adaboost overcomes the problem of over-fitting [3], but it still struggles with noisy data, i.e. the problem of outliers [4].
XIII. RELATED WORK

This section briefly describes the various concepts used in the training process: the Gaussian Mixture Model with the Expectation-Maximization algorithm, and the Adaboost technique.

A. GAUSSIAN MIXTURE MODEL (GMM)

The Gaussian Mixture Model is one of the clustering methods that helps in building soft clustering boundaries [7]. It can also be viewed as a probability density function that takes the component weights into account. A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities, and its parameters are estimated by training on the samples using the iterative Expectation-Maximization algorithm. GMMs are commonly used in biometric systems.

B. GMM-EM algorithm

The Gaussian Mixture Model with the Expectation-Maximization algorithm (GMM-EM) is an efficient model for solving parameter estimation problems [7]. The EM algorithm alternates between finding a greatest lower bound to the likelihood function (the "E step") and then maximizing this bound (the "M step"). It is an iterative method for finding the maximum likelihood and involves two steps, the Expectation step and the Maximization step. The Expectation (E) step computes the current weight estimate for the parameters [5]; the Maximization (M) step computes the parameters that maximize the weights estimated in the E step. The EM algorithm steps are discussed in detail below, and a minimal code sketch follows the list.

(i) Initialization: the distribution parameters such as mean and variance are evaluated at each of the GMM levels.

(ii) Expectation: the weight factors of each sample are evaluated based on the current parameter values.

(iii) Maximization: the parameters are re-estimated using the weights calculated in the previous step.

(iv) Iterate: the likelihood is re-evaluated and checked for accuracy; if the change is less than the chosen threshold the parameters are accepted, otherwise the process iterates.

The proposed system helps the physicians in diagnosing the current stage of renal disorder based on the patients vital records which serves as input to the system [8]. The sample data required for training the system is obtained from Apollo Specialty Hospital, Chennai, India. The input parameters considered for the system are as follows


1) Urea
2) Creatinine
3) Potassium
4) Sodium
5) Uric acid
6) Phosphorus
7) Protein
8) Total Albumin

The architectural framework of the proposed system is shown in Fig 1.

Fig 1: Architectural Design

The functioning of the above system is described briefly in two phases, namely the Training phase and the Testing phase.

A. Training Phase

75% of the sample data are subjected to training, which undergoes the following process. Firstly, data extraction from the dataset takes place [9]. Feature identification [5] is done from the extracted data. The identified features are given to the GMM generator for the purpose of classification [7]. The mean (M), variance (V) and weight (W) for the model are computed with the k-means clustering technique. The initial log likelihood is calculated to enhance the computational speed.

The formula for the Log Likelihood (L) is given in (1):

    L = L + W(i)*( -0.5*n*log(det(2*pi*V(:,:,i))) - 0.5*(n-1)*( trace(iV*S) + (U-M(:,i))'*iV*(U-M(:,i)) ) )    (1)

where L is the log likelihood, X the dataset, W the weight, M the mean, V the variance, U = mean(X) and S the covariance of X.

The formula for the Mean (M(:,i)) is given in (2):

    M(:,i) = E(j,i)*X(j,:)    (2)

where X is the dataset and E the expectation.

The formula for the Expectation (E) is given in (3):

    E = ( exp(-0.5*dXM'*iV(:,:,j)*dXM) / (a*S(j)) ) * W,  with S(j) = sqrt(det(V(:,:,j)))    (3)

where W is the weight and dXM the difference between the mean (initial) and the weight.

The formula for the Variance (V(:,:,i)) is given in (4):

    V(:,:,i) = E(j,i)*dXM*dXM' / W    (4)

where W is the weight, dXM the difference between the mean (initial) and the weight, and E the expectation.
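The MATLAB-style expressions above describe one pass of the EM procedure. For illustration only (a sketch under assumptions, not the authors' implementation), the following Python fragment performs one EM iteration for a full-covariance Gaussian mixture; the array names X, W, M and V mirror the symbols of equations (1)-(4), the function name em_iteration is hypothetical, and the responsibilities are normalised per sample, which equation (3) leaves implicit.

    import numpy as np

    def em_iteration(X, W, M, V):
        """One EM pass for a Gaussian mixture.
        X: (n, d) data matrix, W: (k,) component weights,
        M: (d, k) component means, V: (d, d, k) component covariances."""
        n, d = X.shape
        k = W.shape[0]
        a = (2 * np.pi) ** (d / 2)
        # E step: weight of component j for each sample, cf. eq. (3)
        E = np.zeros((n, k))
        for j in range(k):
            dXM = X - M[:, j]                          # sample minus component mean
            iV = np.linalg.inv(V[:, :, j])
            Sj = np.sqrt(np.linalg.det(V[:, :, j]))
            E[:, j] = W[j] * np.exp(-0.5 * np.sum(dXM @ iV * dXM, axis=1)) / (a * Sj)
        E /= E.sum(axis=1, keepdims=True) + 1e-300     # responsibilities, normalised per sample
        # M step: re-estimate weights, means (eq. (2)) and covariances (eq. (4))
        W_new = E.mean(axis=0)
        M_new = (X.T @ E) / (E.sum(axis=0) + 1e-300)
        V_new = np.zeros_like(V)
        for j in range(k):
            dXM = X - M_new[:, j]
            V_new[:, :, j] = (E[:, j][:, None] * dXM).T @ dXM / (E[:, j].sum() + 1e-300)
        # total log likelihood of the data under the updated model, cf. eq. (1)
        dens = np.zeros((n, k))
        for j in range(k):
            dXM = X - M_new[:, j]
            iV = np.linalg.inv(V_new[:, :, j])
            Sj = np.sqrt(np.linalg.det(V_new[:, :, j]))
            dens[:, j] = W_new[j] * np.exp(-0.5 * np.sum(dXM @ iV * dXM, axis=1)) / (a * Sj)
        L = float(np.sum(np.log(dens.sum(axis=1) + 1e-300)))
        return W_new, M_new, V_new, L

Iterating this function until the relative change in L falls below the chosen threshold reproduces the stopping rule described in the EM steps below.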


The Expectation-Maximization (EM) algorithm is a method that iteratively estimates the likelihood [10]. It is similar to the k-means clustering algorithm for Gaussian mixtures, since both look for the centers of clusters and refine them iteratively. For a given data set X, the best fitting model is the one that maximizes the likelihood. The Expectation and Maximization algorithm is as follows:
1. Start with an initial model with mean (M), variance (V), weight (W) and log likelihood (L).
2. Find the expectation E for the model with M, W and V.
3. Compute the new model using the expectation E.
4. Calculate the likelihood L' for the new model.
5. Repeat from Step 2 if abs(L - L')/L > 0.1 and the iteration count is below 1000.
6. The best fitting Gaussian Mixture Model for the data set is obtained.

Here, the samples are classified into three different models based on the functioning of the kidney.
GMM 1 - The initial stage is slightly diminished functioning of the kidney with abnormalities in blood or urine tests. This stage is modelled as GMM 1 of the probability density function.
GMM 2 - The next stage is chronic renal failure, where the functioning of the kidney decreases gradually over time, possibly over months or years. This stage is represented as GMM 2 of the probability density function.
GMM 3 - The final stage is end-stage renal failure, where the survival of the patient is possible only by treating them with dialysis or kidney transplantation. This stage is noted as GMM 3 of the probability density function.

The mean and variance of each dimension (i.e. in each stage) are calculated for each GMM [7]. Even though the samples have been classified, the system still remains a weak classifier because it can also be subjected to inaccurate data. In order to overcome this we need to boost the system to make it a strong classifier. For this purpose the Adaboost technique is used, which combines all the weak classifiers into a strong classifier [2]. The problem of outliers cannot be handled by the Adaboost technique alone, so to overcome this drawback we propose a hybrid model, a combination of the Gaussian model with Adaboost, which can resolve the outliers. This training method makes the system learn rather than memorize [3]. If the sample data contain any out-of-bound or noisy data [4], they are resolved by this system. As a result, the system is trained to handle renal samples.

B. Testing Phase

The remaining sample data are used for the testing process. The test data are fed to the predictor along with the three GMMs. The predictor computes the best probabilistic fit among the three GMMs. Thus a system is produced which is less prone to noise, thereby helping to predict the nature of the disease accurately.
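As a sketch of the predictor described above (not the authors' code), the stage decision can be written as a log-likelihood comparison of a test record against the three trained GMMs; the function names gmm_loglik and predict_stage and the trained parameter triples (W, M, V) are assumed to come from the training phase.

    import numpy as np

    def gmm_loglik(x, W, M, V):
        """Log-likelihood of a single sample x (d,) under a GMM with weights W (k,),
        means M (d, k) and covariances V (d, d, k)."""
        d = x.shape[0]
        dens = 0.0
        for j in range(W.shape[0]):
            dXM = x - M[:, j]
            iV = np.linalg.inv(V[:, :, j])
            Sj = np.sqrt(np.linalg.det(V[:, :, j]))
            dens += W[j] * np.exp(-0.5 * dXM @ iV @ dXM) / (((2 * np.pi) ** (d / 2)) * Sj)
        return np.log(dens + 1e-300)

    def predict_stage(x, models):
        """models: {'GMM 1': (W, M, V), 'GMM 2': (...), 'GMM 3': (...)} from training.
        Returns the stage whose mixture gives the best probabilistic fit."""
        scores = {stage: gmm_loglik(x, *params) for stage, params in models.items()}
        return max(scores, key=scores.get)

    # Hypothetical usage with one patient record
    # (urea, creatinine, potassium, sodium, uric acid, phosphorus, protein, total albumin):
    # stage = predict_stage(np.array([...]), trained_models)

In the hybrid model described above, the AdaBoost step would additionally reweight training records that this predictor misclassifies, so that outlying samples do not dominate the fitted mixtures.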
XV. CONCLUSION

In this paper, we have employed a hybrid model of the Adaboost technique and the Gaussian Mixture Model with the Expectation-Maximization algorithm to overcome the problem of noisy data sets, i.e. to resolve the problem of outliers, thereby lending a hand in predicting the stages of the renal disorder and making a best fit for the data. This supports the physicians in diagnosing and making a sound decision.

References
[1] Amandeep Kaur, J.K. Sharma and Sunil Agrawal, "Optimization of Artificial Neural Networks for Cancer Detection," IJCSNS International Journal of Computer Science and Network Security, Vol. 11, No. 5, May 2011.

[2] Shigang Chen, Xiaohu Ma, Shukui Zhang, AdaBoost Face Detection Based on Haar-like Intensity Features and Multithreshold Features, 2011 International Conference on Multimedia and Signal Processing. [3] Yufeng Li, Haijuan Zhang, Yanan Zhang, Research on Face Detection Algorithm in Instant Message Robot, 2011 International Conference on Uncertainty Reasoning and Knowledge Engineering. [4] Amit P Ganatra, Yogesh P Kosta, Comprehensive Evolution and Evaluation of Boosting, International Journal of Computer Theory and Engineering, Vol.2, No.6, December, 2010.


[5] Chensheng Sun, Jiwei Hu, Kin-Man Lam, "Feature Subset Selection for Efficient AdaBoost Training," IEEE, 2011.
[6] Jaree Thongkam, Guandong Xu and Yanchun Zhang, "AdaBoost Algorithm with Random Forests for Predicting Breast Cancer Survivability," 2008 International Joint Conference on Neural Networks (IJCNN 2008).
[7] Kaushik Nagarajan, Kulandai Anand Batlagundu Rajagopalan, "Emotion Recognition using Glottal Waveform."
[8] Adenike O. Osofisan, Omowumi O. Adeyemo, Babatunde A. Sawyerr, Oluwafemi Eweje, "Prediction of Kidney Failure Using Artificial Neural Networks," European Journal of Scientific Research, ISSN 1450-216X, Vol. 61, No. 4 (2011).
[9] M.S.S. Sai, P. Thrimurthy, S. Purushothaman, "Implementation of Back-Propagation Algorithm for Renal Datamining," International Journal of Computer Science and Security, Volume 2, Issue 2.
[10] Wikipedia (2011, June), "Expectation-maximization algorithm." [Online]. Available: http://en.wikipedia.org/wiki/Expectation_maximization_algorithm
[11] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, 39(1):1-38, 1977.


Glucose Level Detection in Exhaled Breath Condensate Using MEMS Sensor


Jasmine J. Nainita 1, S. Praveen Kumar 2
1 PG Student, Saveetha Engineering College
2 Senior Assistant Professor, Department of Electronics & Communication, Saveetha Engineering College

Abstract

Glucose level detection in exhaled breath: breath is a unique bodily fluid that can be utilized for diagnostics. While most applications detect substances or diseases in the breath as a gas or aerosol, breath can also be analyzed in the liquid phase as exhaled breath condensate (EBC). As a diagnostic, exhaled breath offers advantages since samples can be collected and tested with results delivered in real time at the point of testing. Exhaled breath can be used to detect various drugs, medications, their metabolites and markers, and this can be valuable in measuring both medication adherence and in determining the blood levels of these drugs and medications. MEMS sensors are used in order to detect the levels of glucose and pH in the exhaled breath condensate.

Keywords: Glucose, exhaled breath, MEMS, aerosol, markers.

INTRODUCTION:

There is significant interest in developing rapid diagnostic approaches and improved sensors for determining early signs of medical problems in humans. Exhaled breath is a unique bodily fluid that can be utilized in this regard [1]-[12]. While most applications will detect substances or diseases in the breath as a gas or aerosol, breath can also be analyzed in the liquid phase as exhaled breath condensate (EBC). Analytes contained in the breath originating from deep within the lungs (alveolar gas) equilibrate with the blood, and therefore the concentration of molecules present in the breath is closely correlated with those found in the blood at any given time [1]-[6]. EBC contains dozens of different biomarkers, such as adenosine, ammonia, hydrogen peroxide, isoprostanes, leukotrienes, peptides, cytokines and nitrogen oxide [8], [9], [12]. Analysis of molecules in EBC is non-invasive and can provide a window on the metabolic state of the human body, including certain signs of cancer, respiratory diseases, and liver and kidney function. As a diagnostic, exhaled breath offers advantages since samples can be collected and tested with results delivered in real time at the point of testing. Another advantage is that the sample can be collected noninvasively by asking a patient to blow into the disposable portion of a handheld testing device. Therefore, the sample collection method is hygienic for both the patient and the laboratory personnel. Exhaled breath can be used to detect various drugs, medications, their metabolites and markers, and this can be valuable in measuring both medication adherence and in determining the blood levels of these drugs and medications. Some of today's blood- and urine-based tests might be replaced with simple breath-based testing. In consumer healthcare, diabetics would be able to test their glucose level, replacing painful and inconvenient finger-prick devices. For roadside screening of driving impairment, a point-of-care (POC) device similar in function to a handheld breath alcohol analyzer will detect drugs of abuse such as marijuana and cocaine. In workplace drug testing, a similar desktop device might eliminate the cost, embarrassment and inconvenience of workplace urine screening.


In the setting of chronic oral drug therapy (e.g., treatment of schizophrenia with atypical antipsychotic medications), mortality/morbidity and the cost of health care will be reduced markedly with the experimental setup.

PRINCIPLE AND DESIGN:

Fig. 1. Schematic of the MEMS glucose sensor

As shown in Fig. 1, a semi-permeable membrane separates a vibrational micro cantilever from the glucose solution. The membrane forms the ceiling of a micro chamber, which encloses the cantilever. The chamber is filled with a solution consisting of Dextran and Con-A. Glucose penetrates through the membrane, while neither Con-A nor Dextran can escape through it. The Dextran is cross-linked by Con-A and forms a gel-like structure, resulting in a highly viscous solution. As free glucose molecules enter the solution, they competitively bind with Con-A and weaken the Dextran cross-linking, reducing the solution's viscosity and hence the damping on the vibrating cantilever. Thus, glucose concentrations can be determined by vibration measurements. Such measurements are accomplished with our MEMS sensor, which is based on a vibrational cantilever. The cantilever is formed using SU-8, which is coated with a thin layer of magnetic film so that it can be excited by a magnetic field. Magnetic field excitation has the advantage of remote actuation for an implanted sensor and generally generates a larger driving force. As the cantilever vibrates under electromagnetic actuation in the solution, the viscosity change induced by the competitive affinity binding significantly changes the viscous damping on the vibrating cantilever. The vibration characteristics, such as the maximum vibration amplitude, resonant frequency, phase shift, damping ratio or quality factor (Q), directly depend on damping. Measurement of these characteristics allows the determination of the glucose concentration. An important feature of our design is that the cantilever operates at relatively low frequencies (lower than 2 kHz in the solution), as appropriate for the polymer solution's relaxation dynamics.

FABRICATION AND TESTING:

Fig. 2. Fabrication process of the prototype device

The fabrication process (Fig. 2) started with deposition and patterning of the first layer of SU-8 (thickness: 7.5 µm), which formed the cantilever (600 µm long and 500 µm wide). Then nickel film stripes were deposited on the free end of the cantilever as the magnetic actuation material, with the dimensions of each stripe 500 x 50 x 0.75 µm. A second SU-8 layer (50 µm thick) was formed to serve as a reinforcement layer on the nickel layer to prevent the beam from bending due to the stress mismatch. After a third layer of SU-8 (height 100 µm) was patterned to form the chamber sidewalls, the cantilever was released by XeF2 etching. The micro chamber was formed by bonding a regenerated cellulose semi-permeable membrane with a molecular weight cut-off of 10 kD. The chamber volume is around 1 x 1 x 0.4 mm = 0.4 µl, taking the undercut in the XeF2 etching into account. The final step was to form the top sample cell by gluing a plastic disk and a glass cover using epoxy.


The volume of the top sample cell was 50 times larger than that of the device chamber, so that the equilibrium glucose concentration in the chamber approximately equalled that in the sample cell. To measure the cantilever vibrations, an optical lever and a lock-in amplifier were used (Fig. 3). A laser beam was shone on the cantilever beam and reflected onto a photodetector. When the cantilever vibrated up and down, the reflection angle of the laser beam changed accordingly, resulting in changes in the spot position on the photodetector. This signal was transformed into an electrical signal and fed into a lock-in amplifier and an oscilloscope. The lock-in amplifier effectively reduced the noise by narrowing down the frequency bandwidth. The oscilloscope was used to measure the transient step response.
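The damping ratio reported in the next section can, for example, be estimated from such a transient record by the logarithmic-decrement method. The sketch below is generic post-processing, not taken from the paper; the sampled ring-down signal, the sampling rate and the function name damping_ratio_from_ringdown are assumptions.

    import numpy as np

    def damping_ratio_from_ringdown(y, fs):
        """Estimate the damping ratio of a lightly damped cantilever from a
        ring-down (step-response) record y sampled at fs Hz, using the
        logarithmic decrement between successive positive peaks."""
        # locate local maxima of the oscillatory decay
        peaks = [i for i in range(1, len(y) - 1)
                 if y[i] > y[i - 1] and y[i] > y[i + 1] and y[i] > 0]
        if len(peaks) < 2:
            raise ValueError("need at least two peaks to estimate the decrement")
        # average logarithmic decrement over consecutive peak pairs
        deltas = [np.log(y[peaks[i]] / y[peaks[i + 1]]) for i in range(len(peaks) - 1)]
        delta = float(np.mean(deltas))
        zeta = delta / np.sqrt(4 * np.pi ** 2 + delta ** 2)   # damping ratio
        f_d = fs / np.mean(np.diff(peaks))                    # damped oscillation frequency (Hz)
        return zeta, f_d

    # Example with a synthetic 4.19 kHz decay, zeta = 0.01, sampled at 100 kHz
    t = np.arange(0, 0.01, 1 / 100_000)
    y = np.exp(-2 * np.pi * 4190 * 0.01 * t) * np.cos(2 * np.pi * 4190 * t)
    print(damping_ratio_from_ringdown(y, 100_000))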

Fig. 3. Measurement setup

RESULTS AND DISCUSSION:

We first measured the vibration response of the sensor in air, with resonance occurring around 4.19 kHz. The calculated damping ratio in air from the transient response was 0.01.

Fig. 4. Frequency response of the cantilever in air

Fig. 5. Vibration amplitude response at 750 Hz and damping ratio as a function of glucose concentration

The measurement results for the steady-state frequency response and the step transient response for different glucose concentrations at equilibrium are shown in the figures. It can be seen that as the glucose concentration increased from 0 to 25 mM, the vibration amplitude also increased, with the device changing from an over-damped to an under-damped system. The frequency dependence of the response is shown in Fig. 6.

Fig. 6. Frequency dependence of the vibration amplitude for different glucose concentrations

In order to evaluate absolute viscosity changes for different glucose concentrations, premixed samples were measured using a commercial capillary viscometer.

REFERENCES:

[1] T. Kullmann, I. Barta, B. Antus, M. Valyon, and I. Horvath, "Environmental temperature and relative humidity influence exhaled breath condensate pH," Eur. Respir. J., vol. 31, no. 2, pp. 474-475, Feb. 2008.


[2] I. Horvath, J. Hunt, and P. J. Barnes, Exhaled breath condensate: Methodological recommendations and unresolved questions, Eur.Respir. J., vol. 26, no. 9, pp. 523548, Sept. 2005. [3] K. Namjou, C. B. Roller, and P. J. McCann, The breathmeterA new laser device to analyze your health, IEEE Circuits Dev. Mag., vol. 22, no. 5, pp. 2228, Sep./Oct. 2006. [4] R. F. Machado, D. Laskowski, O. Deffenderfer, T. Burch, S. Zheng, P.J. Mazzone, T. Mekhail, C. Jennings, J. K. Stoller, J. Pyle, J. Duncan,R. A. Dweik, and S. C. Erzurum, Detection of lung cancer by sensor array analyses of exhaled breath, Amer. J. Respir. Crit. Care. Med., vol. 171, no. 11, pp. 12861291, June 2005. [5] T.Kullmann, I. Barta, Z. Lazar, B. Szili, E. Barat, M.Valyon, M.Kollai, and I. Horvath, Exhaled breath condensate pH standardised for CO partial pressure, Eur. Respir. J., vol. 29, no. 3, pp. 496501, Mar. 2007. [6] J. Vaughan, L. Ngamtrakulparit, T. N. Pajewski, R. Turner, T. A. Nguyen, A. Smith, P. Urban, S. Hom, B. Gaston, and J. Hunt, Exhaled breath condensate pH is a robust and reproducible assay of airway acidity, Eur. Respir. J., vol. 22, no. 12, pp. 889894, Dec. 2003. [7] K. Kostikas, G. Papatheodorou, K. Ganas, K. Psathakis, P. Panagou, and S. Loukides, pH in expired breath condensate of patients with inflammatory airway diseases, Amer. J. Respir. Crit. Care. Med., vol. 165, no. 10, pp. 13641370, May 2002. [8] G. E. Carpagnano, M. P. Foschino Barbaro, O. Resta, E. Gramiccioni, N. V. Valerio, P. Bracciale, and G. Valerio, Exhaled markers in the monitoring of airways inflammation and its response to steroids treatment in mild persistent asthma, Eur. J. Pharmacol., vol. 519, no. 1/2, pp. 175181, Sep. 2005. [9] C. Gessner, S. Hammerschmidt, H. Kuhn, H.-J. Seyfarth, U. Sack, L. Engelmann, J. Schauer, and H. Wirtz, Exhaled breath condensate acidification in acute lung injury, Respir. Med., vol. 97, no. 11, pp. 11881194, Nov. 2003.

[10] R. Accordino, A. Visentin, A. Bordin, S. Ferrazzoni, E. Marian, F. Rizzato, C. Canova, R. Venturini, and P. Maestrelli, Long-term repeatability of exhaled breath condensate pH in asthma, Resp. Med., vol.102, no. 3, pp. 377381, Mar. 2008. [11] K. Czebe, I. Barta, B. Antus, M. Valyon, I. Horvth, and T. Kullmann, Influence of condensing equipment and temperature on exhaled breath condensate pH, total protein and leukotriene concentrations, Resp. Med., vol. 102, no. 5, pp. 720725, May 2008. [12] K. Bloemen, G. Lissens, K. Desager, and G. Schoeters, Determinants of variability of protein content, volume and pH of exhaled breath condensate, Resp. Med., vol. 101, no. 6, pp. 13311337, Jun. 2007.


Tracing A Mobile Object In Wireless Networks (MANET)


Partheeban G., Vignesh M., Balakannan S.P.
Final B.Tech, Department of IT, Anand Institute of Higher Technology, Chennai, India
Assistant Professor, Department of IT, Anand Institute of Higher Technology, Chennai, India

Abstract

In order to maintain the resource consumption in a MANET there are many obstacles. To balance the resource consumption among all nodes and prolong the lifetime of a MANET, nodes with the most remaining resources should be elected as the leaders. However, there are two main obstacles in achieving this goal. First, without incentives for serving others, a node might behave selfishly by lying about its remaining resources and avoiding being elected. Second, electing an optimal collection of leaders to minimize the overall resource consumption may incur a prohibitive performance overhead, if such an election requires flooding the network. To address the issue of selfish nodes, we present a solution based on mechanism design theory. More specifically, the solution provides nodes with incentives in the form of reputations to encourage nodes to participate honestly in the election process. The amount of incentives is based on the Vickrey, Clarke, and Groves (VCG) model to ensure that truth-telling is the dominant strategy for any node. To address the optimal election issue, we propose a series of local election algorithms that can lead to globally optimal election results with a low cost.

I. INTRODUCTION

AIM:
The main aim of this project is to design a mechanism to elect a leader in the presence of selfish nodes for intrusion detection in MANETs (Mobile Ad hoc Networks) using Cluster-Dependent Leader Election (CDLE).

SCOPE OF THE PROJECT:
In order to maintain the resource consumption in a MANET there are many obstacles. First, without incentives for serving others, a node might behave selfishly by lying about its remaining resources and avoiding being elected. Second, electing an optimal collection of leaders to minimize the overall resource consumption may incur a prohibitive performance overhead, if such an election requires flooding the network. So we need to present a mechanism design theory for the leader election.

[1] In this paper, we study leader election in the presence of selfish nodes for intrusion detection in mobile ad hoc networks (MANETs). To balance the resource consumption among all nodes and prolong the lifetime of a MANET, nodes with the most remaining resources should be elected as the leaders. However, there are two main obstacles in achieving this goal. First, without incentives for serving others, a node might behave selfishly by lying about its remaining resources and avoiding being elected. Second, electing an optimal collection of leaders to minimize the overall resource consumption may incur a prohibitive performance overhead, if such an election requires flooding the network. To address the issue of selfish nodes, we present a solution based on mechanism design theory. More specifically, the solution provides nodes with incentives in the form of reputations to encourage nodes to participate honestly in the election process. The amount of incentives is based on the Vickrey, Clarke, and Groves (VCG) model to ensure that truth-telling is the dominant strategy for any node. To address the optimal election issue, we propose a series of local election algorithms that can lead to globally optimal election results with a low cost. We address these issues in two possible application settings, namely, Cluster-Dependent Leader Election (CDLE) and Cluster-Independent Leader Election (CILE). The former assumes given clusters of nodes, whereas the latter does not require any preclustering. Finally, we justify the effectiveness of the proposed schemes through extensive experiments.

II. SYSTEM ANALYSIS

EXISTING SYSTEM
Mobile Ad hoc Networks (MANETs) have no fixed chokepoints/bottlenecks where Intrusion Detection Systems (IDSs) can be deployed. In the case of other networks there are bottlenecks available; for example, in sensor networks there are sinks available where the intrusion detection system can be introduced. Mobile nodes are inefficient with respect to energy resource consumption because they are energy limited.


That is, in the case of mobile networks or wireless networks there are many nodes involved which run on battery power and need energy to operate. Thus there is a resource constraint in these networks which has to be managed properly. Balancing the resource consumption of IDSs among nodes is difficult to achieve, since the resource level is the private information of a node. This is also the main disadvantage of the system. Moreover, the nodes do not have to share this information, since doing so is not compulsory.


LIMITATIONS OF EXISTING SYSTEM

There is no balanced resource consumption among the nodes, because the data may travel through any node any number of times, draining the resources of one particular node and sometimes making that node unusable due to the power drain. Nodes also behave in a selfish manner by revealing a false resource level to others in order to avoid being elected as leader, because when a node becomes a leader it has to take care of additional work.

PROPOSED SYSTEM

In our proposed system the IDS resource consumption is balanced among all nodes, while nodes are prevented from behaving selfishly by the use of incentives. A leader election algorithm is devised to handle the election process and to balance the IDS resource consumption among all nodes by electing the most cost-efficient leaders. Selfish nodes are motivated to reveal their truthful resource levels by giving them incentives for revealing this information; thus the nodes provide their truthful information in exchange for the incentives that they receive.
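To make the incentive idea concrete, the following sketch (an illustration under simplifying assumptions, not the exact mechanism of the paper) elects the node with the lowest claimed analysis cost in a cluster and credits it a VCG/second-price style reputation payment equal to the second-lowest claimed cost, so that misreporting its true cost cannot improve a node's utility; the function name and the example costs are hypothetical.

    def elect_leader_with_vcg(claimed_costs):
        """claimed_costs: dict node_id -> analysis cost claimed by the node.
        Returns (leader, reputation_payment). The payment follows the second-price rule,
        which makes truthful cost reporting a dominant strategy in this simplified setting."""
        if not claimed_costs:
            raise ValueError("no candidate nodes")
        ordered = sorted(claimed_costs.items(), key=lambda kv: kv[1])
        leader = ordered[0][0]                                  # cheapest node is elected leader
        payment = ordered[1][1] if len(ordered) > 1 else 0.0    # credited the second-lowest claimed cost
        return leader, payment

    # Hypothetical cluster: node C claims the lowest IDS analysis cost
    cluster = {"A": 0.8, "B": 0.5, "C": 0.3}
    print(elect_leader_with_vcg(cluster))                       # ('C', 0.5)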

ADVANTAGES OF PROPOSED SYSTEM

The resource consumption is balanced among all nodes: since a leader is used, the resources can be balanced among all the nodes throughout the network. A fair leader election process is carried out, that is, the leader is elected fairly based on information such as the battery power, memory and mobility of the nodes. Packets are easily transmitted to the destination without any intrusions with the help of the leader. When an intrusion takes place, the intruding node is removed from the network after running a global detection process. If there are no nodes available to transmit the message to the destination node, the leader takes care of it by sending the message to the destination directly.

III. SYSTEM IMPLEMENTATION

System implementation is the process of converting a new or revised system design into an operational one. It can thus be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective. The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, designing of methods to achieve the changeover, and evaluation of the changeover methods. Implementation is the process of converting a new system design into operation. It is the phase that focuses on user training, site preparation and file conversion for installing a candidate system. The important factor to be considered here is that the conversion should not disrupt the functioning of the organization. The leader election mechanism is a feature of the implementation of this project: a leader node is elected in the presence of the selfish nodes by revealing its cost and information. A hierarchical structuring of relations may result in more classes and a more complicated structure to implement. Therefore we transform the hierarchical relation structure into a simpler structure such as a classical flat one. It is rather straightforward to transform the developed hierarchical model into a bipartite, flat model, consisting of classes on the one hand and flat relations on the other. Flat relations are preferred at the design level for reasons of simplicity and ease of implementation. There is no identity or functionality associated with a flat relation. A flat relation corresponds to the relation concept of entity-relationship modeling and many object-oriented methods.

MODULES

a) Leader Election Mechanism
b) Cluster-Dependent Leader Election (CDLE)
c) Intrusion Detection Systems

The Leader Election Mechanism truthfully elects the leader by revealing the cost of analysis. In Cluster-Dependent Leader Election the whole network is divided into a set of clusters, where a set of 1-hop neighbour nodes forms a cluster. In the proposed system every node participates in running its IDS in order to collect and identify possible intrusions.
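A rough sketch of how the CDLE module could be prototyped (assumed data structures and function names, not the project's code): nodes are first grouped into clusters of 1-hop neighbours, and each cluster then elects the node with the lowest claimed analysis cost as its monitoring leader.

    def one_hop_clusters(adjacency):
        """Greedy 1-hop clustering: each unassigned node forms a cluster with its
        still-unassigned direct neighbours. adjacency: dict node -> set of neighbours."""
        unassigned = set(adjacency)
        clusters = []
        for node in sorted(adjacency):              # deterministic order for the sketch
            if node not in unassigned:
                continue
            members = {node} | (adjacency[node] & unassigned)
            unassigned -= members
            clusters.append(members)
        return clusters

    def elect_cluster_leaders(clusters, claimed_costs):
        """Pick, in every cluster, the node with the smallest claimed IDS analysis cost."""
        return {frozenset(c): min(c, key=lambda n: claimed_costs[n]) for c in clusters}

    # Hypothetical 5-node topology and per-node analysis costs
    topology = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
    costs = {1: 0.9, 2: 0.4, 3: 0.7, 4: 0.2, 5: 0.6}
    clusters = one_hop_clusters(topology)
    print(clusters)                                  # e.g. [{1, 2, 3}, {4, 5}]
    print(elect_cluster_leaders(clusters, costs))    # leaders 2 and 4, the cheapest in each cluster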


IV. ARCHITECTURE

Fig: Architecture diagram

MODULE DESCRIPTION

a) Leader Election Mechanism:
The Leader Election Mechanism truthfully elects the leader by revealing the cost of analysis. The main goal of using mechanism design is to address the problem by 1) designing incentives for nodes to provide truthful information about their preferences over different outcomes, and 2) computing the optimal system-wide solution, which is defined according to (1). A leader is elected in each cluster using the information shared by the nodes. Each leader handles the monitoring process based on the nodes' analysis cost. The elected leader is responsible for detection in its cluster group, and the misbehaving nodes are punished by gradually excluding them from communication services.

b) Cluster-Dependent Leader Election (CDLE):
In Cluster-Dependent Leader Election the whole network is divided into a set of clusters, where a set of 1-hop neighbour nodes forms a cluster. Each cluster then independently elects a leader among all the nodes to handle the monitoring process based on the nodes' analysis cost. The elected leader in each cluster is responsible for detection in its cluster group, and the misbehaving nodes are punished by gradually excluding them from communication services.

c) Intrusion Detection Systems:
In the proposed system every node participates in running its IDS in order to collect and identify possible intrusions. If an anomaly is detected with weak evidence, then a global detection process is initiated for further investigation of the intrusion through a secure channel. The elected leader is responsible for detection in its cluster group, and the misbehaving nodes are punished by gradually excluding them from communication services. The reputation is calculated based on data monitored by local nodes and on information provided by other nodes involved in each operation.

V. CONCLUSION

The unbalanced resource consumption of IDSs in a MANET and the presence of selfish nodes have motivated us to propose an integrated solution for prolonging the lifetime of mobile nodes and for preventing the emergence of selfish nodes. The solution motivates nodes to truthfully elect the most cost-efficient nodes to handle the detection duty on behalf of others. Moreover, the sum of the elected leaders is globally optimal. To achieve this goal, incentives are given in the form of reputations to motivate nodes to reveal their costs of analysis truthfully. Reputations are computed using the well-known VCG mechanism, by which truth-telling is the dominant strategy. We also analyzed the performance of the mechanisms in the presence of selfish and malicious nodes. To implement our mechanism, we devised an election algorithm with reasonable performance overheads, and we provided the algorithmic correctness and security properties of our algorithm. We addressed these issues in two applications: CILE and CDLE.
The former does not require any pre-clustering, whereas CDLE requires nodes to be clustered before running the election mechanism. The results showed that our model is able to prolong the lifetime and balance the overall resource consumption among all the nodes in the network. Moreover, we are able to decrease the percentage of leaders, the number of single-node clusters and the maximum cluster size, and to increase the average cluster size. These properties allow us to improve the detection service by distributing the sampling budget over a smaller number of nodes and by reducing the number of single nodes that have to launch their own IDS.

FUTURE ENHANCEMENT

Techniques can be enhanced to improve QoS, mitigate interference, reduce hotspot effects, and design next-generation monitoring and intrusion detection systems. There are also several other possible directions for future work. For instance, from the leaders selected in each cluster, one leader could take responsibility for the whole network, reducing the resource consumption a little more and also helping to detect intrusions more quickly and efficiently.

VI. REFERENCES

[1] Noman Mohammed, Hadi Otrok, Lingyu Wang, Mourad Debbabi, and Prabir Bhattacharya, "Mechanism Design-Based Secure Leader Election Model for Intrusion Detection in MANET," January-February 2011.
[2] T. Anantvalee and J. Wu, "A Survey on Intrusion Detection in Mobile Ad Hoc Networks," Wireless/Mobile Network Security, Springer, 2006.
[3] L. Anderegg and S. Eidenbenz, "Ad Hoc-VCG: A Truthful and Cost-Efficient Routing Protocol for Mobile Ad Hoc Networks with Selfish Agents," Proc. ACM MobiCom, 2003.
[4] F. Anjum and P. Mouchtaris, Security for Wireless Ad Hoc Networks, John Wiley and Sons, Inc., 2007.
[5] S. Basagni, "Distributed and Mobility-Adaptive Clustering for Multimedia Support in Multi-Hop Wireless Networks."
[6] S. Basagni, "Distributed Clustering for Ad Hoc Networks," Proc. IEEE Int'l Symp. Parallel Architectures, Algorithms, and Networks (ISPAN).
[7] P. Brutch and C. Ko, "Challenges in Intrusion Detection for Wireless Ad-Hoc Networks," Proc. IEEE Symp. Applications and the Internet.
[8] S. Buchegger and J.L. Boudec, "Performance Analysis of the CONFIDANT Protocol (Cooperation of Nodes - Fairness in Dynamic Ad-Hoc Networks)," Proc. ACM MOBIHOC, 2002.
[9] K. Chen and K. Nahrstedt, "iPass: An Incentive-Compatible Auction Scheme to Enable Packet Forwarding Service in MANET," Proc. Int'l Conf. Distributed Computing Systems, 2004.
[10] B. DeCleene, L. Dondeti, S. Griffin, T. Hardjono, D. Kiwior, J. Kurose, D. Towsley, S. Vasudevan, and C. Zhang, "Secure Group Communications for Wireless Networks."
[11] J. Feigenbaum, C. Papadimitriou, R. Sami, and S. Shenker, "A BGP-Based Mechanism for Lowest-Cost Routing," Proc. ACM Symp. Principles of Distributed Computing (PODC), 2002.
[12] J. Feigenbaum and S. Shenker, "Distributed Algorithmic Mechanism Design: Recent Results and Future Directions," Proc. ACM Int'l Workshop Discrete Algorithms and Methods for Mobile Computing and Comm. (DIALM), 2002.
[13] N. Gura, A. Patel, A. Wander, H. Eberle, and S.C. Shantz, "Comparing Elliptic Curve Cryptography and RSA on 8-Bit CPUs," Proc. Workshop Cryptographic Hardware and Embedded Systems (CHES), 2004.



[14] S. Gwalani, K. Srinivasan, G. Vigna, E.M. Belding-Royer, and R. Kemmerer, "An Intrusion Detection Tool for AODV-Based Ad Hoc Wireless Networks," Proc. IEEE Computer Security Applications Conf. (CSAC).
[15] Y. Hu, A. Perrig, and D.B. Johnson, "Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks," Proc. ACM MOBICOM, 2002.


Leaf cutter ant model for web service selection and execution
K. Sendil Kumar, K. Sathishkumar, P. Ravisasthiri, R. Sridharan
Department of Information Technology, Sri Manakula Vinayagar Engineering College, Madagadipet, Pondicherry, India

Abstract

One of the important issues arising from web services is how to improve the performance and automation of service selection and execution for efficient retrieval of services. The main goal of our proposed system is to use keywords to retrieve a number of services and then store them in a database. From the database, the user can select a service, and our model then provides the input for the execution of the service. This process model is based on the behavior of the leaf cutter ant, which exhibits the behaviors of finding, selecting, executing and saving. We use these behaviors in web services to improve the automation and the response time on the user side.

Keywords---- Service discovery; Leaf cutter ant model; UDDI registry; data store; web services

1 INTRODUCTION

Web Services is a computing technique for systematically disseminating XML content, usually over a network. In its simplest form, one computer sends another computer a request for information in the shape of an HTTP request or an XML stream. The idea of Internet-accessible programmatic interfaces, services intended to be used by other software rather than as an end product, is not new. Web services are a development of this idea. The name refers to a set of standards and essential specifications that simplify the creation and use of such service interfaces, thus addressing interoperability issues and promoting ease of use. Well-specified services are simple to integrate into larger applications, and once published, can be used and reused very effectively and quickly in many different scenarios. They may even be aggregated, grouped together to produce sophisticated functionality. 'Web services' refers to a potentially huge collection of available standards, so only a brief overview is possible here. The exchange of XML data uses a protocol such as SOAP or XML-RPC. Once published, the functionality of the Web service may be documented using one of a number of emerging standards, such as WSDL, the Web Service Description Language. WSDL provides a format for description of a Web service interface, including parameters, data types and options, in sufficient detail for a programmer to write a client application for that service. That description may be added to a searchable registry of Web services. A proposed standard for this purpose is UDDI (Universal Description, Discovery and Integration), described as a large central registry for businesses and services. Web services are often seen as having the potential to 'flatten the playing field' and simplify business-to-business operations between geographically diverse entities [1]. Web Services is a relatively young set of interconnected technologies which are meant to enhance developers' abilities to program applications for the Internet. They provide a framework for communication between such applications by utilizing the concept of services. Services are offered between applications via simple protocols and formats such as HTTP, XML (eXtensible Markup Language) and SOAP (Simple Object Access Protocol), and the applications do not need to be aware of the implementation of the services they wish to invoke, but simply of their existence and their interfaces. Individual Web Services are platform independent and also independent of each other. XML is a standard which makes it possible to automatically manipulate and process data, and where separation of content and presentation is one of the core characteristics [2].


SOAP in turn allows remote method invocation through the use of XML, providing flexibility and robustness.

2 RELATED WORKS

The role of SIA (the SOCRADES Integration Architecture) is to enable the ubiquitous integration of real-world services running on embedded devices with enterprise services. WS-* web service standards constitute the de facto communication method used by the components of enterprise-level applications, and for this reason SIA is fully based on them. In this manner, business applications can access near real-time data from a wide range of networked devices through a high-level, abstract interface based on web services [3]. The process of finding real-world services running on physical devices can be carried out as follows:
1. WS-Discovery, on which DPWS is based (both active and passive).
2. RESTful active network discovery, where a device notifies its presence to the LDU automatically.
3. Passive RESTful discovery for REST-enabled devices that do not comply with SIA network discovery.
After describing the way devices and their services are advertised, this section describes the SDPP and its underlying steps. The process begins with a Types Query after the network discovery of devices has been executed. In this sub-process, the developer uses keywords to search for services as she would search for documents on any search engine. Subsequently, this query is extended with related keywords fetched from different websites, and used to retrieve types of services that describe the functionality, but not yet the real-world device the service runs on. This is the task of the Candidate Search, where the running instances of the service type are retrieved and ranked according to context parameters provided by the developer. In case no service instance has been found, the process goes on with Provisioning. It begins with a forced network discovery of devices, where the devices known to provide the service type the developer is looking for are asked to acknowledge their presence. If no suitable device is discovered, a service injection can be requested. In this last step the system tries to find suitable devices that could run the requested service, and installs it remotely. A prototype of the described process was implemented and integrated into the SOCRADES Integration Architecture. The prototype implementation was developed in Java and deployed on a Java Enterprise Application Server (SAP NetWeaver) at two distinct locations. The evaluation of the prototype was split into three parts following the subparts of the Real-World Service Discovery Process (i.e., Types Query, Candidate Search, and Provisioning) [1]. The first step in evaluating the implementation of the process was to obtain a number of DPWS-enabled devices offering services to search for. Unfortunately, since it is only recently that DPWS has become an official standard (WS-DD), its adoption on industrial devices is ongoing. Thus, we decided to simulate a larger number of devices that one could expect to find in future industrial environments. Since developers usually write the description of new web services, we selected 17 experienced developers and asked them to write the description of a selected device and of at least two services it could offer. The developers were given the documentation of a concrete device according to the projects they were currently working on.
Based on these descriptions we generated 30 types of services (described in WSDL containing DPWS metadata) for 16 different smart devices ranging from RFID readers to robots and sensor boards. Out of these, 1,000 service instances were simulated at the two deployed locations. This work further extended the simulation by implementing them on a prototyped shop floor in a laboratory and industrial setup in two different scenarios described[3]. The locations spanned across cities and countries using Internet as the communication backbone for these services and internal operations. Common shop floor devices like temperature and vibration sensors were identified. SunSPOT sensors, 2 gantry robots, PLC (Programmable Logic Controller) devices controlling conveyor belts and proximity


sensors from leading industrial vendors were wrapped with (DPWS) web services, or service instances were deployed directly on the devices themselves. We further tested the integration of RESTful devices by implementing a native web server for the SunSPOTs. The prototype of the SOCRADES Integration Architecture was used in the back end to monitor, search and compose the services offered by these devices.

3 PROPOSED FRAMEWORK

We develop the model for the service user in the web service. Our model provides easier access to the services in the UDDI with the help of the keywords given by the user. The retrieved services are then stored in the database. With the help of our model, the user can select the services, and our application then automatically sends the request to the service provider and gets the response. The following are the features of our proposed framework:
1. Improving the automation.
2. The application finds the number of services in the UDDI registry according to the required services.
3. After getting the services, it stores them in the database to prevent any loss of services due to failures.
4. The user performs the process of selecting from the listed web services.
5. The application hides the system's complexity from the clients.
6. Improving the QoS (Quality of Service).
7. Based on the received requirements specification, it discovers functionally similar web services from the UDDI registry.
8. It helps the user to find the service that best suits the user's interest.
9. After getting the requirements specification from the service user, the application will send the request to the service provider.

3.1 Leaf cutter ant's behaviors
1. FIND
2. SELECT
3. EXECUTE
4. STORE
How these behaviors of the leaf cutter ant are implemented in our model is shown in Fig 1.
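A minimal sketch of how the four behaviours (find, select, execute, store) could be wired together on the client side. The UDDI query, the ranking and the SOAP invocation are placeholders, since the registry endpoint and the service interfaces are not specified in the paper; the local data store is modelled with SQLite, and all names are hypothetical.

    import sqlite3

    def find_services(keyword, inputs, outputs):
        """FIND: placeholder for a UDDI keyword query returning ranked candidates."""
        # A real client would query the UDDI registry with the keyword and the
        # desired input/output parameters and return the results rank-ordered.
        return [{"name": "WeatherService", "type": "SOAP", "description": "demo entry", "rank": 1}]

    def store_services(db, services):
        """STORE: keep the retrieved services locally so later requests skip the UDDI."""
        db.execute("CREATE TABLE IF NOT EXISTS services (name TEXT, type TEXT, description TEXT, rank INTEGER)")
        db.executemany("INSERT INTO services VALUES (?, ?, ?, ?)",
                       [(s["name"], s["type"], s["description"], s["rank"]) for s in services])
        db.commit()

    def select_service(db):
        """SELECT: in the model the user picks from the listed table; here we take the top-ranked row."""
        return db.execute("SELECT name FROM services ORDER BY rank LIMIT 1").fetchone()[0]

    def execute_service(name, request_xml):
        """EXECUTE: placeholder for sending the XML request to the provider and reading the XML response."""
        return f"<response service='{name}'>ok</response>"

    # Hypothetical end-to-end run of the leaf-cutter-ant workflow
    db = sqlite3.connect(":memory:")
    store_services(db, find_services("weather", inputs=["city"], outputs=["temperature"]))
    chosen = select_service(db)
    print(execute_service(chosen, "<request><city>Chennai</city></request>"))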

3.2 Keyword search
The available services in the UDDI are found with the help of the keywords given by the user. The user also provides the input and output parameters to retrieve the best service. The services are retrieved in ranked order.

3.3 Selection
The user can select the needed services from the list of services retrieved by our algorithm from the UDDI. The services are listed in a table view containing the service, service type and description fields. This helps to provide the best services to the user as required.

3.4 Search invocation
After the selection process our application automatically sends the request to the service provider in the form of an XML document and also gets the response from the service provider in the form of an XML document.

3.5 Data store
The retrieved services are stored in the database along with the results of the service requests. This helps the user access a service quickly: if the user needs the same service at a later time, it is retrieved from the local database with no need to go to the UDDI.

4 CONCLUSION

In this paper, we proposed a new model for web service selection and execution in order to increase the automation in web services.


The leaf cutter ant model retrieves the services based on ranking to get the best quality of service. Our model reduces the response time taken for the process, and it maintains the services in the data store after retrieving them from the UDDI.

References

[1] E. Fleisch and F. Mattern, Das Internet der Dinge: Ubiquitous Computing und RFID in der Praxis: Visionen, Technologien, Anwendungen, Handlungsanleitungen, Springer-Verlag, 2005.
[2] D. Lizcano, M. Jimenez, J. Soriano, J.M. Cantera, M. Reyes, J.J. Hierro, F. Garijo, and N. Tsouroulas, "Leveraging the Upcoming Internet of Services Through an Open User-Service Front-End Framework," Proc. First European Conf. Towards a Service-Based Internet (ServiceWave 08), pp. 147-158, 2008.
[3] D. Guinard, V. Trifa, S. Karnouskos, P. Spiess, and D. Savio, "Interacting with the SOA-Based Internet of Things: Discovery, Query, Selection, and On-Demand Provisioning of Web Services," IEEE Transactions on Services Computing, Vol. 3, No. 3, July-September 2010.


Deployment of P-Cycle for all Optical Mesh Networks under Static and Dynamic Traffic
V. Pradeepa, Asst. Prof., Department of ECE, RVS College of Engineering and Technology, Dindigul-5

Abstract

The major challenge in survivable mesh networks is the design of resource allocation algorithms that allocate network resources efficiently while at the same time being able to recover from a failure quickly. This issue is particularly more challenging in optical networks operating under wavelength continuity constraints, where the same wavelength must be assigned on all links in the selected path. In this work, a wavelength-routed WDM all-optical network is developed. We consider an ApFloyd's algorithm for the deployment of p-cycles in a WDM mesh network with and without wavelength conversion; under static traffic, the p-cycles can be formed jointly.

Keywords - Optical Networks, p-cycles, protection and restoration

INTRODUCTION

Many emerging networking applications, such as data browsing in the World Wide Web, video conferencing, video on demand, e-commerce, and image distribution, require very high network bandwidth, often far beyond what today's high-speed networks can offer. Optical networking is a promising solution to this problem because of the nearly unlimited bandwidth of optics. To fully use the bandwidth, a fiber is divided into a number of independent channels, with each channel on a different wavelength. This is referred to as wavelength-division multiplexing (WDM) [2]. As wavelength routing paves the way for network throughputs of possibly hundreds of Tb/s, network survivability assumes critical importance. A short network outage can lead to data losses on the order of several gigabits. Hence, protection spare resources in anticipation of faults and rapid restoration of traffic upon detection of a fault are becoming increasingly important. Survivability is the ability of the network to withstand equipment and link failures. The main goal of survivable network design is to be able to perform rapid restoration at as small a cost as possible. Node equipment failures are typically handled using redundant equipment within the node [3]; link failures, on the other hand, which are by far the most common failures in optical networks and occur due to backhoe accidents, are typically dealt with by allocating redundant capacity; protection capacity that is preallocated for use when a link fails is called protection. In optical wavelength division multiplexing (WDM) networks, a failure of a network component can lead to a severe disruption of traffic. Therefore, protection and restoration are imperative in the design of WDM networks. The method of the preconfigured protection cycle (p-cycle) was proposed by Grover. P-cycles can achieve fast protection speed and mesh-like high efficiency of spare capacity. This is because a p-cycle can provide protection not only for on-cycle spans but also for straddling spans. The major advantages of p-cycle protection schemes over diverse-routing protection schemes are their ability to achieve both good resource efficiency and fast restoration times simultaneously. It is possible to achieve a fast restoration time because the only real-time switching required upon link failure is between the end nodes of the failed link. Moreover, p-cycles can reach resource redundancy comparable to that of conventional survivable schemes used in mesh networks. In WDM wavelength-routed optical mesh networks, p-cycle techniques can be applied to ensure survivability against span failures (fiber cuts) under static and dynamic traffic environments. The major challenge in survivable networks is the design of resource allocation algorithms that allocate network resources efficiently while at the same time being able to quickly recover from failure by rerouting the broken connection using the reserved spare capacity.


This issue is particularly more challenging in optical networks operating under the wavelength continuity constraint, where the same wavelength must be assigned on all links in the selected path. The prime objective of most survivable routing algorithms is to minimize the consumption of network resources and reduce the restoration time during a failure. However, in some survivable routing schemes, the objective is a tradeoff between reducing the network resources required and decreasing the restoration time after failures [3]. This paper is organized as follows. In Section II, we describe the proposed algorithm for deploying p-cycles; the systematic study provides a three-step approach where, first, the demand connections are routed through the network and the p-cycles are formed. In Section III, an explanation of the mesh network is given. Finally, the ApFloyd's algorithm is applied to the mesh network and the p-cycles are deployed.

II. PROPOSED ALGORITHM FOR P-CYCLE DESIGN

In general, network traffic is unlikely to be symmetric in both directions between two nodes. This means that the number of working and protection wavelengths is not likely to be the same in both directions. As in [5], we will consider p-cycles in this paper without loss of generality. A p-cycle can protect one working unit in the opposite direction for every on-cycle span, and two working units (one in each direction) for every straddling span. The number of spare units of a p-cycle is equal to the number of spans on the cycle. We define the confidence of a p-cycle as the ratio of the number of working units that are actually protected by the p-cycle to the number of spare units of the p-cycle. A p-cycle with a greater confidence means that its spare units are utilized more efficiently than those of a p-cycle with a smaller confidence. The idea behind this ApFloyd's algorithm is to identify those p-cycles that can actually protect as many working units as possible, and hence to reduce the total spare units. For a given mesh network topology and traffic demand, this confidence-based p-cycle design algorithm is summarized as follows:

Step 1: Find all possible candidate paths according to the all-pair shortest path algorithm and determine the cost.

    for each node v in network:
        dist[v] := infinity;
        previous[v] := undefined;
    end for;
    dist[source] := 0;
    Q := the set of all nodes in network;
    while Q is not empty:
        u := node in Q with smallest dist[];
        if dist[u] = infinity: break; fi;
        remove u from Q;
        for each neighbor v of u:
            alt := dist[u] + dist_between(u, v);
            if alt < dist[v]:
                dist[v] := alt;
                previous[v] := u;
            fi;
        end for;
    end while;
    return dist[];

Step 2: For each candidate path, calculate the confidence of its p-cycle.

    Dictionary<string, double> GenerateCandidates(Dictionary<string, double> dic_FrequentItems)
    {
        Dictionary<string, double> dic_CandidatesReturn = new Dictionary<string, double>();
        for (int i = 0; i < dic_FrequentItems.Count - 1; i++)
        {
            string strFirstItem = Alphabetize(dic_FrequentItems.Keys.ElementAt(i));
            for (int j = i + 1; j < dic_FrequentItems.Count; j++)
            {
                string strSecondItem = Alphabetize(dic_FrequentItems.Keys.ElementAt(j));
                string strGeneratedCandidate = GetCandidate(strFirstItem, strSecondItem);


                if (strGeneratedCandidate != string.Empty)
                {
                    strGeneratedCandidate = Alphabetize(strGeneratedCandidate);
                    double dSupport = GetSupport(strGeneratedCandidate);
                    dic_CandidatesReturn.Add(strGeneratedCandidate, dSupport);
                }
            }
        }
        return dic_CandidatesReturn;
    }

    // Tail of GetCandidate(strFirstItem, strSecondItem): join two items that share a common prefix.
    if (strFirstSubString == strSecondSubString)
    {
        return strFirstItem + strSecondItem[nLength - 1];
    }
    else
        return string.Empty;

Step 3: Select a p-cycle with maximum confidence. If multiple p-cycles have the same maximum confidence, randomly select one.

    private List<clssRules> GenerateRules()
    {
        List<clssRules> lstRulesReturn = new List<clssRules>();
        foreach (string strItem in m_dicAllFrequentItems.Keys)
        {
            if (strItem.Length > 1)
            {
                int nMaxCombinationLength = strItem.Length / 2;
                GenerateCombination(strItem, nMaxCombinationLength, ref lstRulesReturn);
            }
        }
        return lstRulesReturn;
    }

    // Fragment of GenerateCombination(strItem, nCombinationLength, ref lstRulesReturn):
    if (nItemLength == 2)
    {
        AddItem(strItem[0].ToString(), strItem, ref lstRulesReturn);
        return;
    }
    else if (nItemLength == 3)
    {
        for (int i = 0; i < nItemLength; i++)
        {
            AddItem(strItem[i].ToString(), strItem, ref lstRulesReturn);
        }
        return;
    }
    else
    {
        for (int i = 0; i < nItemLength; i++)
        {
            GetCombinationRecursive(strItem[i].ToString(), strItem, nCombinationLength, ref lstRulesReturn);
        }
    }

    // Fragment of GetCombinationRecursive(strCombination, strItem, nCombinationLength, ref lstRulesReturn):
    if (strCombination.Length == nCombinationLength)
    {
        if (cLastTokenCharacter != cLastItemCharacter)
        {
            strCombination = strCombination.Remove(nLastTokenCharcaterIndex, 1);
            cNextCharacter = strItem[nLastTokenCharcaterIndexInParent + 1];
            string strNewToken = strCombination + cNextCharacter;
            return (GetCombinationRecursive(strNewToken, strItem, nCombinationLength, ref lstRulesReturn));
        }
        else
        {
            return string.Empty;
        }
    }
    else
    {
        if (strCombination != cLastItemCharacter.ToString())
        {
            cNextCharacter = strItem[nLastTokenCharcaterIndexInParent + 1];


            string strNewToken = strCombination + cNextCharacter;
            return (GetCombinationRecursive(strNewToken, strItem, nCombinationLength, ref lstRulesReturn));
        }
        else
        {
            return string.Empty;
        }
    }

III. MESH NETWORK

A mesh network is a type of network that uses redundant and distributed nodes to provide greater reliability and range for any given wired network. A mesh network is a local area network (LAN) that employs one of two connection arrangements: full mesh topology or partial mesh topology. In the full mesh topology, each node (workstation or other device) is connected directly to each of the others. In the partial mesh topology, some nodes are connected to all the others, while the remaining nodes are connected only to those nodes with which they exchange the most data. A mesh network is reliable and offers redundancy: if one node can no longer operate, all the rest can still communicate with each other, directly or through one or more intermediate nodes [3].

Fig 2: A mesh network showing two cycles (weighted and directed graph)

IV. JOINT SOLUTION

For a more optimal solution, the problem can be solved jointly, where the candidate working routes and the candidate backup p-cycles are jointly formulated as an ApFloyd's algorithm to minimize the total capacity required. However, the number of possible p-cycle candidates grows exponentially with the average nodal degree and the number of nodes in the network. Furthermore, the number of variables used to represent the flow of each lightpath demand increases with the number of links and the number of wavelengths. As a result, the complexity of the problem grows very rapidly, which makes the solution time unacceptable. To reduce the complexity of the problem and make it more tractable, the routing problem is first solved using a path-flow technique [1]. The idea is to generate the shortest routes for each s-d pair to be used as candidate routes in the optimization model. The candidate primary routes together with the candidate cycles are then used to formulate the overall problem as an ApFloyd's algorithm. To reduce the number of candidate p-cycles in the formulation, a pre-selection algorithm has been developed to select a reduced number of high-merit cycles. Consequently, the final solution is a tradeoff between the optimality of the solution and the complexity of the problem [5]. All candidate paths derived by the algorithm are listed in the following table.

TABLE 1: DEMANDS IN THE MESH NETWORK

S.No.   Demand
1       A->C->B
2       A->C
3       A->C->D
4       B->A
5       B->A->C
6       B->A->C->D
7       C->B->A
8       C->B
9       C->D
10      D->A
11      D->A->C->B
12      D->A->C
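As a brief illustration of the Step 1 computation, the following is a minimal C# sketch of the Floyd-Warshall all-pairs shortest-path relaxation on a small 4-node mesh. It is not the paper's implementation; the node names (A-D) and span weights are placeholder assumptions, not the values of the network in Fig. 2.

    using System;

    class AllPairsShortestPaths
    {
        const double INF = double.PositiveInfinity;

        // Floyd-Warshall: dist[i,j] starts as the direct span cost and is
        // relaxed through every intermediate node k.
        static double[,] Floyd(double[,] cost)
        {
            int n = cost.GetLength(0);
            var dist = (double[,])cost.Clone();
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        if (dist[i, k] + dist[k, j] < dist[i, j])
                            dist[i, j] = dist[i, k] + dist[k, j];
            return dist;
        }

        static void Main()
        {
            // Hypothetical 4-node mesh (A, B, C, D) with placeholder span weights.
            double[,] cost =
            {
                { 0,   INF, 1,   2   },   // A
                { INF, 0,   1,   INF },   // B
                { 1,   1,   0,   3   },   // C
                { 2,   INF, 3,   0   }    // D
            };
            double[,] d = Floyd(cost);
            Console.WriteLine("Shortest A->B cost: " + d[0, 1]);  // via C: 1 + 1 = 2
        }
    }

The resulting shortest routes for every source-destination pair play the role of the candidate paths in Table 1.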



Consider Table 1, consisting of 12 transactions, with the minimum required support set to 20% and the minimum required confidence set to 80%. We first find the frequent paths using Step 2; association rules are then generated using the minimum support and minimum confidence.

TABLE 2: GENERATING 1-NODE FREQUENT PATHS

The set of frequent 1-nodes, L1, consists of the candidate 1-nodes satisfying minimum support. In the first iteration of the algorithm, each node is a member of the candidate set. The support count is calculated using Table 1. To discover the set of frequent 2-nodes, L2, the algorithm uses L1 join L1 to generate a candidate set of 2-nodes, C2. Next, the transactions in D are scanned and the support count for each candidate node in C2 is accumulated (as shown in the middle table). The set of frequent 2-nodes, L2, is then determined, consisting of those candidate 2-nodes in C2 having minimum support.

TABLE 3: GENERATING 2-NODE FREQUENT PATHS

The generation of the set of candidate 3-nodes, C3, involves use of the AP property. In order to find C3, we compute L2 join L2. Once the join step is completed, the confidence of each candidate path is found.

TABLE 4: GENERATING 3-NODE FREQUENT PATHS
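The following is a minimal C# sketch of the first pass of this candidate-generation procedure applied to the demands of Table 1: it counts in how many of the 12 demand paths each node appears and keeps the nodes meeting the 20% support threshold. It is an illustrative sketch only; the support values printed are computed from Table 1, and the code does not reproduce the paper's full join procedure.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class FrequentPathNodes
    {
        static void Main()
        {
            // The 12 demands of Table 1, written as node sequences.
            string[] demands =
            {
                "ACB", "AC", "ACD", "BA", "BAC", "BACD",
                "CBA", "CB", "CD", "DA", "DACB", "DAC"
            };
            double minSupport = 0.20;   // 20% of the 12 demands

            // Pass 1: support count of every single node (candidate set C1).
            var counts = new Dictionary<char, int>();
            foreach (string d in demands)
                foreach (char node in d.Distinct())
                    counts[node] = counts.TryGetValue(node, out int c) ? c + 1 : 1;

            // Keep the nodes whose support meets the threshold (frequent set L1).
            foreach (var kv in counts.OrderBy(k => k.Key))
            {
                double support = (double)kv.Value / demands.Length;
                if (support >= minSupport)
                    Console.WriteLine($"Node {kv.Key}: support {support:P0}");
            }
        }
    }

The 2-node and 3-node candidate sets would then be formed by joining the surviving items, as described above.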

TABLE 5: CONFIDENCE OF THE P-CYCLES IN THE MESH NETWORK

Cycle Number   Confidence Ratio
1              83%
2              85%

There are two candidate cycles, each of which can be oriented in either the clockwise or counter-clockwise direction. Cycle 1 has three on-cycle spans and one straddling span, and its current confidence is 83%. Cycle 2 also has three spans and one straddling span, and its confidence is 85%. The current confidences of all the cycles in Fig. 2 are given in Table 5. Since the p-cycle of cycle 2 has the maximum confidence ratio, it is selected.
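To make the selection rule concrete, the following is a minimal C# sketch of the confidence calculation as defined in Section II, assuming we already know, span by span, how many working units the cycle actually protects. The demand counts used in Main are placeholders, not the actual working capacities behind the 83% and 85% values in Table 5.

    using System;

    class PCycleConfidence
    {
        // Working units actually protected: at most one per on-cycle span and
        // at most two per straddling span, limited by the demand routed there.
        static double Confidence(int[] workingOnCycle, int[] workingStraddling)
        {
            int protectedUnits = 0;
            foreach (int w in workingOnCycle)    protectedUnits += Math.Min(w, 1);
            foreach (int w in workingStraddling) protectedUnits += Math.Min(w, 2);

            int spareUnits = workingOnCycle.Length;   // one spare unit per cycle span
            return (double)protectedUnits / spareUnits;
        }

        static void Main()
        {
            // Hypothetical routed demands on the spans of two candidate cycles.
            double c1 = Confidence(new[] { 1, 0, 1 }, new[] { 0 });
            double c2 = Confidence(new[] { 1, 1, 0 }, new[] { 1 });
            Console.WriteLine($"cycle 1: {c1:P0}, cycle 2: {c2:P0}");
            // The cycle with the larger confidence would be selected for deployment.
        }
    }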
V. CONCLUSION

We have proposed a new algorithm that provides survivability against single failures using p-cycles, for the construction of static survivable WDM networks without using ILP. The problem can also be solved jointly.

REFERENCES

[1] A. E. Eshoul and H. T. Mouftah, "Survivability approaches using p-cycles in WDM mesh networks under static traffic," IEEE/ACM Transactions on Networking, vol. 17, no. 2, Apr. 2009.
[2] Z. Zhang and Y. Yang, "Performance modeling of bufferless WDM packet switching networks with limited-range wavelength conversion," IEEE Transactions on Communications, vol. 54, no. 8, Aug. 2006.
[3] H. Choi, S. Subramaniam, and H.-A. Choi, "Loopback recovery from double-link failures in optical mesh networks," IEEE Transactions on Networking, vol. 12, no. 6, Dec. 2004.



[4] D. Schupke, C. Gruber, and A. Autenrieth, "Optimal configuration of p-cycles in WDM networks," in Proc. IEEE Int. Conf. Communications (ICC 2002), New York, Apr. 2002, vol. 5, pp. 2761-2765.
[5] Z. Zhenrong, Z. W. De, and B. Mukherjee, "A heuristic method for design of survivable WDM networks with p-cycles," IEEE Commun. Lett., vol. 8, no. 7, pp. 467-469, Jul. 2004.
[6] W. D. Grover and D. Stamatelakis, "Cycle-oriented distributed pre-configuration: Ring-like speed with mesh-like capacity for self-planning network restoration," in Proc. IEEE Int. Conf. Communications (ICC 1998), Atlanta, GA, Jun. 1998, vol. 1, pp. 537-543.
[7] W. D. Grover and J. Doucette, "Advances in optical network design with p-cycles: Joint optimization and pre-selection of candidate p-cycles," in Proc. IEEE/LEOS All Optical Networking Conf., Mont Tremblant, QC, Canada, Jul. 2002, pp. WA2-49-WA2-50.
[8] G. Shen and W. D. Grover, "Extending the p-cycle concept to path segment protection for span and node failure recovery," IEEE J. Sel. Areas Commun., vol. 21, no. 8, pp. 1306-1319, Oct. 2003.



ESCALATING ENERGY FOR WIRELESS SENSORS


U. Prithivi Rajan1, S. Praveen Kumar2
1 Student, Saveetha Engineering College, Chennai, Tamilnadu, India
2 Assistant Professor, Saveetha Engineering College, Chennai, Tamilnadu, India

Abstract - Energy is an important issue for the development of human civilization, but the problem of exhaustion, within decades, of the principal fossil sources of energy that supply almost the whole world's consumption must be confronted. This paper presents the design and simulation of a MEMS-based electromagnetic device capable of converting waste energy, such as vibration, into electrical energy. For most wireless applications, the ambient vibration is generally at low frequencies, and traditional scavenging techniques cannot generate enough energy for proper operation. The reported generator up-converts low-frequency environmental vibrations to a higher frequency through a mechanical frequency up-converter using a magnet, and hence provides more efficient energy conversion at low frequencies. Power is generated by means of electromagnetic induction using a magnet and coils on top of resonating cantilevers.

Index Terms - Array of cantilevers, energy harvesting, energy scavenging, frequency up-conversion (FupC), micro power generator.

I INTRODUCTION

Advances in integrated circuit manufacturing, low-power ICs, and MEMS technology have enabled many electronic devices to be more energy efficient. These low-power devices include wireless sensors, implantable medical devices, and handheld electronic devices.

The predominant power sources for these low-power electronic devices are disposable and rechargeable batteries. The main drawbacks of battery-powered systems include the following: 1) limited power supply, which limits the duration of operation; 2) frequent battery maintenance and replacement, which are inconvenient or impossible for some applications such as implanted medical devices and remote monitoring/tracking devices; and 3) adverse environmental effects, since millions of batteries, which are non-rechargeable and non-biodegradable, are thrown away each year. As an alternative source, the vibration energy can be harvested to power the low-power electronic devices such that batteries can be eliminated or battery life can be extended [1], [2].

There are three typical methods to convert mechanical vibrations into electrical energy, i.e., electrostatic [3]-[5], piezoelectric [6]-[8], and electromagnetic power harvesting [9]-[13].

II THEORY OF ELECTROMAGNETIC GENERATORS

The theory of using electromagnetic generators for harvesting energy from vibration has been detailed in the literature. Based on Faraday's law of magnetic induction, an electromagnetic generator converts a mechanical vibration into electrical power. The generator consists of a mechanical resonator and a coil. The mechanical resonator amplifies the typically low amplitude of the vibration source, while the coil detects the deflection of the resonator and then transforms this deflection into electrical energy.

III FREQUENCY UP-CONVERSION TECHNIQUE

The power output from vibration-based generators is proportional to the excitation frequency. For this reason, the main objective of the proposed design is to accept low-frequency vibrations as input and create high-frequency vibrations, acting as a linear motion transformer or, literally, as a frequency converter. This is achieved using basic mechanical vibration theory, which states that when underdamped structures are excited by an initial condition such as displacement or velocity, their response will be an exponentially decaying oscillatory motion. The proposed generator (Figs. 1, 2, and 3) is composed of two mechanical structures: 1) the upper diaphragm and 2) the array of cantilevers located right below the diaphragm.
The diaphragm is made of Parylene C and holds a NdFeB magnet for both frequency up-conversion and power


generation by means of electromagnetic induction. The diaphragm-magnet assembly resonates in response to vibrations in the range of 1-1000 Hz. The cantilevers are also made of Parylene C; they have a higher resonance frequency of 2-3 kHz, and each of them has a coil for induction. Also, at the tip of each cantilever, nickel is electroplated for interaction with the magnet. As the diaphragm resonates in response to external vibrations, it gets closer to the cantilever array. The distance between them is adjusted such that the magnet catches the cantilevers at a certain instant of its movement, pulls them up, and releases them at another point. The motion of the released cantilevers exponentially decays out, and before it completely dies, the cycle starts again. The released cantilevers start resonating at their damped natural frequency with the given initial condition, realizing the frequency up-conversion (FupC).
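For reference, the underdamped free response and the induction mechanism mentioned above follow the standard textbook relations below; these formulas are not taken from the paper and use the conventional symbols (natural frequency omega_n, damping ratio zeta, coil turns N, flux Phi).

    % standard vibration and induction relations (assumed background, not from the paper)
    x(t) = x_0 \, e^{-\zeta \omega_n t} \cos(\omega_d t + \phi),
    \qquad
    \omega_d = \omega_n \sqrt{1 - \zeta^2},
    \qquad
    V_{\mathrm{emf}} = -N \frac{d\Phi}{dt}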

Fig 1 Proposed Design (Upper Left)

Fig 2 Proposed Design (Upper Right)
Fig 3 Proposed Design (Lower)

The generator can effectively harvest energy from environmental vibrations of 70-150 Hz and generates 0.57 mV of voltage and 0.25 nW of power from a single cantilever by up-converting the input vibration frequency of 95 Hz to 2 kHz. The fabricated generator size is 8.5 x 7 x 2.5 mm3, and a total of 20 serially connected cantilevers have been used to multiply the generated voltage and power. The power and voltage levels can


further be increased by increasing the number of cantilevers or coil turns. The performance of the generator is also compared with that of a same-sized custom-made traditional magnet-coil-type generator and with that of a traditional generator from the literature to prove its effectiveness.

The size of the magnet is 3.8 x 3.8 x 1.5 mm3. The total number of cantilevers is 20, with five cantilevers arranged on each side. The natural frequency of the cantilevers is 2 kHz. The size of each cantilever is 1000 x 430 x 15 µm3. A 9-µm-thick nickel layer is placed at the end of the cantilevers. The coil placed above the cantilevers for power generation has a width of 20 µm and 6 turns. The magnet catches the cantilevers due to the external vibration, pulls them up, and releases them at another point. The release height of the cantilevers is 200 µm.

The virtual fabrication was done with the help of the Intelli-Fab tool, where the mask layers were drawn using IntelliMask. The fabrication outline includes deposition, mask definition, and etching, with the surface cleaned after every deposition. We adopted a bulk micromachining procedure, and the deposition type was planarization to increase the flatness of the surface in order to achieve the cantilever structure. Parylene C is used as the structural material for the cantilevers and the diaphragm because it allows much larger deflections before mechanical failure compared to silicon. First, a thermal oxide of 200 nm thickness is grown on the silicon substrate. Next, a 1-µm-thick parylene layer is deposited by chemical vapor deposition and patterned by reactive ion etching (RIE) at the contact pads and cantilever areas. Then, coil turns are formed by sputtering and patterning the first metal layer. As the next step, a second 1-µm-thick parylene layer is formed on the metal and patterned to provide electrical isolation between the first and second metal layers, and vias are opened at the necessary positions on the parylene by RIE to provide contact between the first and second metal layers. The second metal layer is then sputtered and patterned to complete the metal routes. The thickness of the cantilevers is defined mainly by the deposition and patterning of the third, 13-µm-thick parylene layer. Afterward, a 9-µm nickel (Ni) layer is deposited by electroplating and patterned by liftoff to form the magnetic actuation areas on the cantilevers. Then, a final 1-µm-thick parylene layer is deposited and patterned to act as a protection layer for the magnetic areas. Finally, the silicon substrate is through-etched from the backside by deep RIE (DRIE). The exposed oxide layer on the back is wet etched in a buffered hydrofluoric acid solution to release the devices. The devices are then cleaned in acetone and isopropyl alcohol. Then, the magnet is glued to the diaphragm, and the two chips are combined together to form a single device.

V CONCLUSION

The aim of the work is to show that the proposed generator concept works in micro scale and is more efficient than a same-sized traditional micro generator operating under the same conditions. For this purpose, the device parameters have been optimized in a conservative manner for the verification of the concept. It has been shown that the proposed generator performs much better than a custom-made traditional generator.
An electromagnetic micro energy generator that up-converts low-frequency environmental vibrations to a higher frequency has been presented. It is also possible to further improve the generated voltage and power by decreasing the coil width to increase the number of coil turns, or by increasing the number of cantilevers. For example, in this design a coil width of 20 µm has been used, and it can be decreased to 2 µm to increase the generated outputs. Initial calculations show that this improvement leads to a 6.5-fold increase in voltage and power output levels. With further improvements in the design parameters, it is possible to improve the performance of the proposed generator further.

REFERENCES

[1] A. D. Joseph, A. Kansal, M. B. Srivastava, J. Randall, T. Buren, G. Troster, T. J. Johnson, W. W. Clark and F. Moll, "Energy harvesting projects," IEEE Pervasive Comput., vol. 4, no. 1, pp. 69-71, Jan.-Mar. 2005.
[2] S. Roundy, E. S. Leland, J. Baker, E. Carleton, E. Reilly, E. Lai, B. Otis, J. M. Rabaey, P. K. Wright and V. Sundararajan, "Improving power output for vibration-based energy scavengers," IEEE Pervasive Comput., vol. 4, no. 1, pp. 28-36, Jan.-Mar. 2005.
[3] F. Peano and T. Tambosso, "Design and optimization of a MEMS electret-based capacitive energy scavenger," J. Microelectromech. Syst., vol. 14, no. 3, pp. 429-435, Jun. 2005.


[4] A. Nounou and H. F. Ragaie, "A lateral comb-drive structure for energy scavenging," in Proc. ICEEC, 2004, pp. 553-556.
[5] S. Meninger, J. O. Mur-Miranda, R. Amirtharajah, A. P. Chandrakasan and J. H. Lang, "Vibration-to-electric energy conversion," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 1, pp. 64-76, Feb. 2001.
[6] P. Glynne-Jones, S. P. Beeby and N. M. White, "Towards a piezoelectric vibration-powered microgenerator," Proc. Inst. Elect. Eng. - Sci. Meas. Technol., vol. 148, no. 2, pp. 68-72, Mar. 2001.
[7] T. Sterken, K. Baert, C. Van Hoof, R. Puers, G. Borghs and P. Fiorini, "Comparative modeling for vibration scavengers," in Proc. IEEE Sensors, 2004, vol. 3, pp. 1249-1252.
[8] N. S. Shenck and J. A. Paradiso, "Energy scavenging with shoe-mounted piezoelectrics," IEEE Micro, vol. 21, no. 3, pp. 30-42, May/Jun. 2001.
[9] P. B. Koeneman, I. J. Busch-Vishniac and K. L. Wood, "Feasibility of micro power supplies for MEMS," J. Microelectromech. Syst., vol. 6, no. 4, pp. 355-362, Dec. 1997.
[10] P. D. Mitcheson, T. C. Green, E. M. Yeatman and A. S. Holmes, "Architectures for vibration-driven micropower generators," J. Microelectromech. Syst., vol. 13, no. 3, 2004.
[11] T. Von Buren, P. D. Mitcheson, T. C. Green, E. M. Yeatman, A. S. Holmes and G. Troster, "Optimization of inertial micropower generators for human walking motion," IEEE Sensors J., no. 99, pp. 1-11, 2005.
[12] R. Amirtharajah, S. Meninger, J. Mur-Miranda, A. Chandrakasan and J. Lang, "Self-powered signal processing using vibration-based power generation," IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 687-695, 1998.
[13] C. B. Williams and R. B. Yates, "Analysis of a micro-electric generator for microsystems," Sens. Actuators A, vol. 52, pp. 8-11, 1996.



Software Aging Analysis through Machine Learning


Mr. R. Raju1, G. Sumathi2, J. Vidhya@Shankardevi3
1 Associate Professor, Sri Manakula Vinayagar Engineering College
2, 3 M.Tech (pursuing), Department of CSE, Sri Manakula Vinayagar Engineering College

ABSTRACT

Software aging is a phenomenon that refers to progressive performance degradation, transient failures, or even crashes in long-running software systems such as web servers. It mainly occurs due to the deterioration of operating system resources, fragmentation, and numerical error accumulation. A proactive method to fight against software aging is software rejuvenation. Software rejuvenation is a proactive fault management technique aimed at cleaning up the system's internal state to prevent the occurrence of more severe crash failures in the future. It involves occasionally stopping the running software, cleaning its internal state, and restarting it. An optimized schedule for performing software rejuvenation has to be derived in advance, because a long-running application cannot be shut down frequently without incurring unnecessary cost. This paper proposes a method to derive an accurate and optimized rejuvenation schedule for a web server (Apache) by using a Radial Basis Function (RBF) based feed-forward neural network, a variant of Artificial Neural Networks (ANN). Aging indicators are obtained through an experimental setup involving an Apache web server and clients, and act as input to the neural network model. This method is better than existing ones because the use of RBF leads to better accuracy and faster convergence.

Keywords: software aging; software rejuvenation; rejuvenation schedule; ANN; RBF

1. INTRODUCTION

Software aging is a phenomenon that refers to progressive performance degradation, transient failures, or even crashes in long-running software systems such as web servers. It mainly occurs due to the deterioration of operating system resources, fragmentation, and numerical error accumulation [1]. The cost of unexpected downtime due to software aging is high, mainly in e-commerce websites and safety- or business-critical applications. Software aging injures the usability of the software system and brings inconvenience to the users.

Software aging mostly occurs due to the accumulation of runtime errors. Runtime errors result from residual software defects such as memory leaks and unreleased file locks. These residual defects are difficult to unveil in the testing phase because there are few observable errors during in-house testing. Even if they are unveiled, practical experience shows that most of the corresponding errors are transient in nature [6] and difficult to localize and remove. Therefore, these residual defects must be tolerated by users during the operational phase. Thus, as in humans, aging in software cannot be avoided; we can only slow the aging process or reduce its effects. So, the only possible way to fight aging is to reset the software system and clean its runtime environment before severe aging occurs, thereby avoiding a system crash. This method is called software rejuvenation [7]. Software rejuvenation is a proactive fault management technique aimed at cleaning up the system's internal state to prevent the occurrence of more severe crash failures in the future. It can maintain the robustness of software systems and avoid unexpected system outages.

2. OBJECTIVE OF THE PAPER

Every long-running application must face the process of aging in its lifetime. This phenomenon of software aging, if left unattended, will cause failure or even lead to a crash of the entire system. Since aging cannot be avoided, the only remedy available to prevent the


system from failure and to bring it back to a robust state is software rejuvenation. The important consideration is the appropriate time at which rejuvenation has to be performed, since periodic rejuvenation wastes cost: a long-running application such as a web server cannot be shut down frequently. This paper aims to obtain an accurate and optimized rejuvenation schedule for a web server, namely Apache. To do so, the status of the aging indicators, the parameters that denote the aging process, has to be forecasted. The values are forecasted using a Radial Basis Function based feed-forward neural network. The proposed model leads to better accuracy and faster convergence than the existing analytical and statistical models, as those models assume the underlying probability distribution used for scheduling, leading to poor accuracy.

3. RELATED WORKS

Existing work on software aging falls under either the model-based approach or the measurement-based approach. In the model-based approach, certain assumptions, such as the origin and result of software aging, are made, and based on these assumptions a mathematical model is built. This model is either analytical or stochastic. Existing model-based studies include a three-state stochastic model to determine the best time to restart a telecommunication system switch [7]; Garg et al. analysed software aging in a transaction system and proposed an analytical model to estimate the probability of losing an arriving transaction and the expected response time of a transaction [8]. Dohi et al. built semi-Markov reward process models of software aging to solve the same problem [9, 10]. All of the above-mentioned models suffer from the drawback that the mathematical assumptions cannot be easily validated in practice, and the derived mathematical properties are not useful for software maintenance [3].

In the measurement-based approach, the aging indicators are monitored and, based on the collected data, the robustness of the system is assessed by applying statistical regression techniques to the monitored data, such as auto-regression [1], threshold auto-regression [4], and the autoregressive moving average model [5]. A wavelet network has also been used to solve the same problem [15]. The rate of software aging is usually not constant; it depends upon the system workload, which varies with time. Thus, a time series model fits well to predict the future resource usage. El-Shishiny et al. proposed an MLP-based neural network model to analyse software aging and predict resource usage [16]. This work intends to provide a better optimal schedule for rejuvenation, with increased accuracy and faster convergence than the existing work, by using an RBF-based neural network to analyse resource usage data collected on a typical long-running software system, namely a web server, and to assess the suitability of RBF-based neural networks for the analysis of software aging.

4. EXPERIMENTAL SETUP

To obtain an optimized schedule for rejuvenation, the aging indicators required for forecasting have to be obtained. Many aging indicators are available. For experimental purposes, three aging indicators, namely the response time of the web server, the used swap space, and the free physical memory, are taken into consideration. In order to obtain the values of these aging indicators, the experimental setup shown in Figure 1 is used.
The platform consists of an Apache server, two clients, and a 100 Mb/s switch. The three computers are connected to each other via the switch. The Linux Fedora 10 operating system is installed on all three computers, whose configurations are as follows:
Server configuration: processor: dual-core 2 GHz; memory: 2 GB.
Client configuration: processor: single-core 1.4 GHz; memory: 512 MB.


Figure 1. Experimental setup

Httperf [17], a web server test tool, is deployed on the clients to generate artificial concurrent requests with exponential time intervals to access the static HTML files on the Apache server. Since Apache has been well tested in practice, it is difficult to observe its aging symptoms in a short period under a normal runtime environment and the default parameter settings; it is necessary to find some way to expedite the aging of Apache. In the experiments, we adjust two parameters that are related to the accumulation of the effects of software errors: MaxRequestsPerChild and MaxSpareServers [2]. The first parameter limits the number of requests handled by each child process of Apache. The second parameter, MaxSpareServers, sets the maximum number of idle child processes. When the number of requests is low, some of the existing child processes may be idle. If there are more than MaxSpareServers idle processes, Apache will kill the excess ones. By setting it to zero, we can turn off this mechanism so that no child processes will be killed during runtime. The used swap space and free physical memory are collected from the /proc file system of Linux. From the /proc file system and with the help of the Linux monitoring tool procmon, measurements were periodically collected. Httperf was used to generate requests with constant time intervals between two requests. Each request accesses one of five specified files of sizes 500 bytes, 5 kB, 50 kB, 500 kB, and 5 MB on the server. Httperf is not only a workload generator, but can also be employed for monitoring performance information.

For collecting resource usage data over a long time period, a shell program was used to run httperf periodically. As for the connection rate, a value of 400 requests per second was chosen, which puts the web server in an overload state and should speed up software aging.

5. ARTIFICIAL NEURAL NETWORK

An Artificial Neural Network is a machine learning technique inspired by the organization and functioning of biological neurons. There are numerous artificial neural network variations that are related to the nature of the task assigned to the network. ANNs have several advantages over statistical methods. Artificial neural networks can be universal function approximators, even for non-linear functions, and can also estimate piece-wise approximations of functions. ANNs have the ability to discover patterns adaptively from the data. When an appropriate number of nonlinear processing units is provided, neural networks can learn from experience and estimate any complex functional relationship with high accuracy [11]. Numerous successful ANN applications have been reported in the literature in a variety of fields, including pattern recognition and forecasting [12].

5.1 ANN FOR TIME SERIES FORECASTING

The usage of ANNs for time series analysis relies entirely on the observed data and is powerful enough to represent any form of time series. ANNs can learn even in the case of noisy data and can represent nonlinear time series. For example, given a series of values of the variable x at time step t and at past time steps x(t), x(t-1), x(t-2), ..., x(t-m), we look for an unknown function F such that x(t+n) = F[x(t), x(t-1), x(t-2), ..., x(t-m)], which gives an n-step predictor of order m for the quantity x. Although many types of neural network models have been proposed, the most popular one for time series forecasting is the Multi-Layer Perceptron (MLP) feed-


forward model [13]. A feed-forward network can map a finite time sequence into the value that the sequence will have at some point in the future [14]. Feed-forward ANNs are intrinsically non-linear, non-parametric approximators, which makes them suitable for complex prediction tasks. The ANN sees the time series X1, ..., Xn in the form of many mappings of an input vector to an output value. The time-lagged values x(t), x(t-1), x(t-2), ..., x(t-m) are fed as inputs to the network, which, once trained on many input-output pairs, gives as output the predicted value for yet unseen x values.

6. RBFNN FOR FORECASTING SOFTWARE AGING

Figure 2 shows the radial basis function neural network. The bell-shaped curves in the hidden nodes indicate that each hidden-layer node represents a bell-shaped radial basis function that is centered on a vector in the feature space. There are no weights on the links from the input nodes to the hidden nodes. The input vector is fed to each m-th hidden node with the following radial basis function:

Ym = fm(X) = exp[-|X - Cm|² / (2σ²)]    (1)

Radial Basis Functions emerged as a variant of ANN in the late 80s. RBFs are embedded in a two-layer neural network, where each hidden unit implements a radially activated function, and the output units implement a weighted sum of the hidden-unit outputs. Unlike the MLP network model, the RBFNN does not adjust weights on the links between the input layer and the hidden layer; weights are adjusted only on the links between the hidden layer and the output layer. RBF networks have excellent approximation capabilities [18]. Due to their nonlinear approximation properties, RBF networks are able to model complex mappings, which perceptron neural networks can only model by means of multiple intermediary layers.

where |X - Cm|² is the squared distance between the input feature vector X and the center vector Cm of that radial basis function, and σ is its spread. The values {Ym} are the outputs of the radial basis functions. Input vectors equidistant from the center in any direction produce the same value, which is why these are called radial basis functions. The outputs of the hidden-layer nodes are weighted by the weights wmj on the links, and the weighted sum is computed at each j-th output node as

Zj = (1/M) Σ(m=1..M) wmj Ym    (2)
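As a minimal illustration of Eqs. (1) and (2), the following C# sketch builds lagged windows from a resource-usage series and evaluates the RBF forward pass. It is not the paper's implementation: the free-memory values, the spread σ, and the output weights are placeholder assumptions; in the paper the weights are obtained by the steepest-descent training described in Section 6.1.

    using System;
    using System.Linq;

    class RbfForecastSketch
    {
        // Gaussian hidden unit of Eq. (1): exp(-|x - c|^2 / (2 sigma^2)).
        static double RadialBasis(double[] x, double[] center, double sigma)
        {
            double sq = x.Zip(center, (a, b) => (a - b) * (a - b)).Sum();
            return Math.Exp(-sq / (2 * sigma * sigma));
        }

        // Network output of Eq. (2): weighted average of the hidden outputs.
        static double Predict(double[] x, double[][] centers, double[] weights, double sigma)
        {
            double sum = 0;
            for (int m = 0; m < centers.Length; m++)
                sum += weights[m] * RadialBasis(x, centers[m], sigma);
            return sum / centers.Length;
        }

        static void Main()
        {
            // Illustrative free-memory series (MB); real values would come from /proc.
            double[] series = { 512, 500, 493, 481, 470, 458, 449, 436 };
            int m = 3;   // predictor order: inputs x(t), x(t-1), x(t-2)

            // Sliding windows of the last m observations; during training each
            // window would be paired with the next observation, x(t+1) = F[...].
            double[][] windows = Enumerable.Range(0, series.Length - m)
                .Select(i => series.Skip(i).Take(m).ToArray())
                .ToArray();

            // Exemplar windows serve as the centers C(m), as in Section 6.1;
            // fixed placeholder weights stand in for the trained wmj values.
            double[] weights = Enumerable.Repeat(450.0, windows.Length).ToArray();
            double sigma = 20.0;

            double[] latest = series.Skip(series.Length - m).Take(m).ToArray();
            Console.WriteLine("Forecast: " + Predict(latest, windows, weights, sigma));
        }
    }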

The collected data set is divided into three segments: one to train the RBFNN, one for validation, and the third for testing. The testing segment is used to evaluate the forecasting performance of the RBFNN in predicting the values of the performance parameters, since the proposed work follows a supervised learning technique.

6.1 TRAINING THE RBFNN

In order to predict the status of the aging indicators and obtain an optimized rejuvenation schedule, the RBFNN has to be trained such that the mean square error is minimized to the maximum extent. The mean square error function to be minimized by adjusting the parameters {wmj} is similar to the one for a BPNN, except that it is much simpler to minimize; there is only one set of parameters instead of two, as was the case for BPNNs. Upon suppressing the index q, we have

Figure 2: RBF Network Model


E = (1/J) Σ(j=1..J) (tj - Zj)²    (3)

E = (1/J) Σ(j=1..J) (tj - (1/M) Σ(m=1..M) wmj Ym)²

Thus,

∂E/∂wmj = (∂E/∂Zj)(∂Zj/∂wmj) = [(-2/J) Σ(j=1..J) (tj - Zj)] (Ym/M)    (4)

Upon putting this into the steepest descent method,

wmj(k+1) = wmj(k) + η [2/(J·M)] Σ(j=1..J) (tj - Zj) Ym    (5)

where η is the learning rate, or step size, as before. Upon training over all Q feature vector inputs and their corresponding target output vectors, equation (5) becomes

wmj(k+1) = wmj(k) + η [2/(J·M)] Σ(q=1..Q) Σ(j=1..J) (tj(q) - Zj(q)) Ym(q)    (6)

As the center vectors {C(m): m = 1, ..., M} on which to center the radial basis functions, the exemplar vectors {X(q): q = 1, ..., Q} are used, by putting C(m) = X(q) for m = 1, ..., Q. Once the network is trained, it is tested with the help of the last segment of the observed data.

7. PARAMETERS FOR PERFORMANCE EVALUATION

The forecasting accuracy is measured by the Root Mean Square Error (RMSE) and two other common error measures, the Mean Absolute Percent Error (MAPE) and the Symmetric Mean Absolute Percent Error (SMAPE).

7.1 RMSE

The root mean squared error Ei of an individual program i is evaluated by the equation

Ei = sqrt[ (1/n) Σ(j=1..n) (P(ij) - Tj)² ]

where P(ij) is the value predicted by the individual program i for sample case j (out of n sample cases), and Tj is the target value for sample case j. For a perfect fit, P(ij) = Tj and Ei = 0. So, the Ei index ranges from 0 to infinity, with 0 corresponding to the ideal.

7.2 MAPE

MAPE is calculated by averaging the percentage difference between the fitted (forecast) line and the original data:

MAPE = (1/n) Σt |et / yt| × 100

where y represents the original series, e the original series minus the forecast, and n the number of observations.

7.3 SMAPE

SMAPE calculates the symmetric absolute error in percent between the actual values X and the forecast F across all observations t of the test set of size n:

SMAPE = (1/n) Σ(t=1..n) |Ft - Xt| / ((|Xt| + |Ft|)/2) × 100
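A minimal C# sketch of these three error measures is given below, using the common definitions consistent with the descriptions above; the hold-out values and forecasts in Main are purely illustrative placeholders, not results from the paper's experiments.

    using System;
    using System.Linq;

    class ForecastErrorMetrics
    {
        // Root mean squared error between predictions p and targets t.
        static double Rmse(double[] p, double[] t) =>
            Math.Sqrt(p.Zip(t, (a, b) => (a - b) * (a - b)).Average());

        // Mean absolute percent error: average of |e_t / y_t| * 100.
        static double Mape(double[] forecast, double[] actual) =>
            forecast.Zip(actual, (f, y) => Math.Abs((y - f) / y) * 100).Average();

        // Symmetric MAPE: absolute error scaled by the mean of |actual| and |forecast|.
        static double Smape(double[] forecast, double[] actual) =>
            forecast.Zip(actual, (f, x) => Math.Abs(f - x) / ((Math.Abs(x) + Math.Abs(f)) / 2) * 100)
                    .Average();

        static void Main()
        {
            // Illustrative hold-out values of an aging indicator and its forecast.
            double[] actual   = { 470, 458, 449, 436 };
            double[] forecast = { 468, 460, 445, 440 };

            Console.WriteLine($"RMSE:  {Rmse(forecast, actual):F2}");
            Console.WriteLine($"MAPE:  {Mape(forecast, actual):F2}%");
            Console.WriteLine($"SMAPE: {Smape(forecast, actual):F2}%");
        }
    }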

8. CONCLUSION AND FUTURE WORK

This paper is proposed as an initial step towards implementing the proposed RBFNN model to obtain an optimized rejuvenation schedule for a web server.


To obtain the rejuvenation schedule, the aging indicators are selected and real-time values are obtained through the experimental setup. The obtained values are used for training the RBFNN model, which predicts the expected values of the aging indicators. The predicted results are to be evaluated using accuracy parameters such as RMSE, MAPE, and SMAPE. As a next step of this proposal, the proposed RBFNN model is to be implemented and the obtained accuracy verified graphically.

9. REFERENCES

[1] Michael Grottke, Lei Li, Kalyanaraman Vaidyanathan, and Kishor S. Trivedi, "Analysis of Software Aging in a Web Server," IEEE Transactions on Reliability, vol. 55, no. 3, September 2006.
[2] Yun-Fei Jia, Lei Zhao, and Kai-Yuan Cai, "A Nonlinear Approach to Modeling of Software Aging in a Web Server," 15th Asia-Pacific Software Engineering Conference, 2008.
[3] Yun-Fei Jia, Jing-Ya Su, and Kai-Yuan Cai, "A Feedback Control Approach for Software Rejuvenation in a Web Server," 978-1-4244-3417-6/08, IEEE, 2008.
[4] Xiu-E Chen, Quan Quan, Yun-Fei Jia, and Kai-Yuan Cai, "A Threshold Autoregressive Model for Software Aging," Proceedings of the Second IEEE International Symposium on Service-Oriented System Engineering, 0-7695-2726-4/06, 2006.
[5] Lei Li, Kalyanaraman Vaidyanathan, and Kishor S. Trivedi, "An Approach for Estimation of Software Aging in a Web Server," Proceedings of the 2002 International Symposium on Empirical Software Engineering, 0-7695-1796-X/02, 2002.
[6] A. T. Tai, L. Alkalaj, and S. N. Chau, "On-board preventive maintenance: a design-oriented analytic study for long-life applications," Performance Evaluation, 35, 215-232, 1998.
[7] Y. Huang, C. Kintala, N. Kolettis, and N. Fulton, "Software Rejuvenation: Analysis, Module and Applications," in Proceedings of the 25th IEEE International Symposium on Fault-Tolerant Computing, pp. 381-390, Pasadena, USA, June 1995.
[8] S. Garg, A. Puliafito, M. Telek, and K. S. Trivedi, "Analysis of Software Rejuvenation Using Markov Regenerative Stochastic Petri Net," in Proceedings of the Sixth International Symposium on Software Reliability Engineering, pp. 24-27, 1995.
[9] T. Dohi, K. Goseva-Popstojanova, and K. S. Trivedi, "Analysis of software cost models with rejuvenation," in Proceedings of the International Symposium on High Assurance Systems Engineering, pp. 25-34, 2000.
[10] T. Dohi, K. Goseva-Popstojanova, and K. S. Trivedi, "Estimating software rejuvenation schedules in high assurance systems," Computer Journal, 44(6):473-485, 2001.
[11] Xin Yao, "Evolving Artificial Neural Networks," Proceedings of the IEEE, vol. 87, no. 9, September 1999.
[12] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks 3, 551-560, 1989.
[13] G. Peter Zhang and Min Qi, "Neural network forecasting for seasonal and trend time series," European Journal of Operational Research 160, 501-514, 2005.
[14] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, 1995.


[15] J. Xu, J. You, and K. Zhang, "A Neural-Wavelet Based Methodology for Software Aging Forecasting," IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 59-63, 10-12 Oct. 2005.
[16] Hisham El-Shishiny, Sally Sobhy Deraz, and Omar B. Badreddin, "Mining Software Aging: A Neural Network Approach," 978-1-4244-2703-1/08, IEEE, 2008.
[17] D. Mosberger and T. Jin, "Httperf - A Tool for Measuring Web Server Performance," in the First Workshop on Internet Server Performance, Madison, USA, June 1998.
[18] J. Park and J. W. Sandberg, "Universal Approximation Using Radial Basis Function Networks," Neural Computation, vol. 3, pp. 246-257.
[19] David Lorge Parnas, "Software Aging," 0270-5257/94, IEEE, 1994.
[20] Michael Grottke, Rivalino Matias Jr., and Kishor S. Trivedi, "The Fundamentals of Software Aging," 1st International Workshop on Software Aging and Rejuvenation, IEEE, 2008.
[21] QingE Wu, ZhenYu Han, and TianSong Guo, "Application of an Uncertain Reasoning Approach to Software Aging Detection," Fifth International Joint Conference on INC, IMS and IDC, 2009.


Optimal Packet Processing Systems in Network Processor IXP2400 with dynamic multithread system
S. Sahunthala, Lecturer, Department of IT, Anand Institute of Higher Technology

Abstract
Network processors provide a flexible, programmable packet processing infrastructure for network systems. To make full use of the capabilities of network processors, it is imperative to provide the ability to dynamically adapt to changing traffic patterns, in the form of a network processor runtime system. The differences from existing operating systems and the main challenges lie in the multiprocessor nature of NPs, their on-chip resource constraints, and real-time processing requirements. In this paper we explore the key design trade-offs that need to be considered when designing a network processor operating system and how packet processing can be made efficient with the mechanism of packetised dynamic batch co-scheduling. We also examine the performance impact of application analysis on partitioning, traffic characterization, workload mapping, and runtime adaptation. The observations and conclusions are generally applicable to any runtime environment for network processors.

Introduction

This trend expands the functionality of networks to include increasingly diverse and heterogeneous end systems, protocols, and services. Even in today's Internet, routers perform a large amount of processing in the data path. Examples are firewalling, network address translation (NAT), Web switching, IP traceback, TCP/IP offloading for high-performance storage servers, and encryption for virtual private networks (VPNs). Many of these functions are performed in access and edge networks, which exhibit the most diversity of systems and required network functions. With the broadening scope of networking, it can be expected that this trend will continue, and more complex processing of packets inside the network will become necessary.

The processing infrastructure for these various packet processing tasks can be implemented in a number of ways. Well-defined high-speed tasks are often implemented on application-specific integrated circuits (ASICs). Tasks that are not well defined or possibly change over time need to be implemented on a more flexible platform that supports the ability to be reprogrammed. Network processors (NPs) have been developed for this purpose. The performance demands of increasing link speeds and the need for flexibility require that these NPs be implemented as multiprocessor systems. This makes the programming of such devices difficult, as the overall performance depends on the fine-tuned interaction of different system components (processors, memory interfaces, shared data structures, etc.). The main problem is handling the complexity of the various interacting NP system components. To achieve the necessary processing performance to support multigigabit links, NPs are implemented as system-on-a-chip multiprocessors. This involves multiple multithreaded processing engines, different types of on- and off-chip memory, and a number of special-purpose co-processors.
A network processor runtime system should satisfy the following requirements:
- Can implement multiple packet processing applications at the same time
- Can quickly add and remove processing functions from its workload
- Can ensure efficient operation under all circumstances
In particular, the management of various system resources is important to avoid performance degradation from resource bottlenecks.

In this article we explore a variety of design issues for a runtime environment that supports several concurrent network processing applications, and how packets are processed using packetized dynamic batch co-scheduling of the workload on a multiprocessor system. The key design considerations that are addressed fall into four broad categories:
- Application partitioning
- Traffic characterization
- Runtime mapping and adaptation
- System constraints

The remainder of the article presents some background on related work and the differences between NP runtime systems and conventional operating systems. Then qualitative design trade-offs are considered, followed by a discussion of quantitative results from our experimental system. We then summarize our observations and findings.

Background and Related Work

Commercial examples of NPs are numerous (Intel IXP series, EZchip NP-2, Hifn 5NP4G, etc.). An NP is typically implemented as a single-chip multiprocessor with high-performance I/O components, optimized for packet processing. In particular, NPs provide a more suitable architecture to handle these workloads than conventional workstation or server processors. The need for a specialized architecture is due to the uniqueness of the workload of NPs, which is dominated by many small tasks and high-bandwidth I/O operations. In order to achieve the necessary performance at ever-increasing line speeds and with increasingly complex packet processing functions, NPs exploit the parallelism that is inherent in the workload of these systems. In general, packets can be processed in parallel as long as they do not belong to the same flow. Processing functions within a packet can also be parallelized to decrease packet delay. This leads to NP systems with numerous parallel processing and coprocessing engines. To program such a system, several domain-specific programming languages have been developed. In the broader context of embedded systems, runtime

scheduling has been explored for real-time scheduling. Chakraborty et al. have developed a practical approach to determining if a task graph can meet real-time constraints [5]. While they consider dynamic interactions through events, they do not consider the fully dynamic case of changing workloads under different network traffic, as we do in this article. Most existing real-time operating systems (RTOSs) are designed for single-core platforms and thus are not applicable to network processors. Scheduling as a hardware/software co-design problem has been shown to outperform RTOS scheduling [6], but introducing hardware components into NPs for runtime support is not likely to happen in the near future.

Network Processor Operating System

The term operating system (which we use synonymously with runtime system) is most commonly used in the context of workstation computers. The responsibilities of such an operating system are to manage hardware resources and isolate users from each other. The optimization target is commonly to minimize the execution time of a single task (the one the user is currently working on). It is important to note that the goals of an operating system for NPs are very different. On an NP, all applications are controlled by the same administrative entity, and optimization aims at maximizing overall system throughput. The following list details the differences between NPOSs and conventional operating systems:

Separation between Control and Data Path. This separation refers to the processor context, not the networking context. To achieve high throughput on NPs, several studies have shown that it is more economical to implement a larger number of simpler processing engines than fewer more powerful ones. Such simple processors do not have the capability to run complex control tasks on top of packet processing tasks. In today's NP designs, control is implemented on a separate control processor. Due to this separation between classes of processors, it is necessary to have a more explicit control structure than one would have in a conventional operating system.

Limited Interactivity. Users do not directly interact with an NP or its operating system. At most, applications are installed and removed occasionally. This does not mean that a user could not change configurations on an NP (e.g., update rules for a

firewall application), but the key variables in this system are the traffic patterns that determine what processing needs to happen.

Regularity and Simplicity of Applications. One dominating aspect of network processing is that data path processing is performed on individual packets. This means that packet processing tasks are typically limited in complexity due to the real-time constraints imposed by continuously arriving packets. As a result, the processing demands are low (a few hundred to several thousand instructions [7]). Additionally, the execution path within an application is the same in a large number of cases and only slightly different for the other cases. Therefore, it is feasible to analyze packet processing applications in detail to find good processor mappings.

Processing Dominates Resource Management. Conventional operating systems need to implement a number of different functions: processor scheduling, memory management, application isolation, abstraction of hardware resources, and so on. In network processor systems, these challenges are dominated by managing processing resources. The diversity of hardware resources is limited, and many are controlled directly by the application. Also, memory is usually allocated statically to ensure deterministic runtime behavior. This might change in the future as network applications become more complex and NPOSs become more similar to conventional operating systems. In this work we focus on the processing aspects of operating system functionality.

Nonexistence of User Space/Kernel Space Separation. All functions on an NP are controlled by the same administrative entity. There is no clear separation between user space and kernel space in the traditional sense. Instead, functionality is divided between control and data path. As a result, traditional protection mechanisms are typically not implemented in NPOSs.

Due to these numerous and significant differences between what is conventionally thought of as an operating system and what is necessary for an NP, we believe it is important to explore some of the fundamental design issues encountered in the context of NPOSs.

System Operation

In order to explore runtime system design aspects

concretely, we assume a general operational approach as shown in Fig. 1. There are four basic steps that are necessary for runtime support of NP systems: application analysis, traffic characterization, workload mapping, and adaptation. There is a fundamental question of what should be done offline (e.g., during application development) and what should (and can realistically) be done during runtime. We discuss the different design choices for these components. Since the quantitative results are highly dependent on a particular system, we have separated the discussion of trade-offs to preserve its general applicability.
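To make the four-step loop concrete, the following is a minimal, self-contained C# sketch of how traffic characterization could drive workload mapping at runtime. Every name, number, and policy in it is a placeholder assumption, not an API of any actual NP runtime; application analysis (step 1) is assumed to have been done offline, and the adaptation decision (step 4) is only indicated in a comment.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class NpRuntimeLoopSketch
    {
        const int Engines = 8;   // processing engines on a hypothetical NP

        // Traffic characterization: fraction of packets that each application
        // must process in the observed batch.
        static Dictionary<string, double> EstimateAllocation(IEnumerable<string> packetApps)
        {
            var batch = packetApps.ToList();
            return batch.GroupBy(a => a)
                        .ToDictionary(g => g.Key, g => (double)g.Count() / batch.Count);
        }

        // Workload mapping: give each application a share of the engines
        // proportional to its allocation (at least one engine per active application).
        static Dictionary<string, int> MapToEngines(Dictionary<string, double> allocation)
        {
            return allocation.ToDictionary(
                kv => kv.Key,
                kv => Math.Max(1, (int)Math.Round(kv.Value * Engines)));
        }

        static void Main()
        {
            // Hypothetical per-packet application labels observed in one batch.
            string[] observedPackets = { "ipv4", "ipv4", "vpn", "ipv4", "nat", "vpn", "ipv4", "ipv4" };

            var allocation = EstimateAllocation(observedPackets);
            var mapping = MapToEngines(allocation);

            // Adaptation would compare this mapping with the currently installed
            // one and remap only when the difference justifies the remapping cost.
            foreach (var kv in mapping)
                Console.WriteLine($"{kv.Key}: {kv.Value} engine(s)");
        }
    }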

Figure 1. Processing and traffic analysis in an NP system

Application Partitioning

Application analysis is necessary to analyze the processing requirements of the application and to be able to partition the application. Partitioning allows the distribution of different subtasks onto different processing elements to fully utilize the resources on the NP. The simplicity and repetitiveness of network processing applications allow detailed analysis of the application. The profiling process is shown as an offline com-

Network processor applications rarely consist of a single monolithic piece of code. More commonly, NP applications are split into several subtasks. For example, on the Intel IXP2400, input processing is separated from forwarding and output processing. This partitioning makes application development somewhat easier and changes to the application easier to implement. Also, it allows exploiting of parallelism and pipelining to fully utilize the multiprocessor infrastructure. How can a runtime system support this application partitioning? With the limited processing resources on current NP architectures, this analysis cannot be done online. Typically, such an analysis can be performed in the context of the application

development environment for the NP applications. The partitioning can be performed in different ways, and the level of
-

Manual Partitioning:

Automated Partitioning: One key question is whicht granularity of application partitioning is most suitable. The spectrum of choices ranges from monolithic applications to extremely finegrained instruction (or basic block) allocations to processing resources.

applications on the net-work processor. Heavily used applications typically need to be replicated multiple times to provide sufficient performance (e.g., multiple parallel IP forwarding applications). When con-sidering runtime support for workloads, it is important to be able to analyze network traffic to estimate and possibly predict a suitable application allocation.
-

Static Traffic Model: Dynamic Traffic Model:

If the application is not partitioned, it can only be allocated to a single processing engine. This significantly limits how the application workload can be adapted to network traffic requirements. Also, it may cause performance bottlenecks in pipelined systems (e.g., multiple sequential applications per packet), where the pipeline speed is determined by the maximum stage time. Finally, as application size continues to grow, monolithic applications do not allow for scalable distribution of processing and may conflict with instruction store limita-tions. The extreme opposite case is a partitioning where each instruction is mapped individually to a processing resource. This approach provides more flexibility, but also generates more overhead for managing the installation and control of the application. It also increases the complexity of the map-ping problem, because a large number of nodes have to be mapped and the space of possible solutions grows significantly with the number of mapping choices. Ideally, we would like to find a balanced partitioning that allows efficient distribution of processing tasks, Network traffic characterization is another important aspect of runtime support for NP systems. Depending on the require-ments of current network traffic, different applications domi-nate the processing. The dynamically changing workload is the main reason runtime support is necessary. In order to achieve a good allocation of processing resources, it is necessary to know what processing is necessary for packets currently in the system (or to be processed in the near future). The result of traffic analysis is an application allocation, which describes the ratio of processing required by each application available on the system. Traffic characterization is an important input to determin-ing a suitable allocation of different

Design Choices: In order to determine the allocation of applications to the NP system, traffic characterization can be performed according to the approaches described above. While static traffic models are simplest, they do not serve situations where changes in workload occur. When considering dynamic traffic models, it is important to consider the trade-off between accuracy and delay. The more packets can be buffered and analyzed for determining workload requirements, the more accurately such processing needs can be determined. This, however, comes at the cost of increasing delay, as system adaptation is delayed until an accurate processing estimate is available.

Runtime Mapping and Adaptation
Workload mapping is the process that assigns processing tasks to actual processing engines. This assignment is based on the application allocation and application partitioning derived in the previous two steps. Mapping can be performed in a number of different ways and depends on the particular system architecture, application development environment, and operational principles of a system. The goals of mapping are to achieve high system throughput and efficient resource utilization. The adaptation step illustrates the need to reconfigure the NP system to match the processing requirements dictated by the traffic workload. During adaptation, the application allocation is changed according to the new traffic requirements. Then the mapping step modifies the allocation of tasks to processors to match the new allocation. The mapping of application tasks to processing elements is performed through a mapping algorithm. We explore the design trade-offs without requiring that a particular algorithm be used. However, we assume two properties of the mapping algorithm: 1. The mapping algorithm can yield incrementally better

results as the runtime increases. This implies that the runtime system designer could choose the runtime of the algorithm and the quality of the resulting mapping. 2. The mapping algorithm can be employed on a partially configured system. This means that some applications can be removed and others added without changing the mapping of applications that are not affected. The resulting design choices address the frequency and level of (partial) mapping.

Static Mapping: Static mapping goes hand in hand with static partitioning and a static traffic model. In this case processing tasks are allocated to processors offline, and no changes are made during runtime.

Complete Dynamic Mapping: Complete mapping refers to a mapping solution where the entire workload is mapped from scratch. The mapping algorithm can place processing tasks on the entire architecture without any initial constraint. This typically leads to a good solution that approximates the theoretical optimum for increasing processing times.

Partial Dynamic Mapping: Partial mapping assumes that some part of the workload is already mapped to the NP system. The mapping algorithm only needs to map a few applications to the remaining processing resources. This approach is more restrictive than complete dynamic mapping because many of the system resources are already assigned to applications. The incremental nature of this approach poses the risk that the mapping algorithm gets stuck in a local minimum. Nevertheless, the processing cost for the partial mapping is less than for mapping the entire workload.

Design Choices: Design choices for mapping determine how often, how much, and with how much effort to perform complete or partial mapping. In order to adapt to changing traffic conditions, the NP runtime system needs to change the application allocation and thus the mapping. Ideally, we want to reconfigure the application allocation with every packet to guarantee the best system utilization and high performance while traffic is varying. However, there is a cost associated with mapping and remapping. Apart from the cost of uploading new instructions to each processor, determining the new mapping requires

processing power and computation time. It is important to keep the reprogramming frequency at a low enough rate that sufficient processing time is available to find good mapping results. The lower the adaptation rate, the more time can be spent on finding a better mapping solution; as the adaptation rate increases, the quality of the derived mapping solution decreases. Changes in traffic conditions may only affect a few applications. In order to be able to adapt quickly with low mapping cost, a runtime system designer may choose to map only a small part of the overall allocation. The benefit of this is the ability to adapt quickly, but the amount of traffic variation that can be supported is limited to the fraction of the NP that is remapped. Repeated partial mapping causes mapping solutions to deteriorate. In order to avoid this, complete mapping steps should be performed periodically. The more frequently this happens, the less likely the system is to go into an inefficient state; however, this also increases the overall mapping effort.

Constraints
An NP has a number of system constraints that are not considered in the above discussion. These constraints can play a major role when making design decisions. Most NP systems are severely limited in the amount of instruction storage available for each processing engine. This is due to the relatively high chip area cost of memory compared to processing logic. This limitation is the reason not all applications can be installed on all processing engines at all times. Therefore, mapping changes are quite costly, as they require uploading of new instructions to every processor. At the same time, an NP system needs to be capable of processing any packet transmitted on the network.

Packetized Dynamic Batch Co-Scheduling - Packet Scheduling:
    i <- 1
    Balance_i <- B_i
    while ( true )
        PSize <- size of the head-of-line packet
        if ( Balance_i >= PSize/2 )
            Dispatch the head-of-line packet to P_i
            Balance_i <- Balance_i - PSize
        else
            B_i <- B_i + Balance_i
            if ( i == N )
                i <- 1
                Balance_i <- B_i
                wait for Gap_d seconds
            else
                i <- i + 1
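A runnable sketch of this scheduling loop is given below in Python. It follows the listing above; the processor queues, the per-batch budgets B_i, the dispatch callback and the gap duration are illustrative assumptions rather than parameters defined in this article, and the handling of an empty queue is added only for completeness.

import time

def batch_coschedule(queues, budgets, dispatch, gap_seconds=0.001):
    """Packetized dynamic batch co-scheduling loop (sketch).

    queues   -- list of N deques; queues[i] holds packet sizes queued for P_i
    budgets  -- list of N per-batch budgets B_i (illustrative values)
    dispatch -- callback dispatch(i, packet_size) handing a packet to P_i
    """
    n = len(queues)
    i = 0                        # processor currently being served (0-based)
    balance = budgets[0]         # remaining budget of the current processor
    while True:                  # runs as a scheduler daemon loop
        if not queues[i]:        # nothing queued for P_i: skip it (assumption)
            i = (i + 1) % n
            balance = budgets[i]
            continue
        psize = queues[i][0]     # size of the head-of-line packet
        if balance >= psize / 2:
            dispatch(i, queues[i].popleft())   # dispatch the packet to P_i
            balance -= psize                   # charge it against the balance
        else:
            budgets[i] += balance              # carry the leftover into B_i
            if i == n - 1:                     # last processor: start a new batch
                i = 0
                balance = budgets[0]
                time.sleep(gap_seconds)        # wait for Gap_d seconds
            else:
                i += 1
                balance = budgets[i]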

In the context of an NP runtime system, this means that processing resources should be available for all applications. If this is not the case, packets need to be delayed until the next adaptation cycle or processed in the slow path of the router. One way to avoid this delay is to overprovision the system and increase the application allocation to more than 100 percent. In this way it can be ensured that even applications not expected to see any traffic in the upcoming batch are installed just in case.

Quantitative Results
In this section we support the qualitative observations of the previous sections with quantitative results. This helps illustrate which trends have a large impact and which have a small impact on system performance. In order to derive quantitative results, we use a particular system baseline. Of course, there are big differences between different NP systems, and results on another system would look somewhat different. It is therefore more important to consider the trends that can be observed in our results (e.g., does an optimum exist?) than individual data points (e.g., where exactly is the optimum?).

Baseline System
The metric in which we are interested is the throughput of an NP system given a certain workload. In order to derive this information, we need to implement some of the functions necessary for network runtime systems. In particular, we need to consider realistic network processing applications, their partitioning, and the mapping of processing tasks to processing engines. In order to explore the design space we have described, it is not sufficient to consider only a handful of partitioning and mapping solutions. Therefore, we choose to use simulation. With the automated partitioning, mapping, and performance modeling environment provided in [9], we can evaluate a large number of possible application partitionings, mapping results, and so on. This provides a first-order understanding of the quantitative trade-offs. In the process, several ancillary metrics (e.g., the cost of deriving a mapping of a certain quality) can be obtained.

Application Representation: A network processing application needs to be represented in a way that it can easily be mapped to multiple parallel or pipelined processing elements. This requires a representation that exhibits application parallelism while also ensuring that data and control dependencies are considered. We use an annotated directed acyclic graph (ADAG) to represent the dynamic execution profile of applications. The ADAG is derived from dynamic profiling of the application by determining data and control dependencies between individual instructions. The regularity and simplicity of network processing applications allows for loop unrolling and an efficient ADAG representation. Using a clustering heuristic that minimizes the communication overhead, instructions are aggregated into larger nodes in the graph. Each node is annotated with information on the total number of instructions and memory accesses that need to be executed when processing the node.

Mapping Algorithm: Once we have the application represented as an ADAG, the next step is to map the ADAG onto an NP topology. The goal of the mapping is to assign processing tasks (i.e., ADAG nodes) to processing elements and generate a schedule that achieves the maximum system throughput. This assignment is not easy because the mapping process needs to consider the dependencies within an ADAG and ensure that correct processing of packets is possible. Furthermore, producing an optimal schedule for a system that includes both execution and communication cost is NP-complete, even if there are only two processing elements [10]. Therefore, we need a heuristic to find an approximate solution. Our heuristic solution to the mapping problem is based on randomized mapping. The key idea is to randomly choose a valid mapping and evaluate its performance. By repeating this process a large number of times and picking the best solution found over all iterations, it is possible to achieve a good approximation to the global optimum. With the randomized approach any possible solution is considered and chosen with a small but non-zero probability. This technique has been proposed and successfully used in different application domains [11]. The mapping is performed for multiple, possibly different, ADAGs. The mixture of ADAGs represents the allocation of applications to the NP architecture.
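The randomized-mapping idea can be illustrated with a compact Python sketch. The toy ADAG, the per-node cost fields and the simple load-based throughput estimate below are illustrative placeholders for the profiling and performance model described in the text, not the actual implementation.

import random

# A toy ADAG: each node carries instruction and memory-access counts and the
# list of nodes it depends on (data/control dependencies). Values are made up.
adag = {
    "parse":  {"instr": 120, "mem": 4, "deps": []},
    "lookup": {"instr": 300, "mem": 9, "deps": ["parse"]},
    "update": {"instr": 80,  "mem": 2, "deps": ["lookup"]},
    "queue":  {"instr": 60,  "mem": 1, "deps": ["update"]},
}

def random_valid_mapping(adag, num_stages):
    """Assign every ADAG node to a pipeline stage so that a node is never
    placed on an earlier stage than any of its predecessors."""
    stage = {}
    for node, info in adag.items():          # nodes are listed in dependency order
        earliest = max((stage[d] for d in info["deps"]), default=0)
        stage[node] = random.randint(earliest, num_stages - 1)
    return stage

def relative_throughput(adag, stage, num_stages, mem_latency=10):
    """Simplified model: the system is limited by its most loaded stage."""
    load = [0] * num_stages
    for node, s in stage.items():
        load[s] += adag[node]["instr"] + mem_latency * adag[node]["mem"]
    return 1.0 / max(load)

def randomized_map(adag, num_stages, iterations=1000):
    """Keep the best of many random valid mappings (best-of-N search)."""
    best, best_tp = None, 0.0
    for _ in range(iterations):
        candidate = random_valid_mapping(adag, num_stages)
        tp = relative_throughput(adag, candidate, num_stages)
        if tp > best_tp:
            best, best_tp = candidate, tp
    return best, best_tp

best_mapping, best_tp = randomized_map(adag, num_stages=4)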

Analytic Performance Model: In order to evaluate the throughput performance of a given solution, we use an analytic performance model that considers processing, inter-processor communication, memory contention, and pipeline synchronization effects. After mapping the application ADAGs to the network processor topology, we know exactly the workload of each processing element. This information includes the total number of instructions executed, the number of memory accesses, and the amount of communication between stages. The model needs to take this into account as well as contention for shared resources (memory channels and communication interconnects). We are particularly interested in the maximum latency of each pipelined processing stage, since that determines the overall system speed. The number of ADAGs that are mapped to an architecture determines how many packets are processed during any given stage time. After specifying the various system parameters, the throughput of the system for a given mapping can be expressed. The details of this analytic model and its validation against cycle-accurate simulation can be found in [9].

System Configuration and Workload: The system architecture considered in the above model can be configured to represent any regular NP architecture with a specified number of parallel processing engines (PEs) and pipeline stages. We have chosen one single baseline system with a fixed configuration to explore the runtime system issues. A separate question is how these results change for different architecture configurations; this design space exploration is currently not addressed in our work. The applications that are considered for this system are radix-tree-based IP lookup and hash-based flow classification.

Processing Task Mapping
Metrics: Mapping takes a certain amount of processing time, which can be seen as the cost of mapping. The performance achieved by the mapping is expressed as the throughput of the system. Due to the NP-completeness of the mapping problem, finding the overall optimal solution is infeasible. What is really desirable in a system is to obtain a good enough solution by running the approximation algorithm for a large amount of time.
Results: The quality of the mapping result increases as more processing effort is dedicated to the mapping

process. The cost is the time it takes to calculate a mapping (expressed as the number of randomized mapping iterations). We quantify this cost on the Intel IXP2400 and also consider the overhead for stopping, reprogramming, and restarting processing engines. The performance is the best throughput found within a given number of mapping attempts; this is expressed in relation to the maximum system performance, which is derived by using a very large number of mapping attempts. With increasing mapping cost, more randomized mappings can be attempted, and mappings with higher throughput can be found.

Partitioning
The goal of the application partitioning experiments is to study the impact of partition granularity on performance.
Metrics: We consider the number of tasks (or nodes in the ADAG), n, into which the application is partitioned. The maximum n is different for each application, but for this evaluation we only consider values of n that are much smaller than this maximum. If the partitioning is balanced, the size of each subtask is approximately 1/n of the application size.
Results: First, we explore the trade-off between system throughput and partitioning granularity. Finer granularity promises better performance if the permissible mapping effort is unbounded. When considering the realities of a network processor runtime system, the mapping effort is bounded by the batch size and adaptation frequency.

Traffic Characterization
The need for runtime adaptation is determined by the characteristics of network traffic.
Metrics: We assume that traffic is processed in batches (with batch size b), and the application allocation is based on a sample (of size l). We can then describe the traffic variation v based on two metrics, ei,j(a) and pi,j(a). Metric e reflects the estimated number of packets requiring application a, and metric p is the actual number of packets requiring this application in the packet interval [i, j):

v_i(l, b) = (1/b) * SUM over a of max( p_{i,i+b}(a) - (b/l) * e_{i-l,i}(a), 0 )        (1)

For example, if the traffic exactly matches the estimated allocation, all packets match up and the traffic variation is v = 0. If half the packets of a batch are different from what was expected (e.g., all packets require a single application instead of the expected mix), the traffic variation is v = 0.5.
Conclusion
We have presented an extensive qualitative discussion of design issues related to runtime system design for network processors. To illustrate the design considerations, we have provided quantitative results that highlight performance trade-offs between various design parameters. Finally, we have explored three different runtime system designs in the context of the Intel IXP2400, and discussed their benefits and drawbacks. We believe that this study provides an important basis for the design and implementation of future runtime systems for network processors. Understanding the presented trade-offs will guide runtime system designers in considering the relevant interactions between applications, network traffic, and the underlying hardware. This will bring us closer to realizing network processors as easy-to-use components of network systems.

References
[1] S. D. Goglin et al., "Advanced Software Framework, Tools, and Languages for the IXP Family," Intel Tech. J., vol. 7, no. 4, Nov. 2003, pp. 64-76.
[2] N. Shah, W. Plishker, and K. Keutzer, "NP-Click: A Programming Model for the Intel IXP1200," Proc. 2nd Network Processor Wksp. in conjunction with 9th IEEE Intl. Symp. High Perf. Comp. Architecture, Anaheim, CA, Feb. 2003, pp. 100-11.
[3] E. Kohler et al., "The Click Modular Router," ACM Trans. Comp. Sys., vol. 18, no. 3, Aug. 2000, pp. 263-97.
[4] R. Kokku et al., "A Case for Run-Time Adaptation in Packet Processing Systems," Proc. 2nd Wksp. Hot Topics in Networks, Cambridge, MA, Nov. 2003.
[5] S. Chakraborty et al., "Schedulability of Event-Driven Code Blocks in Real-Time Embedded Systems," DAC '02: Proc. 39th Conf. Design Automation, June 2002, pp. 616-21.
[6] V. J. Mooney, III and G. De Micheli, "Hardware/Software Co-Design of Run-Time Schedulers for Real-Time Systems," Design Automation for Embedded Sys., vol. 6, no. 1, Sept. 2000, pp. 89-144.

[7] R. Ramaswamy, N. Weng, and T. Wolf, "Application Analysis and Resource Mapping for Heterogeneous Network Processor Architectures," Proc. 3rd Wksp. Network Processors and Apps. in conjunction with 10th IEEE Intl. Symp. High Perf. Comp. Architecture, Madrid, Spain, Feb. 2004, pp. 103-19.
[8] W. Plishker et al., "Automated Task Allocation for Network Processors," Proc. Network System Design Conf., Oct. 2004, pp. 235-45.
[9] N. Weng and T. Wolf, "Analytic Modeling of Network Processors for Parallel Workload Mapping," to appear, ACM Trans. Embedded Comp. Sys.
[10] B. A. Malloy, E. L. Lloyd, and M. L. Soffa, "Scheduling DAGs for Asynchronous Multiprocessor Execution," IEEE Trans. Parallel and Distrib. Sys., vol. 5, no. 5, May 1994, pp. 498-508.
[11] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge Univ. Press, 1995.
[12] T. Wolf, N. Weng, and C.-H. Tai, "Design Considerations for Network Processor Operating Systems," Proc. ACM/IEEE Symp. Architectures for Networking and Commun. Sys., Princeton, NJ, Oct. 2005, pp. 71-80.
[13] A. Gavrilovska, S. Kumar, and K. Schwan, "The Execution of Event-Action Rules on Programmable Network Processors," Proc. 1st Wksp. Op. Sys. and Architectural Support for the On-Demand IT Infrastructure, in conjunction with ASPLOS-XI, Boston, MA, Oct. 2004.
[14] G. Welling, M. Ott, and S. Mathur, "A cluster-based active router architecture," IEEE Micro, vol. 21, no. 1, January/February 2001.
[15] Intel, "Intel IXP2800 network processor," http://www.intel.com/design/network/products/npfamily/ixp2800.htm.
[16] IBM, "The network processor: Enabling technology for high-performance networking," 1999.
[17] Motorola, "Motorola C-Port Corporation: C-5 digital communications processor," 1999, http://www.cportcom.com/solutions/docs/c5brief.pdf
[18] J. Guo, F. Chen, L. Bhuyan, and R. Kumar, "A cluster-based active router architecture supporting video/audio stream transcoding services," Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS'03), Nice, France, April 2003.


DATA HIDING, PARTIAL REQUEST AND DATA GROUPING FOR ACCESS CONTROL MATRIX IN CLOUD COMPUTING
J.Ilanchezhian1, K.Arun2, A.Ranjeeth3, Varadharassu.V4
Sri Manakula Vinayagar Engineering College, Puducherry

Abstract
Cloud computing is currently one of the most discussed topics in information technology. Like any other data, the data stored in the cloud is prone to threats, and addressing these threats is the security issue in the cloud. Clouds are normally categorized as public, private, community and hybrid clouds. Public clouds are mostly accessible globally, while security is enforced in private clouds. In this paper we examine the access matrix model, how it is used to secure data retrieval in cloud computing, the demerits of the current security model, and our proposal to minimize the time taken during authentication by eliminating unwanted requests.

Key words: Access matrix model, Private cloud, Subject, Object.

I Introduction
In cloud computing, server hosts such as Amazon and Rackspace provide data to end users, and various security mechanisms are applied to the data stored in the cloud by the cloud hosts. Here we discuss the access matrix model, which is used in distributed computing for security; this access matrix model can also be used in the cloud environment to provide security. Over the last ten years the Internet has been developing very quickly. The cost of storage and the power consumed by computers and hardware are increasing. The storage space in data centers cannot meet our needs, and the systems and services of the original Internet cannot solve these problems, so new solutions are needed. In recent years everybody has become aware of the advances made in the Internet at the front end: many new browsers have evolved and new versions keep being introduced; likewise, at the back end many changes have been made in database storage techniques. At the same time, large enterprises have to study their data sources fully to support their business [1]. The collection and analysis must be built on a new platform.

Why do we need cloud computing? It allows us to utilize the vacant resources of computers, increase economic efficiency by improving the utilization rate, and decrease the energy consumption of equipment.

II Cloud computing
Cloud computing is often described as "pay for what you use". When an application is developed by an IT enterprise or a user to provide a service, that application is usually stored on a server. Purchasing a new server would require a huge investment; instead there is an easier way, where the application is stored in the cloud, we pay for what we use, and the services for the application are automatically provided by the cloud server host. For instance, Facebook uses hosting providers such as Rackspace, 1&1 Internet and OVH [2] as data centres to store its data. Cloud computing is a new computing model in which large computations are run on various computing resources on the network. Its goal is to supply computing resources to users the way water and electricity are supplied, making it easier for users to use cloud services [3]. Here, what is meant by saying that


computing resources are like water or electricity is this: when a new house is built, instead of spending on digging a new borewell, purchasing a motor, a tank and electrical fittings, and paying for electricity and maintenance, a water connection is requested from the government, so that we pay only for the water and it is the duty of the government to look after the maintenance. In this case the owner has zero ownership of the infrastructure. In cloud computing, all the resources on the Internet form a cloud resource pool, and these resources are dynamically allocated to different applications and services. Virtualization technology allows multiple operating systems and applications to run on a shared computer, and when a server is heavily loaded, an instance of an operating system and its applications can be migrated from the heavily loaded server to a lightly loaded one in the cloud resource pool. There are four different types of clouds: public cloud, private cloud, community cloud and hybrid cloud. The type of cloud a company needs is decided by its requirements. When a company wants to share the data it stores in its database with the public, it uses a public cloud; Google and Facebook use public clouds. In contrast, if a company feels that the data it stores in the cloud is highly confidential, it can choose a private cloud. In a few cases the cloud resources are shared by more than one company working towards a common goal, and then a community cloud is used. A hybrid cloud is a combination of at least one public cloud and at least one private cloud; when a company regards some of its data as confidential and wants to store the other data in a public cloud for business, a hybrid cloud is the best option. There are three different modes of operation in cloud computing: SaaS (software as a service), PaaS (platform as a service) and IaaS (infrastructure as a service). In SaaS, as the name indicates, the software is provided as a service from the cloud instead of being stored and installed on every local computer. The main advantage of SaaS is that the cost is shared, because the same software is used by several companies; a company not using the cloud would have to purchase the full version of every piece of software needed, whereas with SaaS the cost is shared. If a new version is released, it is the job of the cloud service provider to upgrade to it. The next mode of operation is PaaS, where the platform is provided as a service. PaaS is similar to virtualization, and remote desktop is an example, where a platform is offered to utilize the resources. IaaS was previously known as hardware as a service; here the storage medium and hardware are provided as a service. Several companies keep their databases in the cloud instead of having a local server. The main advantage of using cloud computing in a company is that the company can concentrate on its core business instead of worrying about data security and availability, because the cloud can be reached simply by connecting to the Internet and the cloud providers are responsible for the security. TCS ion is a cloud service provider, and it has two cloud servers, one in Mumbai and the other in Hyderabad. The cloud servers have seven levels of security, and even a TCS employee is not allowed to visit the server [5].
Employees would lose their jobs if they were found carrying any secondary storage devices, so there is little chance of data leakage; and in a disaster, or when one server goes down, the other server can be used.

B. Importance of security in cloud computing: When a business organization is approached to move to cloud computing, its first concern is security, because the organization may feel that its data is highly confidential and should not be available to the public. It is therefore mandatory for every cloud provider to convince its customers that their data is secure. When a business organization feels that its data in the cloud may leak to the outside world, and it is not convinced of the security of the cloud, it may hesitate to adopt the cloud.

C. Security in cloud computing: Mostly ABE (attribute based encryption) is used for security in the cloud. In attribute based encryption, only cipher data are saved in the cloud, and this cipher data needs a suitable key to decrypt and access the original data. In this context, a few users have authentication rights over the cipher text stored in the cloud, and keys are given to these users as attributes. The attributes are provided to the end user by the owner of the data.

III ACCESS MATRIX MODEL: The access matrix model is the most commonly used model for security in distributed computing and cloud computing, and was


formulated by Harrison, Ruzzo, and Ullman; this access matrix model is accordingly called the HRU model. Here the subjects are represented along the rows and the objects along the columns [4]. When P[S,O] = own, the corresponding subject trying to access the object has authority over the file, i.e., it is the owner of the file; the subject then has the right to read, write and give new access rights to other subjects.

A. ACCESS CONTROL MATRIX SCENARIO: Usually security is provided in the cloud using the access control matrix model. In the access matrix model, when a subject wants to access an object (for example, when a process wants to access a Word document), the access rights of the subject over the corresponding object are already stored. So every time a subject asks to open an object in some mode such as read or write, access is not granted directly; instead the table is checked first. If [S,O] stores the access right that is requested, then permission is given, otherwise the request is denied. However, this access control method has several disadvantages. The three main disadvantages of the access control matrix method are reviewed below, and a solution is proposed for each one.

1. Negative request (a request which is denied in response): The term negative request refers to the denial of access which is sent as the reply when a request is made. Some users may not have access to certain files, but a user may be unaware of this or may simply try to access the file. In this case a request is made and the cloud server has to check the access rights for the requested file; after checking, the server denies access to the file for that user. In the meantime the user is left waiting for a response, which is wasteful and can be avoided, as can the time spent by the server to check the access rights and the processing done by the server. Data hiding is given as a solution to this problem.

2. A user might have the right to access a file only in a particular mode such as read or write. In this case, when a user requests access to a file in a non-accessible mode, for which he does not have an access right, the service will be denied. The computation made and the time spent in this case can be saved.

[S,O] = [rights of S on O], i.e., the rights of the subject S on the object O. [S,O] represents a single cell: when the subject S tries to access the object O, the right for S -> O is stored in that particular cell. An access control matrix has subjects, objects and commands. A command takes an object as input and checks the corresponding subject-object cell in the access control matrix to determine the access right. The access rights can be broadly classified as read, write and own.

READ {[S,O]=read}: When P[S,O] = read, it means that when the subject S wants to access the object O, the access right given is read and the subject can use the object only in read mode.

WRITE {[S,O]=write}: When P[S,O] = write, it conveys that the subject S has the access right write on the requested object O.

OWN {[S,O]=own}: When P[S,O] = own, the subject S is the owner of the object O; as described above, the owner may read and write the object and grant new access rights on it to other subjects.
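A minimal sketch of such an access control matrix and its check, written in Python, is given below; the subjects, objects and helper names are illustrative, while the read/write/own semantics follow the description above.

# Access control matrix P: P[subject][object] holds the right of the subject
# on that object; a missing entry means no access.
P = {
    "user1": {"file1": "read", "file2": "write"},
    "user2": {"file1": "own"},
}

def check_access(subject, obj, requested_mode):
    """Return True only if the stored right permits the requested mode."""
    right = P.get(subject, {}).get(obj)
    if right == "own":
        return True                   # the owner may read, write and grant rights
    return right == requested_mode    # otherwise the stored right must match

def grant(owner, subject, obj, new_right):
    """Only the owner of an object may add rights on it for other subjects."""
    if P.get(owner, {}).get(obj) == "own":
        P.setdefault(subject, {})[obj] = new_right
        return True
    return False

print(check_access("user1", "file1", "read"))    # True
print(check_access("user1", "file1", "write"))   # False: only read is stored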


3. In the third case, the request may get either a positive or a negative response. However, if a user x wants to access all the files placed in the cloud by a particular owner y, and the owner y has placed thousands of files on the server, then every time x wants to access one of y's files his access rights will be checked. This results in increased computation and time.

IV Our contribution
To overcome the three demerits of the access control matrix mentioned above, to reduce the large amount of time taken and to avoid unwanted computation, we propose the following three ideas.

A. DATA HIDING: Here a new methodology called data hiding is introduced. The data hiding used in object-oriented programming differs from the data hiding used here for the access control matrix. Consider a context where a subject does not have access to an object, but the subject still sends a request to access the object. Obviously the access right will be checked and a denial will be sent as the reply. In this case the time consumed in sending the request, computing and replying is unnecessary. Using data hiding we can hide the file from the subjects who do not have access rights, so that efficiency is improved. The end user need not be confused about which files are accessible and need not wait for files for which access will be denied, and the cloud server is saved from unnecessary processing of requests and replies. In real time, millions of requests are made every second and a few million requests are denied in response; ideally, only requests that would get a positive response should be made, and the other requests should never be sent. This method of hiding the files to which users do not have access is what we refer to as data hiding here.

Subject \ Object   File 1   File 2   File 3   File 4
User 1             Read     Write    Own      -
User 2             Write    Own      -        -
User 3             Own      -        -        Read
User 4             Read     Read     Read     Own

Consider the above table as an example. There are in total four files and four users who can request access, so 16 requests can be made. However, access will be granted for only 11 of these requests and denied for the remaining 5. What is meant by data hiding here is to hide these files from the users who do not have access to them, so that the waiting time of the user and the computing time of the server can be saved. When user1 wants to access file4, user1 sends a request to the cloud server and the access control matrix is checked; after checking, the cloud server denies the request, and in the meantime the user has been waiting for the reply while processing is done on the server. By hiding file4 from user1, this unnecessary processing is avoided.

Percentage of requests made by users who actually have access rights (positive responses) = (100 * no. of positive requests) / total no. of requests
Percentage of requests made by users who do not have access rights (negative responses) = (100 * no. of negative requests) / total no. of requests

No. of positive requests that can be made = 11
No. of requests without access rights = 5
The percentage of positive requests = 68.75%
The percentage of negative requests = 31.25%

So if the requests which are surely going to fetch a negative reply are hidden, efficiency can be improved and the waiting time of the user can be avoided.
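The data hiding idea can be sketched as a filter that lists, for each user, only the files on which some right exists, so that requests which would certainly be denied are never issued. The Python fragment below uses the example table above and reproduces the request percentages; the identifiers are illustrative.

# The example access matrix from the table above (None = no right).
matrix = {
    "user1": {"file1": "read",  "file2": "write", "file3": "own",  "file4": None},
    "user2": {"file1": "write", "file2": "own",   "file3": None,   "file4": None},
    "user3": {"file1": "own",   "file2": None,    "file3": None,   "file4": "read"},
    "user4": {"file1": "read",  "file2": "read",  "file3": "read", "file4": "own"},
}

def visible_files(user):
    """Data hiding: show only the files the user holds some right on."""
    return [f for f, right in matrix[user].items() if right is not None]

total = sum(len(files) for files in matrix.values())      # 16 possible requests
positive = sum(len(visible_files(u)) for u in matrix)     # 11 would succeed
negative = total - positive                               # 5 would be denied
print(positive / total * 100, negative / total * 100)     # 68.75 31.25
print(visible_files("user1"))                             # file4 is hidden from user1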



B. PARTIAL REQUEST: A technique called partial request can be used to improve the performance of the access control matrix. In partial request, a user is allowed to make requests on a file only in the modes in which the user is allowed to access it. Consider a context where a user wants to open a file in write mode and requests to access the file in write mode. The server denies the request, and the user then sends another request, this time in read mode, for which a positive response is obtained. With partial request, the user is directly allowed to send a request only in read mode, so the waiting time of the user, the unnecessary delay and the extra computation are avoided. In fig.2, user1 can open file1 only in read mode, and if user1 requests the file in any other mode, access will be denied. So in this case, if user1 requested file1 in write mode, a request would be sent to the server and processed, and access would be denied as a result. To prevent this, in partial request user1 does not have the option to open file1 in write mode; the user can request access to files only in the modes in which the user is allowed to access them.

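Partial request can be sketched in the same style: the client side exposes only the modes that are actually permitted, so a request in any other mode is never sent to the server. In the Python fragment below, the matrix excerpt and the expansion of the own right into read and write modes are assumptions made for illustration.

# Excerpt of the access matrix above; None marks a hidden (inaccessible) file.
matrix = {"user1": {"file1": "read", "file4": None}}

def permitted_modes(user, filename):
    """Expose only the modes in which this user may open the file."""
    right = matrix[user].get(filename)
    if right is None:
        return []                    # the file stays hidden (data hiding)
    if right == "own":
        return ["read", "write"]     # the owner may open in either mode (assumed)
    return [right]                   # e.g. user1 is offered only "read" on file1

def request(user, filename, mode):
    """A request is only issued when the mode is actually offered."""
    if mode not in permitted_modes(user, filename):
        return "not offered: no round trip to the server"
    return "request sent in {} mode".format(mode)

print(request("user1", "file1", "write"))   # rejected locally, never sent
print(request("user1", "file1", "read"))    # sent and granted by the matrix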
C. GROUPING DATA: It is a known fact that the access matrix model is used in cloud computing for security, and as a result, each and every time a file is accessed by a user, a request is sent and a check is made in the access matrix to see whether the user has access rights for the requested file. It is therefore desirable to minimize the checks made for access rights, and grouping the data can provide a solution to this problem. For example, in a Facebook environment, a user may have 250 friends and a privacy setting such that when a photo is posted only his friends can


view it. In this scenario, if the user posts some 100 photos and all the photos are viewed by all 250 friends, then in total 25,000 requests will be made, and each time the access rights will be checked before permission is given. In grouping data, all the data for which a user has a positive access right is grouped into a single group, and the user sends one request to access the group rather than a request for every single file; as a result the time and space consumed are minimized and efficiency is improved. In the above Facebook example, when all 100 images are arranged into a single group, the friends of the user trying to access the photos send requests and the access rights are checked only once for accessing the group. So by grouping data there is no need to send 100 requests and check the access rights each time.

V. RELATED WORKS:
Attribute based encryption and decryption in the cloud environment: Security is provided for the data stored in the cloud using the access matrix model [6]. The disadvantage of this model is that the access right has to be checked every time, which results in extreme storage and computation cost. So another method, encryption, is followed. Data to be stored in the cloud is encrypted with the attributes that belong to the data, and only this encrypted data is stored in the cloud; hence the cloud is called a secure storage medium. The end user with the correct key can decrypt and access the data. The authentication process is done using public key cryptography. End users who have the access right are allocated an attribute to decrypt the data stored in the cloud. A particular record stored in the cloud may need to be accessed by different users: a medical record of a patient stored in the cloud may need to be accessed by a doctor to view the patient's health history and by a nurse to give medicines. In this case, each of them is allocated an attribute and a key to decrypt. The disadvantage of this method is that each user is given attribute-based keys to decrypt, which results in huge storage cost.

To overcome this, key distribution centres were introduced in DACC (distributed access control in clouds). Only cipher text is saved in the cloud, and the key distribution centre is responsible for providing the secret key to users. The key distribution centre is not provided in a single location alone, but in multiple locations, to avoid a single point of failure. Following attribute based encryption and key provisioning using KDCs (key distribution centres), a new technique, IBE (identity based encryption), was proposed. In identity based encryption each user is allocated a unique id, and the unique information about the user is used as the public key. Identity based encryption is similar to attribute based encryption. Here the key distribution centre gives the key, so the whole process depends on the KDC; but it is only an assumption that the key distribution centre is honest, and this needs to be rectified. Attribute based encryption uses a cryptographic protocol. The problem here is that when multiple users need to access a single record, each user must be given an attribute for authentication through access control using a secret key. Even though many users may want to access a record, they can be broadly categorized as private and public users. The private users are people such as the owner of the record and the friends and family relations of the owner; the public users may be the police, a nurse or a doctor. In the case of a medical record, the public users are the doctor, the nurse and the health administrator, and the private users are the friends and relations of the person the record is about. There can be several possible public domains, and more than one public domain may need to access a record; the problem in this model is that when data needs to be decrypted, the domain to which the data belongs must be identified first, and the KDC then provides the key. There are separate KDCs for each domain, and it is not clear who the KDCs are. Wang et al. then came up with a hierarchical model in which the person at the root node is mainly responsible for providing attributes to users and to others at lower nodes. However, this results in a single point of failure.

Future work:


Many organizations store their highly confidential data in the cloud and expect the data to be secured, and the cloud does provide high security for the data, but the time taken for authentication is large. The access control matrix method is used for security in cloud computing; when a single file needs to be authenticated, the time taken is tolerable, but when many small images, doc files and similar items need to be accessed from the cloud, the authentication is performed every time, which results in wasted time and computation. Our proposal minimizes these unwanted computations. The problem could also be addressed by using a method other than the access control method; we leave this as future work.

VI. Conclusion:
For security in cloud computing the methodology used is the access control matrix. The access control matrix provides good and simple security, but the problem with this model is the time taken to check the access rights for every file. So we proposed three new techniques called data hiding, partial request and data grouping. With these three techniques, the time taken for unwanted requests and the computation performed can be minimized by eliminating the unwanted requests.

References
[1] Shuai Zhang, Shufen Zhang, Xuebin Chen, Xiuzhen Huo, "Cloud Computing Research and Development Trend," Second International Conference on Future Networks, 2010.
[2] http://www.techzost.com/2011_12_02_archive.html
[3] Xue Jing, Zhang Jian-Jun, "A Brief Survey on the Security Model of Cloud Computing," Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science, 2010.
[4] Mukesh Singhal, Niranjan G. Shivaratri, Advanced Concepts in Operating Systems: Distributed, Database, and Multiprocessor Operating Systems.
[5] http://www.tcs.com/offerings/ion-small-medium-businessSMB/Pages/default.aspx
[6] Sushmita Ruj, Amiya Nayak, Ivan Stojmenovic, "DACC: Distributed Access Control in Clouds," 2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11.



Analytical Investigation of Thermo Actuator and Simulation of MEMS Resonator


K.Malarvizhi1, T.Sripriya2, J.Jayalakshmi3
1,2,3 Saveetha Engineering College, Thandalam, Chennai

Abstract
A micro device can be used as either an actuator or a sensor. An electrostatic thermal actuator, for instance, can function as an actuator when it is supplied with electric energy and sets a micro device into motion, but, equally, it can be used as a sensor in a micro device that is actuated by a different source, in order to measure displacements by quantifying electric capacitance changes. The relationship between mechanical displacement and capacitance variation is unique; in other words, a unique equation governs a specific transduction form, which can conveniently be used to describe either actuation or sensing, through calculation of the corresponding output quantity in terms of the input quantities. In this paper a thermo actuator is simulated, and for the same thermo actuator the natural frequencies are found.

Keywords: MEMS, sensor, thermo actuator

1. INTRODUCTION
MEMS, the acronym for Micro Electro Mechanical Systems, are generally considered to be micro systems consisting of micro mechanical sensors, actuators and microelectronic circuits. As microelectronics is a well-developed technology, research and development effort is concentrated on micro mechanical sensors and actuators, or micro mechanical transducers. In other words, micro mechanical sensors and actuators are the basic devices of MEMS (note that the word transducer is often used as a synonym of sensor; however, it is sometimes read as sensors and actuators) [9]. With the desire for improved performance and the increasing complexity of MEMS devices, there is a corresponding need for automated design and optimization methodologies that allow a designer to explore more of the potential design space. Automated design systems based on evolutionary algorithms (EAs) have already been used for antennas, flywheels, load cells, trusses, robots, and more. A MEMS chip is shown in figure 1 [10].

2. MEMS RESONATOR
A history of unrealized promise: the technology with the most potential to replace quartz crystals has long been the micromechanical resonator. Micro Electro Mechanical Systems (MEMS) resonators are important signal processing elements in communication systems. Over the past decade, there has been substantial progress in developing new types of miniaturized MEMS resonators using micro fabrication processes. It is essential that MEMS resonators have a high quality factor (Q-factor) in manifold applications [2]. The finite element method (FEM) can be used to determine the resonance frequency; this technique, however, remains limited in estimating the global Q-factor. The reason lies in the difficulty of modeling the relevant loss mechanisms. Indeed, for high frequency resonators, both acoustic radiation into the substrate (anchor loss) and thermoelastic dissipation play key roles in determining the energy loss. [3]

Fig 1. The Schematic View of a MEMS Chip

Microelectromechanical systems (MEMS) are devices that generally range in size from 20 micrometers to 1 millimeter. These micro-sized devices are of great interest in the aerospace community because of their small size and high reliability.



Thermal actuation has been extensively employed in MEMS. It includes a broad spectrum of principles such as thermal pneumatic, shape memory alloy (SMA) effect, bimetal effect, mechanical thermal expansion, etc. The thermal pneumatic micro actuator uses thermal expansion of a gas or liquid or the phase change between liquid and gas to create the actuation.

Fig.2 MEMS Resonator Deposited on a Half-Space Substrate

One such design relates to a MEMS resonator having at least one mode shape, comprising a substrate having a surface and a resonator structure, where the resonator structure is part of the substrate and is defined by a first closed trench and a second closed trench, the first trench being located inside the second trench so as to form a tube structure inside the substrate, with the resonator structure being released from the substrate only in directions parallel to the surface. The MEMS resonator structure is shown in figure 2. A method of manufacturing such a MEMS resonator is also described. [4]

3. MEMS Thermal Actuator and Resonator
A MEMS thermal actuator is a micromechanical device that typically generates motion by thermal expansion amplification. A small amount of thermal expansion of one part of the device translates to a large amount of deflection of the overall device [4-5]. Usually fabricated out of doped single crystal silicon or polysilicon as a complex compliant member, the increase in temperature can be achieved internally by electrical resistive heating or externally by a heat source capable of locally introducing heat.

3.1 Types of Thermal Actuators
Asymmetric (bimorph)
Symmetric (bent beam, chevron)
Electrostatic - parallel plate or comb drive
Magnetic

3.2 Thermal Actuators
Fig.3. Thermal Bimetallic Micro Actuator with the Cantilever Prototype

Figure 3 shows a cantilever bimetallic structure. When it is heated, a deflection is generated by the different thermal expansion of the two materials: the more different the two materials' thermal expansion coefficients are, the more deflection is generated. The working principle of the mechanical thermal expansion micro actuator is similar to that of the bimetallic micro actuator; the only difference is that the mechanical thermal expansion micro actuator is made of a single material. Thermal actuators can generate relatively large force and displacement at low actuating voltage, and the deflection can increase linearly as the control voltage is increased within a large range. Mechanical thermal expansion actuators and bimetallic actuators can also be integrated on a chip easily. However, the high power consumption and low switching frequency are concerns for applications of thermal actuators.

3.3 MEMS Thermal Actuator
The thermal actuator is fabricated from polysilicon and is shown in figure 5.

Fig.5. Thermal Actuator



The thermal actuator works on the basis of differential thermal expansion between the thin arm and the blade. The required analysis is a coupled-field multiphysics analysis that accounts for the interaction (coupling) between the thermal, electric and structural fields. A potential difference applied across the electrical connection pads induces a current to flow through the arm and blade. The current flow and the resistivity of the polysilicon produce Joule heating (I^2R) in the arm and blade, which causes them to heat up; temperatures in the range of 700-1300 K are generated. These temperatures produce thermal strain and thermally induced deflection. The resistance of the thin arm is greater than the resistance of the blade, so the thin arm heats up more than the blade, which causes the actuator to bend towards the blade. The maximum deformation occurs at the actuator tip. The amount of tip deflection (or force applied if the tip is restrained) is a direct function of the applied potential difference; therefore, the tip deflection (or applied force) can be accurately calibrated as a function of the applied voltage. These thermal actuators are used to move micro devices such as ratchets and gear trains, and arrays of thermal actuators can be connected together at their blade tips to multiply the effective force. The main objective of the analysis is to compute the blade tip deflection for an applied potential difference across the electrical connection pads. Additional objectives are to obtain temperature, voltage and displacement plots, animate the displacement results, and determine the total current and heat flow.

4. Finite Element Method
Finite element methods are now widely used to solve structural, fluid, and multiphysics problems numerically [1]. The methods are used extensively because engineers and scientists can mathematically model and numerically solve very complex problems. The analyses in engineering are performed to assess designs, and the analyses in the various scientific fields are carried out largely to obtain insight into and ideally to predict natural phenomena. The prediction of how a design will perform and whether and how a natural phenomenon will occur is of much value: designs can be made safer and more cost effective, while insight into and the prediction of nature

can help, for example, to prevent disasters. Thus, the use of the finite element method greatly enriches our lives. The finite element method (FEM), whose practical application is often known as finite element analysis (FEA), is a numerical technique for finding approximate solutions of partial differential equations (PDEs) as well as integral equations. The solution approach is based either on eliminating the differential equation completely (steady state problems), or on rendering the PDE into an approximating system of ordinary differential equations, which are then numerically integrated using standard techniques such as Euler's method, Runge-Kutta, etc.

4.1 Elements Used
Tetrahedral Coupled-Field Solid element: this is a 10-node tetrahedral version of the 8-node element. The element has quadratic displacement behavior and is well suited to modeling irregular meshes (such as those produced by various CAD/CAM systems). When used in structural and piezoelectric analyses, SOLID98 has large deflection and stress stiffening capabilities.

Fig.4. Tetrahedral Coupled-Field Solid Element

The element is defined by ten nodes with up to six degrees of freedom at each node. For the modeling and analysis the material used is polysilicon, with Young's modulus E = 169e3 MPa, Poisson's ratio ν = 0.22, coefficient of thermal expansion α = 2.9e-6 /K, thermal conductivity K = 150e6 pW/(µm·K), resistivity R = 2.3e-11 TΩ·µm, and a density of 2330 kg/m3 [5].

5. Modeling of Thermo Actuator



The actuator is modeled using CATIA software; the dimensions are shown in figures 6 and 7. The actuator modeled in CATIA is imported into ANSYS, meshed with the element described above, and static as well as dynamic analyses are carried out; the results are shown in figures 8-10. The voltage distribution and the temperature distribution of the actuator are shown in figure 8 and figure 9 respectively [7-9].

Fig.7. Bottom View of the Actuator
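The qualitative behaviour seen in these plots, namely that the thin arm heats more than the blade, can be illustrated with a rough resistance estimate in Python: for the same current path, the narrower arm has the larger resistance R = rho*L/A and therefore the larger Joule (I^2*R) heating. The dimensions used below are illustrative placeholders and are not the actuator dimensions of figures 5-7.

# Rough comparison of Joule heating in the thin arm and the blade.
# Both carry the same current I, so the heating of each element is I^2 * R,
# and R = rho * L / A grows as the cross-section A shrinks.
# Dimensions are illustrative placeholders (micrometres), not from the figures.
arm_width, blade_width, thickness, length = 2.0, 15.0, 2.0, 200.0

def relative_resistance(width):
    # the common factor rho * length / thickness cancels in the ratio below
    return length / (width * thickness)

ratio = relative_resistance(arm_width) / relative_resistance(blade_width)
# With the same current, the thin arm dissipates 'ratio' times more power
# (here 7.5x), expands more, and bends the actuator tip towards the blade.
print(ratio)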

Fig.5. Orthographic Projection of an Actuator (all dimensions are in microns)

Based on the above dimensions the actuator was modeled in CATIA software, as shown in fig. (8) and fig. (9).

Fig.8.The Voltage distribution of an actuator Figure 10 shows the displacement plot of an actuator.

Fig.6.Front View of the Actuator

Fig.10.The displacement plot of an actuator



Table 1. The Natural Frequencies of the Resonator

Set    Frequency (kHz)
1      35.542
2      77.006
3      313.30
4      421.07
5      462.73

Table 1 shows the natural frequencies of the resonator obtained during the analysis. The first natural frequency and its mode shape, a bending mode along the y axis, is observed and shown in figure 11.

Fig.11. The First Mode of the Actuator

The second natural frequency and its corresponding mode shape, a bending mode along the z direction, is observed and shown in figure 12.

Fig.12. The Second Mode of the Actuator

6. Analytical Validation for the Static Analysis of the Thermo Actuator
Two beam thermal actuator: A two beam thermal actuator consists of two parallel beams of dissimilar lengths and a connecting rigid link. When the longer beam is heated, its free thermal expansion is prevented by the connecting rigid link as well as by its own fixed end. The resulting deformed shape is produced by bending of both the thermally active (long) beam and the passive (short) one. The geometric parameters defining this actuator system are shown in the corresponding sketch. The effect of the prevented thermal expansion can be modeled by a force F2t, which is applied parallel to the heated beam at its end and is given by the corresponding equation. As mentioned previously, the free displacement and the bloc force fully define the performance of a thermally driven fixed-free bar. As a consequence, the two beam thermal actuator can be qualified by the free displacement at point 2 along the y-direction, u0 = u2y, and the bloc force Fb = F2y that needs to be applied at the same point in order to completely block the actuator. Another qualifier that can be used to supplement the characterization of this actuator is the free rotation produced through heating at the same point 2. In order to achieve either of these tasks, it is necessary to find the unknown reactions at the fixed end, F1x, F1y and M1z. As previously done, they are found by solving the corresponding equations showing that the two translations and the rotation are



zero at that point. The respective displacements can be formulated by considering the strain energy stored in the two beams through bending and axial effects, and by applying Castigliano's displacement theorem. After determining the three reactions as functions of F2t and the system's geometry, the free displacement at point 2 can be found, and its equation is:

u0 = u2y = { A l1^2 l2 l4 [ l1 l4^3 + 2(l3 + l4) l4^3 + l1^3 (2 l3 + l4) ] α ΔT } / { 2 [ A l1 l2^2 l4 (l1^3 + l4^3) + Iz (l1 + l4)(l1^4 + 4 l1^3 l4 − 6 l1^2 l4^2 + 4 l1 l4^3 + l4^4) ] }

The block force can be found by expressing u2y first as a function of F2y and then setting u2y = 0. This gives the block force. It is interesting to study how the length parameters l1, l2 and l4 influence the performance of the two-beam thermal actuator, for instance the free displacement given by the equation above, as discussed in the following example.

Problem: analyze the free displacement of a two-beam actuator by expressing l2 and l4 as fractions of the length l1. The following geometric and material values are known: l1 = 200 µm, w = t = 1 µm, ΔT = 40 °C, α = 1.1e-6 /°C, E = 130 GPa.

Solution: considering that l3 = l1 − l4 (see the figure) and that the short lengths l2 and l4 are fractions of the long beam's length l1, namely l2 = c2 l1 and l4 = c4 l1, the free displacement is plotted in terms of c2 and c4, as shown in the figure. It can be seen that the free displacement is larger when both the short beam and the short connecting link are small relative to the active beam length l1, and that u0 depends nonlinearly on the coefficient c2 and quasi-linearly on the coefficient c4 of the equation.

The thermal micro actuators that have been studied here can also function as sensors, in the sense that they can be placed in an environment where thermal changes are expected. The amount of mechanical deformation produced through thermal variation, which can be evaluated experimentally, will furnish the corresponding amount of temperature change by reversal of the cause-effect relationship utilized in the actuation-type equations presented thus far.

l1 = 200e-6 m;  b = 1e-6 m;  t = 1e-6 m
l2 = c·l1;  l3 = l1 − l4;  l4 = d·l1
c = 0.1 to 0.5;  d = 0.1 to 1
A = 1e-12 m2;  α = 1.1e-6 /°C;  ΔT = 40 °C
I = b·t3/12 = 8.33e-26 m4
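A short sketch of the parameter sweep described in the problem above. It assumes the free-displacement expression as reconstructed earlier (with Iz = b·t³/12 as the bending moment of inertia), so the printed values are illustrative only, not the paper's results.

```python
# Sketch of the free-displacement sweep for the two-beam thermal actuator.
# Uses the u0 = u2y expression as reconstructed above; treat the exact
# coefficients (and therefore the printed numbers) as illustrative.
l1 = 200e-6          # long (active) beam length, m
b = t = 1e-6         # cross-section, m
A = b * t            # area, m^2
Iz = b * t**3 / 12   # bending moment of inertia, m^4
alpha = 1.1e-6       # thermal expansion coefficient, 1/degC
dT = 40.0            # temperature rise, degC

def u0(c, d):
    """Free displacement u2y for l2 = c*l1, l4 = d*l1, l3 = l1 - l4."""
    l2, l4 = c * l1, d * l1
    l3 = l1 - l4
    num = A * l1**2 * l2 * l4 * (l1 * l4**3 + 2 * (l3 + l4) * l4**3
                                 + l1**3 * (2 * l3 + l4)) * alpha * dT
    den = 2 * (A * l1 * l2**2 * l4 * (l1**3 + l4**3)
               + Iz * (l1 + l4) * (l1**4 + 4 * l1**3 * l4 - 6 * l1**2 * l4**2
                                   + 4 * l1 * l4**3 + l4**4))
    return num / den

for c in (0.1, 0.3, 0.5):
    for d in (0.1, 0.5, 1.0):
        print(f"c2={c:.1f} c4={d:.1f}  u0={u0(c, d)*1e6:.3f} um")
```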

Result for displacement

Fig.13. Validation of the two-beam thermal actuator

Result obtained by FEM:
l1 = 210e-6 m;  b = 19e-6 m;  t = 2e-6 m
l2 = 19e-6 m;  l3 = 160e-6 m;  l4 = 50e-6 m
A = 38e-12 m2;  α = 2.9e-6 /°C;  ΔT = 905 °C
I = b·t3/12 = 12.66e-24 m4

The displacement obtained analytically is u = 5.33e-6 m, while the displacement obtained through FEM is 3.076e-6 m. Thus the result obtained through FEM is validated.

CONCLUSION

The static analysis of the thermo actuator was performed and the displacement at the tip of the actuator was found. The displacement of the thermo actuator was also obtained by the



analytical method. It is found that the two results agree well. A dynamic analysis was also performed for the thermo actuator to find its natural frequencies and mode shapes.

References
[1] Didace Ekeom, "Thermal-Electromechanical FEM-BEM Model for MEMS Resonator Simulation", Journal of Microelectromechanical Systems, Vol. 20, No. 1, February 2011.
[2] Jaroslav Mackerle, "Smart Materials and Structures: FEM and BEM Simulations - A Bibliography (1997-1999)", Finite Elements in Analysis and Design 37 (2001) 71-83.
[3] R.M.C. Mestrom, R.H.B. Fey, J.T.M. van Beek, K.L. Phan and H. Nijmeijer, "Modelling the Dynamics of a MEMS Resonator: Simulations and Experiments", Sensors and Actuators A 142 (2008) 306-315.
[4] R.M.C. Mestrom, R.H.B. Fey, K.L. Phan and H. Nijmeijer, "Simulations and Experiments of Hardening and Softening Resonances in a Clamped-Clamped Beam MEMS Resonator", Sensors and Actuators A 162 (2010) 225-234.
[5] Kazuaki Tanaka, Ryuji Kihara, Ana Sanchez-Amores, Josep Montserrat and Jaume Esteve, "Parasitic Effect on Silicon MEMS Resonator Model Parameters", Microelectronic Engineering 84 (2007) 1363-1368.
[6] Zachary J. Davis, Winnie Svendsen and Anja Boisen, "Design, Fabrication and Testing of a Novel MEMS Resonator for Mass Sensing Applications", Microelectronic Engineering 84 (2007) 1601-1605.
[7] Gianluca Piazza, Philip J. Stephanou and Albert P. Pisano, "One and Two Port Piezoelectric Higher Order Contour-Mode MEMS Resonators for Mechanical Signal Processing", Solid-State Electronics 51 (2007) 1596-1608.
[8] F. Shi, P. Ramesh and S. Mukherjee, "Simulation Methods for Micro-Electro-Mechanical Structures (MEMS) with Application to a Microtweezer", Computers & Structures, Vol. 56, No. 5, pp. 769-783, 1995.
[9] Nicolae Lobontiu and Ephrahim Garcia, Mechanics of Microelectromechanical Systems, Kluwer Academic Publishers, London, 2005.
[10] Minhang Bao, Analysis and Design Principles of MEMS Devices, Elsevier B.V., Netherlands, 2005.



OVERVIEW OF HONEYPOTS AND BOTNET DETECTION METHODS

A. Gnanasundari 1, A. Anitha 2, B. Chithra 3, Prof. Gowri 4
1, 2, 3 PG Students, CSE, Sri Manakula Vinayagar Engineering College, Puducherry
4 Professor, Sri Manakula Vinayagar Engineering College, Puducherry

Abstract
Internet risks are increasing rapidly. Intrusion detection and prevention are among the tools used to test whether a system in a network has been infiltrated by an intruder. Like viruses and worms, a bot is a self-propagating application that infects vulnerable hosts through exploit activities to expand its reach. Botnet detection is an interesting research topic related to cyber-threat and cyber-crime prevention. A honeypot is a technology used to detect, prevent and respond to an attack. This paper classifies botnet detection approaches into four classes: signature-based, anomaly-based, DNS-based, and mining-based. Honeypots use minimal resources, so anyone can use a honeypot to safeguard important information. Honeypots are generally used alongside production systems; aiming at such systems and their data, we present an overview of honeypot technology, the possible methods of implementing a honeypot, and the advantages as well as the disadvantages involved with honeypot technology.

Index Terms: Honeypot, botnet

I. INTRODUCTION

Intrusion is an activity that afflicts a computer, a network or even a server with illegal activity. An intrusion detection and prevention system acts like a burglar alarm by which the operator of the computer is warned of the illegal activity. The operator then tags this incident, and the tag is sent to the incident handling team for further processing. Once hackers are tracked, they are prevented, so that the system is safeguarded from the beginning.

II. HONEYPOT ORGANIZATION

A honeypot serves as an excellent technology that helps provide security for internet users against hackers. A honeypot is a server attached to the internet that acts as a distracting decoy and helps monitor attackers' activity as they break into a system. Honeypots limit hackers from accessing an entire network; with a honeypot, hackers will not even know that they are being monitored. Honeypots are mostly installed inside firewalls so that they can be utilized and controlled better, but it is also possible to install a honeypot outside a firewall. A firewall in a honeypot actually works in the opposite way to a normal firewall: instead of restricting what comes into a system, it restricts what the system sends back out. A honeypot serves several functions when an intruder attacks a system. An administrator can watch the hacker exploiting the system's vulnerabilities and thereby learn where the system has weaknesses that need to be redesigned. By understanding the activities of hackers, designers can create more secure systems that are invulnerable to future attackers. Hackers can be caught and stopped while trying to obtain root access to the system.

III. HONEYPOT IMPLEMENTATION

Honeypots are easy to install, but can be complex at times, depending on the honeypot being installed. There are two types of honeypot: low-interaction honeypots and high-interaction honeypots. Low-interaction honeypots are easy to install; they are a solution that emulates operating systems and services and only requires the user to install and configure software on a computer. An example of a low-interaction honeypot is Honeyd, which is used by home users and small business users. High-interaction honeypots are not easy to install and should be used by more advanced users of honeypots; they are more complex for the home or small business user to install. Installing a high-interaction honeypot requires



configuration of an entire network that is designed to be attacked. An example of a high-interaction honeypot is a Honeynet. Both low- and high-interaction honeypots attract hackers.

RISK INVOLVED WITH HONEYPOTS

If you do not monitor your honeypot and honeynet, you run the risk of a hacker attacking your honeypot from outside your network.
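To make the low-interaction idea above concrete, here is a minimal sketch of a low-interaction honeypot: it listens on a single port, presents a fake service banner, and logs every connection attempt. The port, banner and log file name are illustrative choices, not values from the paper or from Honeyd.

```python
# Minimal low-interaction honeypot sketch: emulate one service banner and
# log every connection attempt. Port, banner and log file are assumptions.
import socket
import datetime

HOST, PORT = "0.0.0.0", 2222           # assumed listening address/port
BANNER = b"SSH-2.0-OpenSSH_5.1\r\n"    # fake service banner (illustrative)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        with conn:
            conn.sendall(BANNER)                # pretend to be a real service
            data = conn.recv(1024)              # capture whatever the client sends
        with open("honeypot.log", "a") as log:  # record the attempt for analysis
            log.write(f"{datetime.datetime.now()} {addr[0]}:{addr[1]} {data!r}\n")
```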

IV. BOTNET AND BOTNET DETECTION

A malicious botnet is a network of compromised computers, called bots, under the remote control of human operators called botmasters. Bots are software programs that run on host computers. Their key value to the attacker is the anonymity provided by a multi-tier command and control (C&C) architecture. Snort is an open-source Intrusion Detection System (IDS). Like other intrusion detection systems, Snort is configured with a set of rules or signatures used to log traffic which is deemed suspicious. It can detect known botnets, but not unknown ones.

Figure 1: A typical botnet architecture

A. ANOMALY-BASED BOTNET DETECTION

Anomaly-based botnet detection relies on several network traffic anomalies, such as high network latency, high volumes of traffic, and traffic on unusual ports, that indicate the presence of malicious bots in the network. Several algorithms are used to detect botnets whose C&C channel is encrypted inside a local area network. For example, BotSniffer is based on the observation that bots within the same botnet demonstrate very strong synchronization in their responses and activities.

B. DNS-BASED DETECTION

This technique is based on particular DNS information generated by a botnet and is similar to the anomaly detection algorithms. To access the C&C server, bots perform DNS queries in order to locate the respective C&C server, which is typically hosted by a dynamic DNS provider; it is therefore possible to detect botnet DNS traffic by DNS monitoring and thus detect DNS traffic anomalies. Mechanisms are also available to identify botnet C&C servers by detecting domain names with abnormally high or temporally concentrated DNS query rates. An approach proposed by Schonewille and Van Helmond in 2006 is based on abnormally recurring NXDOMAIN reply rates. This approach is very effective in detecting suspicious domain names, and it tends to produce few false positives because NXDOMAIN replies are more likely to refer to DDNS names than to other names.

C. MINING-BASED DETECTION

Anomaly-based techniques alone are not always sufficient to identify botnet C&C traffic. One effective strategy for botnet detection is to identify the C&C traffic itself, but botnet C&C traffic is difficult to detect. Goebel and Holz proposed Rishi in 2007. Rishi is based mainly on passive traffic monitoring for unusual or suspicious IRC nicknames, IRC servers, and uncommon server ports. It uses n-gram analysis and a scoring system to detect bots that use uncommon communication channels, which are commonly not detected by classical intrusion detection systems. The BotMiner approach applies data mining techniques to botnet C&C traffic detection. BotMiner is an improvement over BotSniffer: it clusters similar communication traffic and similar malicious traffic and then performs cross-cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns. BotMiner is an advanced botnet detection tool which is independent of botnet protocol and structure; it can detect real-world botnets, including IRC-based, HTTP-based and P2P botnets, with a very low false positive rate.
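As a rough illustration of the Rishi-style scoring idea described above, the sketch below assigns a suspiciousness score to IRC nicknames using simple heuristics (digit runs, length, bot-like substrings). The particular features, weights and threshold are our illustrative assumptions, not Rishi's actual rule set.

```python
# Illustrative nickname scoring in the spirit of Rishi: higher scores mean
# a nickname looks more bot-generated. Features, weights and the threshold
# below are assumptions for illustration, not Rishi's actual configuration.
import re

def score_nickname(nick: str) -> int:
    score = 0
    if re.search(r"\d{4,}", nick):                       # long runs of digits
        score += 3
    if len(nick) > 12:                                   # unusually long nickname
        score += 2
    if re.search(r"[a-z]+\d+[a-z]+\d+", nick.lower()):   # alternating letters/digits
        score += 2
    for token in ("bot", "rbot", "sdbot", "xdcc"):       # known bot-like substrings
        if token in nick.lower():
            score += 4
    return score

THRESHOLD = 5  # assumed cut-off for flagging
for nick in ["rBot-284713", "john_doe", "DE8467push", "alice"]:
    s = score_nickname(nick)
    print(f"{nick:15s} score={s:2d} {'SUSPICIOUS' if s >= THRESHOLD else 'ok'}")
```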



D. COMPARISON OF BOTNET DETECTION TECHNIQUES

This section compares the various botnet detection techniques. The key features considered are: the ability to detect unknown bots, the capability of detecting botnets regardless of botnet protocol and structure, support for botnets with encrypted C&C channels, real-time detection, and accuracy.

TABLE 1. COMPARISON OF BOTNET DETECTION TECHNIQUES

There are few botnet detection techniques that can detect botnets regardless of botnet protocol and structure. These techniques remain effective even when botmasters change their C&C communication protocol and structure. On the other hand, detection techniques that require access to C&C payloads are less effective, as botmasters tend to use encrypted channels for C&C communications.

Overall, such techniques can detect real-world botnets regardless of botnet protocol and structure with a very low false positive rate.

REFERENCES
[1] B. Saha and A. Gairola, "Botnet: An Overview", CERT-In White Paper CIWP-2005-05, 2005.
[2] M. Rajab, J. Zarfoss, F. Monrose, and A. Terzis, "A Multifaceted Approach to Understanding the Botnet Phenomenon", in Proc. 6th ACM SIGCOMM Conference on Internet Measurement (IMC'06), 2006, pp. 41-52.
[3] N. Ianelli and A. Hackworth, "Botnets as a Vehicle for Online Crime", CERT Coordination Center, December 2005.
[4] Honeynet Project and Research Alliance, "Know Your Enemy: Tracking Botnets", March 2005. http://www.honeynet.org/papers/bots/.
[5] G. Schaffer, "Worms and Viruses and Botnets, Oh My!: Rational Responses to Emerging Internet Threats", IEEE Security & Privacy, 2006.
[6] E. Cooke, F. Jahanian, and D. McPherson, "The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets", in Proc. Steps to Reducing Unwanted Traffic on the Internet Workshop (SRUTI'05), 2005, pp. 39-44.
[7] A. Ramachandran and N. Feamster, "Understanding the Network-Level Behavior of Spammers", in Proc. ACM SIGCOMM, 2006.
[8] Z. Zhu, G. Lu, Y. Chen, Z. J. Fu, P. Roberts, and K. Han, "Botnet Research Survey", in Proc. 32nd Annual IEEE International Conference on Computer Software and Applications (COMPSAC '08), 2008, pp. 967-972.
[9] J. B. Grizzard, V. Sharma, C. Nunnery, B. B. Kang, and D. Dagon, "Peer-to-Peer Botnets: Overview and Case Study", in Proc. 1st Workshop on Hot Topics in Understanding Botnets, 2007.
[10] P. Wang, S. Sparks, and C. C. Zou, "An Advanced Hybrid Peer-to-Peer Botnet", in Proc. 1st Workshop on Hot Topics in Understanding Botnets, 2007.



LDPC Decoding algorithm for predicting and screening Genome sequences

K. Padmapriya 1, A. Anand 2, Dr. P. Senthil Kumar 3
1 PG Student, Department of ECE, Saveetha Engineering College
2 Assistant Professor, Saveetha Engineering College
3 Professor, Saveetha Engineering College

Abstract
Statistical reports in the medical field show that many diseases are now attributed to genetic disorders. The genome sequencing research field has therefore concentrated on finding solutions for gene sequence analysis and gene function prediction. Signal processing has entered this domain and makes it easier to obtain optimized solutions for genetic problems. The proposed work performs genome sequence screening with the help of a forward error correction code from information coding theory, and gene function prediction with classifiers from neural network concepts. The extended min-sum (EMS) algorithm has already been used to decode LDPC (low-density parity-check) codes, and its BER performance has been shown to approach the Shannon limit. On the other hand, the extreme learning machine (ELM) and self-organizing map (SOM) classifiers in neural networks play a vital role in pattern recognition with high speed and a small number of neurons. The integration of the two algorithms therefore promises improvements in gene function prediction and sequence detection. The proposed work exploits the efficiency of the EMS algorithm to decode the genetic ATCG information. DNA group testing is related analogously to a stochastic Hopfield network with SOM, and performance parameters are derived that reduce the error in gene state prediction from a DNA library with an optimized number of neurons. Moreover, the work projects lower bounds on the DNA word set, which improves the stability of reliable DNA sequence detection and function prediction from pooling experiments.

Keywords: DNA sequencing and prediction, extended min-sum algorithm, LDPC, self-organizing map, ELM, dynamic GA.

Introduction

Gene sequence detection with a low error rate by a Bayesian network pool decoder method was done by Hiroaki [1]. Bioinformatics handles ATCG sequence analysis by two methods: (1) sequencing based on context and (2) finite clone based detection. The proposed algorithm integrates the fields of channel coding theory and advanced neural networks. Hence, channel codes such as the LDPC decoding algorithm are used to detect the DNA sequence. To predict the genome function, the same algorithm is modified and enriched by a stochastic Hopfield network with a dynamic threshold algorithm. Screening of gene sequences from a pool library by a Bayesian stochastic process with LDPC is described in [1]. The proposed algorithm mainly focuses on decoding the gene state by LDPC and predicting the genome state by an artificial neural network with a dynamic genetic algorithm. In these strategies all the genome data are collected from sources such as [11] and [12]. The proposed algorithm simultaneously detects and predicts genome sequences by an efficient method, and this point is supported by the simulation results. Unit I introduces the various genome sequence detection methods, Unit II describes gene state prediction, Unit III explores the idea of LDPC and its decoding algorithm with a comparative analysis against the Hopfield network, Unit V explains the merits of the proposed algorithm through simulation results, and Unit VI concludes the proposed methodology of gene sequence detection and prediction by LDPC.

I. Genome sequence detection

A DNA sequence can be considered as an ordered symbolic sequence of four alphabets, namely A, T, C, and G, which are the four types of nucleotides. In order to apply signal processing methods to analyze the DNA sequence, the symbolic sequence should first be transformed into a digital signal by some kind of symbol-to-number mapping. One basic mapping is by



indicator functions, where the symbolic DNA sequence of four alphabets is represented by a four-dimensional vector signal. Each symbol is represented by a vector I = [IA IT IC IG]T, where IA = 1 if the symbol is A and 0 in all other cases, and IT, IC and IG are defined in the same way. Therefore, for example, the symbol A is represented by [1 0 0 0]T. By this mapping, the example sequence ATGGAGAAAG is transformed as shown in Table 1.

Table 1. Indicator mapping of the sequence ATGGAGAAAG

      A  T  G  G  A  G  A  A  A  G
IA    1  0  0  0  1  0  1  1  1  0
IT    0  1  0  0  0  0  0  0  0  0
IC    0  0  0  0  0  0  0  0  0  0
IG    0  0  1  1  0  1  0  0  0  1

We notice that the 4-dimensional vector signal has only 3 degrees of freedom, since for each symbol we have the equation

IA + IT + IC + IG = 1        (1)
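A small sketch of this indicator mapping; the function and variable names are ours, and the example reuses the sequence from Table 1.

```python
# Map a symbolic DNA sequence to its 4-dimensional indicator representation
# [IA, IT, IC, IG] per symbol, as described above.
def indicator_mapping(seq: str):
    alphabet = "ATCG"
    vectors = []
    for symbol in seq.upper():
        vectors.append([1 if symbol == base else 0 for base in alphabet])
    return vectors

for symbol, vec in zip("ATGGAGAAAG", indicator_mapping("ATGGAGAAAG")):
    print(symbol, vec)          # each row sums to 1: IA + IT + IC + IG = 1
```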

The representation of a DNA sequence by indicator functions keeps all the information of the sequence. There are other ways of symbol-to-number mapping. One other choice is to map the sequence to a one-dimensional signal, which is simpler, although we may lose some information of the original DNA sequence. Andrzej [2] introduces a new approach to detect the alignment of DNA sequences by calculating the cross-correlation of DNA sequences in the Fourier domain. That domain is periodically comb-shifted by the amount S. Let N, P ∈ Z; then

(2)

(3)

(4)

The ATCG information is encoded by Table 1 and transformed to the Fourier domain, and the results are verified using a matched filter [2]; the proposed work, however, uses Table 1 as the data matrix for LDPC, and the polynomials are generated over Galois fields. LDPC has already proven BER performance near the Shannon limit [3]. From a realization perspective, LDPC also has good scope in current VLSI implementation. Hence the LDPC code is chosen in this work to detect and analyze the DNA sequence. The simulation results are discussed in Unit V.

II. Genome function prediction

The True Path Rule (TPR) ensemble predicts genome function [4]; it is based on the Gene Ontology and FunCat taxonomies. It is characterized by a two-way asymmetric flow of information that traverses the graph-structured ensemble: positive predictions for a node influence its ancestors in a recursive way, while negative predictions influence its offspring. The method of [4] has similarities with the bipartite-graph variable and check node updating of LDPC, which is also used here for prediction. In TPR, prediction of gene function is done by changing the weighting vector with a probability function of tandem repeats. Sometimes interspecies gene function prediction is most important for analysis in terms of proteins; this is done in [5] by a t-test statistical analysis of the fly network against the yeast network. Support vector machines have also been used to predict gene function. SVM with Iterative Reduced Forward Selection (IRFS) carries out feature selection before selecting the feature subset [6]. The SVM-RFE ranking method is determined by the weight of the SVM. The weight is defined as

(5)

The weight update of Eqn. (5) is based on the number of selected features M, which does not give good convergence for the prediction of genome function.



Hence choosing the best result iteratively takes many steps and much execution time. So in the proposed method prediction is done by ELM (extreme learning machine) instead of SVM.

III. ELM algorithm

ELM has great advantages in the prediction of random data analysis, as has been proven [4]. The main reason to choose ELM is that fewer neurons are required to complete the analysis of a larger amount of data. The basis of ELM is that training an SLFN is simply equivalent to finding a least-squares solution of the linear system

(6)

Some essential properties of ELM are listed here:
1. The hidden layer of ELM need not be iteratively tuned.
2. According to feedforward neural network theory [7], both the training error and the norm of the weights need to be minimized [8, 9].
3. The hidden layer feature mapping needs to satisfy the universal approximation condition [10].

A natural data matrix need not be a square matrix, so inversion of the matrix is again a problem; this is eliminated by the smallest-norm least-squares solution of the above linear system,

(8)

where H† is the Moore-Penrose generalized inverse of the hidden-layer output matrix. Hence the number of neurons required for processing is small compared to SVM and the back-propagation algorithm.
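A minimal ELM training sketch under the description above: random, untuned hidden-layer weights, a sigmoid feature mapping, and output weights obtained with the Moore-Penrose pseudoinverse. The toy data, the number of hidden neurons and the sigmoid choice are our assumptions, not the paper's configuration.

```python
# Minimal ELM sketch: random hidden layer (not tuned), output weights via
# the Moore-Penrose pseudoinverse. Toy data and sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 4))          # toy inputs (e.g. indicator features)
T = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy targets

L = 20                                          # number of hidden neurons (assumed)
W = rng.normal(size=(X.shape[1], L))            # random input weights, never tuned
b = rng.normal(size=(1, L))                     # random biases

H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden-layer output matrix
beta = np.linalg.pinv(H) @ T                    # smallest-norm least-squares solution

pred = (H @ beta > 0.5) * 1.0
print("training accuracy:", float((pred == T).mean()))
```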

IV. LDPC decoding algorithm

Fig. (1) Tanner graph for EMS

For each check node i, the messages corresponding to all variable nodes j that participate in a particular parity-check equation are computed [4] according to:

(9)

where N(i) is the set of all variable nodes of parity check equation i,

(10)

The a posteriori probability (APP) messages in the current horizontal layer are updated by:

L(pj) = L(pij) + Mij        (11)

Table 2. Notation used in the EMS decoding algorithm

nm         the largest values of the messages at the input of a check node
Vp(i)v     set of messages entering a variable node v
Uvp(i)     output messages of a variable node
Vcp(i)     input messages of a check node
Up(i)c     output messages of a check node



dv         degree of a variable node
dc         degree of a check node
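The check-node and APP updates referenced above (Eqns. (9)-(11)) are given in the paper for the non-binary EMS decoder. As a simplified stand-in, the sketch below implements the standard binary min-sum check-node update, which conveys the same structure (each outgoing message excludes the incoming message on its target edge); it is not the paper's GF(q) EMS update.

```python
# Simplified binary min-sum check-node update (illustration only; the paper
# uses the non-binary extended min-sum algorithm over GF(q)).
def check_node_update(incoming_llrs):
    """Return one outgoing LLR per edge, each excluding its own incoming LLR."""
    outgoing = []
    for j in range(len(incoming_llrs)):
        others = incoming_llrs[:j] + incoming_llrs[j + 1:]
        sign = 1.0
        for llr in others:
            sign *= 1.0 if llr >= 0 else -1.0
        magnitude = min(abs(llr) for llr in others)
        outgoing.append(sign * magnitude)
    return outgoing

llrs_in = [1.2, -0.4, 2.5, -3.1]       # example variable-to-check messages
print(check_node_update(llrs_in))      # check-to-variable messages
```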

The result of the extended min-sum algorithm (EMS) is

(12)

The described algorithm is the generalized EMS decoding algorithm for LDPC.

V. Proposed algorithm

Units I, II, III and IV give the idea of the existing algorithms for genome sequence detection, function prediction, ELM and LDPC decoding respectively. Eqns. (2) and (3) show that the data are in the Fourier domain. In the proposed algorithm, however, the data are over a Galois field in the log domain, which converts all multiplications into simple additions: Eqn. (2) converts convolution into multiplication, and the GF log domain converts multiplication into addition because of the properties of logarithms.

(13)

So Eqn. (13) shows the simplicity and implementation advantages of the algorithm.

The Tanner graph of the LDPC code is analogously modeled as a stochastic Hopfield network: updating its hidden layer and input layer corresponds to the check node update of Eqn. (9) and the variable node update equation. The inputs are random and preprocessed by SVM-RFE with an optimal number of iterations; the optimal subset sequence selection in the genome is done by the extreme learning machine weight update of Eqn. (7), and the convergence of subset selection is fast, which reduces the number of iterations for the sequence prediction part. DNA sequence detection is likewise the contribution of the EMS algorithm with a dynamic genetic algorithm. Hence the genome sequence is detected and predicted efficiently by the LDPC decoding algorithm.

To support the proposed work, simulation results are discussed in the following unit.

VI. Simulation results

The genome repository at NCBI contains interesting information about the human mitochondrial genome. The consensus sequence of the human mitochondrial genome has accession number NC_001807. The mitochondrial genome sequence is collected from the bank, and NC_001807 is chosen for the experiment of detection and prediction of the sequence and function respectively. The ATCG sequence is divided into blocks of length {15} to feed the SOM based LDPC decoder. The optimal subset sequence is calculated by Eqn. (5) with the SOM approach. For SVM-RFE the weight vector is updated over 200 iterations with the maximum epoch rate, as shown in fig. (2).

Fig. (2) Genome sequence weight updating by SOM

The ATCG information is scattered in 4 different colors in the plot, and the 2-dimensional weight vectors are updated to find the subset of the DNA sequence for analysis.


Fig. (3) Nucleotide identity and AT & CG sequence detection from the 16571 samples

The bank entry for human mitochondria NC_001807 has 16571 items of DNA sequence information, which is plotted in fig. (3); because of space constraints, only the combinations of AT and CG sequences are shown in the simulation. The genome sequence prediction part by the ELM algorithm is shown in heat map format in fig. (4).

Fig. (4) Codons for gene frame 1, with the reverse frame below the plot

There are in total 167 frame analyses; fig. (4) shows the first frame with its reverse frame for correlation of the genome state. Function prediction depends on the neighborhood values; the sequences are predicted and plotted.

VII. Conclusion

The simulation results of Unit VI support the proposed work and its advantages. The genome sequence detection of [1] and [2] gives results, but identification of the subset gene sequence is not deterministic; in the proposed work subset selection is done through SOM, and detection is done efficiently by ELM [4] with the LDPC EMS [3] decoding algorithm. Hence the proposed work is efficient in both algorithmic and implementation aspects.

References
[1] Hiroaki Uehara, Masakazu Jimbo, "A Positive Detecting Code and its Decoding Algorithm for DNA Library Screening", IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 06, no. 04, December 2009.
[2] Andrzej K. Brozik, "Phase-only Filtering for the Masses (of DNA Data): A New Approach to Sequence Alignment", IEEE Trans. Signal Processing, Vol. 54, No. 06, June 2006.
[3] Adrian Voicila, David Declercq, "Low Complexity, Low Memory EMS Algorithm for Non-binary LDPC Codes", IEEE Trans. Communications, Dec 2007.
[4] Guang-Bin Huang, Dian Hui Wang, "Extreme Learning Machines: A Survey", Int. Journal of Machine Learning & Cybernetics, 2011.
[5] Antonina Mitrofanova, Vladimir Pavlovic, "Prediction of Protein Functions with Gene Ontology and Interspecies Protein Homology Data", IEEE/ACM Trans. Computational Biology & Bioinformatics, vol. 08, no. 3, June 2011.
[6] Yanjiao Ren, Deping Wang, "Prediction of Disease-Resistant Gene in Rice based on SVM-RFE", Int. Conf. BME (2010).
[7] Huang G-B, Zhu Q-Y, Siew C-K (2004), "Extreme learning machine: a new learning scheme of feedforward neural networks",

In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2004), vol. 2, Budapest, Hungary, 25-29 July 2004, pp. 985-990.
[8] Huang G-B, Zhu Q-Y, Siew C-K (2006), "Extreme learning machine: theory and applications", Neurocomputing 70:489-501.
[9] Huang G-B, Chen L, Siew C-K (2006), "Universal approximation using incremental constructive feedforward networks with random hidden nodes", IEEE Trans. Neural Networks 17(4):879-892.
[10] Huang G-B, Chen L (2008), "Enhanced random search based incremental extreme learning machine", Neurocomputing 71:3460-3468.
[11] http://www.ncbi.nlm.nih.gov/geo
[12] http://www.switchtoi.com/annotationfiles.ilmn



RESUME WEB SERVICE TRANSACTIONS USING RULE-BASED COORDINATION

T. Tamilselvan 1, K. Pugalanthi 2, A. Vimalraj 3, D. Vettri 4
1 Lecturer, 2, 3, 4 Students, Sri Manakula Vinayagar Engineering College, Puducherry.

Abstract

Current approaches to transactional support of distributed processes in service-oriented environments are limited to certain scenarios in which participants cannot resume an ongoing process. In this paper, we address these limitations by introducing a framework that strengthens the role of the coordinator and allows for largely autonomous coordination of dynamic processes. Failure or exit of participants usually leads to cancellation of the whole process, so to resume the process we use a rule-based coordination concept. We present our framework TracG, which is based on WS-Business Activity. It contains at its core a set of rules for deciding on the ongoing confirmation or cancellation status of participants.

Keywords: Service-oriented architectures, WS-Coordination, Web Services Atomic Transaction, Web Services Business Activity.

I. INTRODUCTION

Service-Oriented Architecture (SOA) is a popular paradigm for implementing loosely coupled distributed environments. Within these environments, participants can overcome heterogeneity by communicating through standardized, implementation-independent interfaces. Application logic is typically not located in monolithic programs, but distributed among the participants of processes. Web services are a typical implementation of the SOA paradigm, and services are used more and more to support long-running business processes. This trend runs alongside a shift from merely considering simple interaction behavior of services, i.e., request-response patterns manifested in standards such as SOAP and WSDL, toward conversational services that engage in long-running conversations with other services. Transactional activity control plays an important role for such processes, as it preserves consistency and provides failure recovery. Among the published specifications for Web service transactions, WS-Coordination, BTP, and WS-CAF are

the most prominent ones. Several comparisons and analyses have shown that these specifications provide mostly equal functionality, with the fundamental approaches based on specifications such as X/Open DTP and the well-known two-phase-commit (2PC) protocol. All of the specifications assume that a transaction has an initiator and that this initiator is also the one who is able to decide on the closure of the transaction. This viewpoint is appropriate for service orchestrations, where a centralized engine controls the message and data flows. In contrast to centralized service orchestrations, service choreographies describe interactions from a global point of view, i.e., from the perspective of an ideal observer who is able to see all interactions and their flow and data dependencies. We propose a coordination framework that leverages the central position of the coordinator to concentrate a greater share of the overall coordination logic there and in turn reduce the coordination-related complexity on the participants. This way, no individual participant is required to have a complete view of the process. Participants only need to possess a local view of their involvement and the respective conditions, which they share with the coordinator. The coordinator aggregates the local views into a global process state and can decide autonomously when



to trigger the closure. A centralized transaction coordination performed by a dedicated coordinator is independent of the application-level interactions between the participants of a process. It can, therefore, be employed in orchestration as well as choreography scenarios. The goals of our framework are twofold: first, our intent is to develop a generic and application-independent framework for handling Web services transactions; second, the framework should especially support advanced workflow scenarios where the initiator is not the participant demarcating the transaction.

II. RELATED WORK

Regarding transaction management in the Web services domain, different fundamental approaches can be identified. They are similar to the categories discussed in [9], where different strategies for cross-organizational service composition are categorized. Web service composition: [Tai et al., 2004] suggests how BPEL, WS-C, WS-AT and WS-BA can be combined to provide coordinated Web processes. [Sauter and Melzer, 2005] compares the capabilities of BPEL vs. WS-BA, and advocates extending BPEL to provide distributed coordination features. [Tartanoglu et al., 2003] also addresses service reliability using forward recovery, but their Web service composition actions are statically defined for every participant. [Mikalsen et al., 2002] proposes a framework of action ports (proxy services) that negotiate and enforce transactional behavior in existing services. Workflow systems: [Rusinkiewicz and Sheth, 1995] (and earlier work) discusses transactional workflows, their heterogeneous nature and the need for failure and execution atomicity. [Tang and Veijalainen, 1995] introduces the concept of consistency units (C-units), transaction-style sections of a process that enforce data integrity constraints across participants. Distributed transaction models and specifications: Sagas [Garcia-Molina and Salem, 1987] are a response to the concurrency problems of LRTs and require

compensation for each business operation. Flexible Transactions [Rusinkiewicz et al., 1990] added relaxed atomicity. ConTracts [Waechter and Reuter, 1992] focus on reliability: recovery of scripts as of the time of failure, allowing the ConTract to resume execution. The Business Transaction Protocol (BTP) defines two types of interactions: atoms (short ACID transactions) and cohesions (longer, non-atomic transactions built from atoms). The Web Services Composite Application Framework (WS-CAF), competing with WS-C/AT/BA in OASIS, is divided into context (WS-CTX), coordination (WS-CF) and transaction management (WS-TXM). Fault tolerance and reliability: Redo Recovery after System Crashes [Lomet and Tuttle, 1995] shows that an installation graph can explain the order of log writes to forward-recover the state of a database. Logical Logging [Lomet et al., 1999] reduces the volume of log records and allows more generic logging schemas. Durable Scripts Containing Database Transactions [Salzberg et al., 1996] can be used to record and replay short ACID transactions to restore an application's state. The Generic Log Service [Wirz and Nett, 1993] can be used to log protocols represented by finite state machines, which include transactional protocols such as 2PC. Transparent Recovery in Distributed Systems [Bacon, 1991]: to avoid fault-tolerance logic in every application, transparent recovery solutions can transform non-resilient applications into resilient ones.

The main architecture of our prototype can be seen in Fig. 1. In order to implement the WS-Coordination specification, at least the corresponding Web service port types as described by the OASIS specification are required, i.e., the port types for WS-Coordination (RegistrationPort), WS-AtomicTransaction (CoordinatorPort, CompletionPort, ParticipantPort), and WS-BusinessActivity (CoordinatorPort, ParticipantPort). These port types are the Web service interfaces used to communicate with other transaction-capable participants.



Fig. 1. Architecture

III. PROBLEM DEFINITION

WS-Business Activity does not define how to initiate the demarcation for the protocol Business Agreement With Coordinator Completion. Addressing this problem, we have proposed our approach in order to enable the coordinator to decide autonomously about the demarcation. We extended several messages in order to be able to create the invocation tree of a transaction. In order to evaluate the implementation, we mainly tested it for scalability, besides normal unit testing during the implementation. Existing approaches are weakened by their limitation to scenarios where the participant initiating the process maintains a controlling position throughout the lifetime of the process. In addition, the WS-Coordination and WS-Business Activity specifications leave several aspects and protocol details undefined, inhibiting interoperability of existing implementations and extensions. This concerns, for example, the closure of business transactions and the identity management of participants.

3.1 Existing System

Existing Web service transaction coordination frameworks exhibit the limitations of the current standards. Most of them do not consider the limitation of having the initiator of a process demarcate the transaction. Furthermore, there is little support for dynamic aspects: failure or exit of participants usually leads to cancellation of the whole process.

3.2 Proposed System

We propose the framework TracG to resume the ongoing process and to monitor its progress. It contains at its core a set of rules for deciding on the ongoing confirmation or cancellation status of participants' work. A completion protocol is used for monitoring the progress of a process. The WS-Business Activity specification supports this property by two techniques:

1. The WS-BA specification supports two different coordination types:
   I. the atomic outcome for the well-known all-or-nothing property, and
   II. the mixed outcome, in order to support business scenarios like the one described. The mixed outcome coordination type indirectly addresses the property of vitality, because not all participants are needed in order to successfully finish the transaction. Nevertheless, the vitality



property is neither explicitly annotated nor explicitly modeled for the protocol.

2. The specification supports an (unprotected) withdrawal of participants from a transaction. This might be, for example, an airline that withdraws via an Exit message to the coordinator. Nevertheless, the coordinator so far has no knowledge of whether the participant is vital or not. Therefore, a coordinator needs more knowledge about the vitality property; otherwise it is not able to decide whether the exit of a participant has a severe impact on the whole transaction.

IV. COORDINATION OF TRANSACTION ACTIVITY

We identified a set of fundamental rules which allow assigning participants to the complete or cancel set. For the rules, we introduce a minimum set of predicates that allows us to model the behavior of the coordinator with respect to the cancel and complete sets:

Fig. 5 Sample Web service invocation tree

1. We need a predicate for modeling the invocation tree, namely the invocation(A,B) predicate, which states that Web service A has invoked participant B.
2. We also need predicates describing the choice sets: first, a partOfChoice predicate stating that a participant is part of a choice set; second, a choose predicate stating that Web service A is chosen to be completed.
3. Last but not least, we need predicates that define the membership in the complete or cancel sets, which are described by a complete or cancel predicate.

By using this small set of predicates, we are able to describe rules that sort participants into the corresponding sets. Rules 1 and 2 define membership in the complete set; Rules 3 and 4 assign participants to the cancel set. In summary, a participant B is assigned to the complete set of a coordination context 1) if it is invoked by a participant A which is a member of the complete set, or 2) if B is invoked by A as a member of a choice set, A is a member of the complete set, and B is a chosen member of the choice set:

Accordingly, a participant B is assigned to the cancel set of a coordination context 3) if it is invoked by a participant A which is a member of the cancel set, or 4) if B is invoked by A as a member of a choice set and B is not a chosen member of the choice set:

For regular service invocations, if a service is invoked within a coordination context and the invoker is a member of the complete set, the invoked service is assigned to the complete set as well (Rule 1). This is the most common case: The initiator of a process is usually interested in a consistent outcome and, therefore, invokes services within a newly created coordination context. If these services invoke further services on their own, they usually need to pass on the coordination context to ensure consistency of these subinvocations.
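A minimal sketch of how a coordinator could apply Rules 1-4 to an invocation tree. The data structures, function names and the example tree are our illustrative assumptions, not TracG's actual implementation.

```python
# Sketch of Rules 1-4: propagate complete/cancel membership along the
# invocation tree. Data structures and the example tree are assumptions.
def assign_sets(invocations, choice_sets, chosen, root):
    complete, cancel = {root}, set()
    # walk the invocation tree top-down so a parent is decided before its children
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in invocations.get(parent, []):
            in_choice = child in choice_sets.get(parent, set())
            if parent in complete and (not in_choice or child in chosen):
                complete.add(child)          # Rules 1 and 2
            else:
                cancel.add(child)            # Rules 3 and 4
            stack.append(child)
    return complete, cancel

# Example: A invokes B and C; B and C form a choice set and only B is chosen.
invocations = {"A": ["B", "C"], "B": ["D"]}
choice_sets = {"A": {"B", "C"}}
chosen = {"B"}
print(assign_sets(invocations, choice_sets, chosen, root="A"))
```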



A participant who itself is not a member of the complete set for some reason (and thus is a member of the cancel set) is usually a passivated participant which only remains in the process to ensure an orderly closure. Invoking further services, therefore, makes little sense for such a participant. If it does anyway, the invoked service is assigned to the cancel set as well (Rule 3). In summary, the complete set or cancel set membership of a participant is propagated to any services invoked by this participant. This behavior is particularly important when a participant is moved from the complete set to the cancel set, e.g., because of an application-specific decision: In this case, all participants in the invocation subtree of this participant are moved to the cancel set as well (Rule 3). Corresponding to the partOfChoice() predicate of Rules 2 and 4, choice sets will also be decided. Members of choice sets are at first assigned to the complete set. When a participant comes to a decision on which of the members of a choice set to complete and which to cancel, it reports this decision to the coordinator via a newly introduced Choose message in the WS-BA coordination protocol (corresponding to the choose() predicate of Rules 2 and 4). The coordinator then moves the nonchosen members to the cancel set. The rules so far will consider that participants are vital for the transaction. If we consider the vitality of participants in this first phase, we have to make clear that participants tagged as nonvital will become vital if they are chosen via a Choose message. Therefore, we need to extend the rule base indicating the change of the vitality property by introducing a new predicate named vital:

need the predicate aborted(A) which indicates that a participant A failed to complete his work and exited the transaction via a Fail, Exit, or Cannot- Complete message. The exiting of participants will result in sorting participants into the corresponding close and compensate sets which are represented via predicates close(A) or compensate(A). Close(A) depicts that a participant A should be sent a Close message, compensate(A) means that participant A will receive a Compensate message later. Another important predicate is the vitality of a participant represented as vital(A). The corresponding rules in order to sort the participants into the sets can be specified like the following:

The decision to successfully finish or abort the transaction can also be done via the invocation tree presented in the former section. The coordinator has to decide whether a failed participant will result in a cascading compensation of each step of the transaction. In order to handle this via rules like described in the former section, we need further predicates: First, we

Rule 6 propagates an error within the invocation tree toward the root of the invocation tree by tagging a parent node via propagate2Top. This is done only if an invoked participant B is already in the complete set from Phase 1, is aborted or tagged by another node, and is at least vital for its parent. If the participant itself has already exited the process, we do not need to tag it again in order to send it a Compensate message (which is indicated via aborted(A) inside Rule 6). Rule 7 propagates an error down to the leaves of the invocation tree if the child node has not already exited. If a participant exits, all its



siblings have to exit the process because their work is not needed anymore. The rules will tag all participants that are in the scope of an error. On the one hand, the error is propagated toward the leaves of the invocation tree; on the other hand, the error is propagated toward the root of the invocation tree until the error is no longer vital for a parent. All tagged participants will either be compensated (indicated by Rule 8) or closed if they are not to be compensated or aborted (indicated by Rule 9).

V. CONCLUSION

Transactional business processes must be composed of reliable participants. Our framework delivers reliability by leveraging existing work on fault tolerance. The overhead is minimal compared to total execution time. It is effective in all but a limited number of scenarios, and it can be easily adopted by existing services. In the future we need to address the implications of transactions resulting from more complex processes and investigate recovery based on autonomic management.

WS-Coordination provides uniformity for the creation, propagation, and joining of a distributed activity. WS-Atomic Transaction and WS-Business Activity extend this with agreement coordination protocols. Together they provide coordination mechanisms to handle exceptions from a wide variety of sources, ranging from hardware to software to the real world. In conjunction with other Web services standards, we expect them to be useful for applications whose scope spans from an organization within an enterprise, across different organizations in an enterprise, across enterprises, and across different vendor platforms.

VI. REFERENCES
[1] M.P. Papazoglou and W.-J. van den Heuvel, "Service Oriented Architectures: Approaches, Technologies and Research Issues", Very Large Data Bases J., vol. 16, no. 3, pp. 389-415, July 2007.
[2] M. Feingold and R. Jeyaraman, "Web Services Coordination (WS-Coordination) Version 1.1", http://docs.oasis-open.org/ws-tx/wscoor/2006/06, July 2007.
[3] P. Furniss, S. Dalal, T. Fletcher, A. Green, B. Haugen, A. Ceponkus, and B. Pope, "Business Transaction Protocol (BTP 1.1)", http://www.oasis-open.org/committees/download.php/9836/business_transaction-btp-1.1-spec-wd-04.pdf, 2004.
[4] D. Bunting, M. Chapman, and O. Hurley, "Web Services Composite Application Framework (WS-CAF) Ver 1.0", http://developers.sun.com/techtopics/webservices/wscaf/primer.pdf, July 2003.
[5] "Distributed Transaction Processing: Reference Model, Version 3", The Open Group, http://www.opengroup.org/bookstore/catalog/g504.htm, 1993.
[6] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
[7] C. Peltz, "Web Services Orchestration and Choreography", Computer, vol. 36, no. 10, pp. 46-52, Oct. 2003.
[8] M. Little and A. Wilkinson, "Web Services Atomic Transaction (WS-AtomicTransaction) Version 1.1", http://docs.oasis-open.org/ws-tx/wsat/2006/06, July 2007.
[9] T. Freund and M. Little, "Web Services Business Activity (WS-BusinessActivity) Version 1.1", http://docs.oasis-open.org/ws-tx/wsba/2006/06, July 2007.
[10] H. Garcia-Molina, D. Gawlick, J. Klein, K. Kleissner, and K. Salem, "Coordinating Multi-Transaction Activities", Technical Report CS-TR-29790, Dept. of Computer Science, Princeton Univ., 1990.

Organized by: Department of Computer Science and Engineering, Anand Institute of Higher Technology, Chennai. www.iconic12.in E-mail: iconic.aiht@gmail.com[Type text] Page 399


MoRE-LEARNING
T. Karthick 1, A. Mercy Vinoleeya Shefani 2, K. Anjuman Sakeena Mubeen 3
1,2,3 CSE Department, Anand Institute of Higher Technology, Chennai

Abstract
MoRE-Learning mainly emphasizes transforming the complex learning activity into a simple one. Each and every student has his/her own style of learning. MoRE-Learning analyses every student and allots the teaching resource according to their style of learning, and also decides the number of slides to be issued, thereby easing the student's learning activity.

I. INTRODUCTION

Our ancient traditional learning system comprises a single teacher with a large set of students, in which the teacher teaches the students. It was later transformed into classroom-based learning, where a teacher teaches a set of students who are allotted to a classroom in order to indulge in the learning process. It is the learning process that we undergo in our day to day life. The main drawback of this learning system is that the process of learning is static, and individual attention on every student became difficult due to the tremendous increase in the number of students. Then came the second generation of the learning process: Web-based learning. It made the process of learning a little simpler by allowing a student to find a large number of teaching materials and courses on the internet. Students go through the computer network to register, learn and take up an examination. It therefore paved the way for students to study from their homes. The third generation of the learning process is the emergence of m-learning. It has made the process of learning much simpler by allowing a student to learn while mobile, using a readymade communication device: the mobile phone. Ultimately, the process of learning is no more static. MoRE-LEARNING will make it even simpler by analysing the student and allotting the required type and number of slides depending upon the student's capability. It uses both formal and informal forms of learning. It is designed to suit today's young people, who are in the always-on generation.

II. RELATED WORK

In [1], m-Learning is mainly used for the course of B.Sc. Multimedia. 58 students are gathered together and undergo a discussion of three levels; after these discussions they give their ideas for improving m-learning in Multimedia. All the ideas of these students are grouped together into five themes: Administration, Presentation, Feedback, Motivation and Innovation. The students benefit from the final outcome and use these applications in such a way that they learn numerous different concepts and ideas from their lecturers. In [4], the paper explores the delivery mechanism of Mobile Classroom (MClass) and its application in foreign language learning. MClass is a system that overcomes the incompatibilities of mobile devices and is independent of mobile carriers. Students who find it difficult to learn a subject in a second language can study the subject in their own native language. The students can also practice language learning through audio or video messaging and conferencing, and they can communicate with native speakers with their mobiles. MClass provides JIT help for a variety of learning styles. [5] emphasizes individual-based learning. It requires mobile terminals at both ends. The students learn and ask questions, whereas the teachers provide solutions to the students. All the details are stored in the database, and it is virtual classroom based learning. Interaction is carried out with the help of SMS, Email, video phone, video conference and MMS. The sample-based evaluation consists of pictures and videos. The design-

based evaluation consists of audio and video. The test-based evaluation consists of normal tests for students. In [6], the soul of m-Learning is its application. Implementation is done with the help of texts and images; the main concept, towards a philosophy, is "Words Make Division, Pictures Make Connection". The emergence of m-Learning provides situation-independent knowledge for texts, diagrams etc. Modern day communication is established via hearing and seeing, e.g. MMS. The phases of education are classroom-based learning, extensions of classroom learning (TV ads etc.) and internet-based learning, and there is a continuous generation gap between adults and children. [7] is exclusively designed for modern day people and helps in updating their career life. The main aim of the Technology Acceptance Model (TAM) is to carry out interactive learning anytime, for anyone, anywhere and in any style. The five main elements of TAM are Perceived Usefulness, Perceived Ease of Use, Attitude towards Using, Behavioral Intention and Actual System Use. It emphasises problem-based learning, resource-based learning and informal learning.

III. PROPOSED SYSTEM

The main idea behind MoRE-Learning is to ease the student's learning activity. Teachers construct the teaching resources. Students use these resources as a part of the learning activity. The administrator manages the overall application.

Fig. 1 Architecture of the System
IV. CONCEPT IN DETAIL
In MoRE-LEARNING, students and teachers play a vital role, and the administrator manages the overall process.

A. Students' Bustle
At first the student registers via the website in order to determine his/her characteristic of learning, because each and every individual has a different way of learning. Then the student indulges in the learning process via his/her mobile.
Registration: The students provide their basic personal information, choice of subject and information regarding their awareness of the chosen subject. Each and every student is provided with a unique ID, as described in Fig. 2.
Fig. 2 Students' Enrolment
The students have to undergo a series of tests during the registration process, as in Fig. 3.
Fig. 3 Registration Process
It follows a pattern test, a VAK test and a second pattern test. The VAK test consists of a set of thirty non-technical multiple choice questions. These questions are fun for the students to answer, as they depend on day-to-day activities. The VAK test determines the way the student prefers to learn, i.e., either Audio, Video or Kinaesthetic. The pattern match test is used for a dual purpose: one is to refresh the student, and the other is to determine whether the student can withstand a few more slides of study. The pattern match test presents a game consisting of a set of words and a set of pins. The words name different colours and are printed in different font colours; the pins indicate the colours to be matched with the words. A word printed in a particular font colour has to be matched with the pin of that colour. For example, the word 'Green' printed in yellow has to be matched with the yellow pin. The average time taken to complete both pattern tests is calculated as follows:
Average Time Taken = (Time Taken for Pattern Test 1 + Time Taken for Pattern Test 2) / 2
In case the student is already aware of the chosen subject, he/she takes up a technical test in order to determine the areas that have to be studied completely and the areas that just need a brush-up. Finally, after the student's characteristics are determined, the allocation of a teacher takes place.
Fig. 4 Allocation of Teachers
Learning Process: The students study with the help of mobile phones.
Fig. 5 Allocation of Resources
As determined during the registration process, the preferred way of learning (Audio, Video or Kinaesthetic) is taken as the type of learning for the corresponding student.
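The following listing (not part of the original paper) is a minimal sketch of how the registration data described above could be summarised into a learner profile; the structure and names (LearnerProfile, average_pattern_time, vak_type) are illustrative assumptions rather than the MoRE-Learning implementation.

    from dataclasses import dataclass

    @dataclass
    class LearnerProfile:
        """Profile built from the registration tests (illustrative structure)."""
        student_id: str
        learning_type: str           # "audio", "video" or "kinaesthetic" (VAK result)
        average_pattern_time: float  # seconds, baseline from the two pattern tests

    def vak_type(answers: list[str]) -> str:
        """Pick the preferred learning type as the most frequent choice
        among the thirty VAK multiple-choice answers."""
        return max(set(answers), key=answers.count)

    def build_profile(student_id: str, vak_answers: list[str],
                      pattern_time_1: float, pattern_time_2: float) -> LearnerProfile:
        # Average Time Taken = (Pattern Test 1 + Pattern Test 2) / 2, as in the paper
        avg = (pattern_time_1 + pattern_time_2) / 2.0
        return LearnerProfile(student_id, vak_type(vak_answers), avg)

During the learning phase this baseline would be compared with the time of each fresh pattern test to decide whether further slides should be allotted.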

Organized by: Department of Computer Science and Engineering, Anand Institute of Higher Technology, Chennai. www.iconic12.in E-mail: iconic.aiht@gmail.com[Type text] Page 402

There are three levels of teaching resources: Low, Medium and Hard. At first, the student is allotted a medium level resource. The withstanding capacity of each and every student varies, so a particular number of slides is allotted to the student, for example ten slides. Then the student indulges in the learning activity. The student finishes the allotted slides and then takes up a pattern match test. The time taken to complete this pattern match test is compared with the average pattern test time obtained during registration. If the time obtained is below the average pattern test time, then further slides can be allotted for the student to study. If the time exceeds the average pattern test time, then it clearly indicates that the student needs rest, and hence the learning session is terminated by not allotting further slides.
The student takes up a technical test once each topic is completed. If the student scores above 80%, then he/she is allotted the Hard level course material for the next topic. If the student scores in the range of 50% to 80%, then the student is allotted the Medium level resource of the next topic. If the student scores below 50%, then he/she is given the Easy level course material on the same topic. If a student fails three times continuously in a particular topic, then the allotted teacher is notified.
3) Feedback/Doubt/Query: As the students indulge in the learning process, it is a natural tendency for them to get doubts on their topic of learning. Hence an open channel is present so that the students can send their feedback, doubts or queries to their respective teachers. The teachers reply to them, clearing the doubts or queries.
Fig. 7 Students' Query or Doubts
4) Assignment: Based on the scores obtained after each and every topic, assignments are provided to the student. The area that has to be made more clear for the student is identified and an assignment is allotted on that area, thereby increasing the individual learning skills of the student. The student in turn completes the assignment allotted to him/her and submits it to the teacher.
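A compact sketch (ours, not the paper's code) of the slide-allotment and level-selection rules of the Learning Process just described; the function names are assumptions for illustration.

    def more_slides_allowed(last_pattern_time: float, average_pattern_time: float) -> bool:
        """Allot further slides only while the student is still fresh: the latest
        pattern-match test must not be slower than the registration-time average."""
        return last_pattern_time <= average_pattern_time

    def next_resource_level(score_percent: float) -> tuple[str, str]:
        """Map a topic-test score to the next resource level, following the
        80% / 50% thresholds given in the paper."""
        if score_percent > 80:
            return ("hard", "next topic")
        if score_percent >= 50:
            return ("medium", "next topic")
        return ("easy", "same topic")   # repeat the topic with easier material

    def teacher_must_be_notified(scores: list[float]) -> bool:
        """True after three consecutive failing scores (below 50%) on a topic."""
        return len(scores) >= 3 and all(s < 50 for s in scores[-3:])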

B. Teachers' Activity
The teachers register via the website and upload the teaching resources constructed by them. Teachers can also track their students' performance and clear their doubts.
Fig. 6 Learning and Report Generation
Registration: The teachers register via the website by providing their personal details as well as their employment details. They select their subject and take up a test on the selected subject. Based on their test marks, the level of their teaching resource is determined as Low, Medium or Hard, as in Fig. 8.
Fig. 8 Teachers' Registration
Upload Resources: As in Fig. 9, the teachers upload the resources that they construct according to the level determined during the registration process.
Fig. 9 Types of Resources
Teachers construct their resources in all three types: Audio, Video and Kinaesthetic. These resources are later retrieved by the student for learning.
Probes: In case the teachers face any problem with the system, they can send their problems or queries to the administrator, who in turn responds to them.
Tracking the Student: A teacher can track the performance of each student individually, topic by topic. The teacher obtains the assignments submitted by the students and assesses them, clears the doubts or queries of the students, and can even suggest a few improvement techniques, thereby providing individual attention to every student.
Fig. 10 Teacher Tracking Students' Performance
In case a student fails continuously three times, the student is notified to the teacher so that the teacher can construct a new resource for the notified student and allot it.
Level gradation: In this process, the teacher is constantly monitored. If their set of students provides a good result, then they are automatically upgraded to the next level.
Fig. 11 Monitoring Teachers via Students' Performance
For example, if a teacher has been allotted a low level during the registration process and their set of students shows a good performance, then the teacher is upgraded from low to medium level.
Fig. 12 Upgrading Teachers' Level
In the same way, if the performance level of their students gradually decreases, then the teacher is asked to take up the test on their subject immediately and, based on the test results, a level is allotted to them.
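The level-gradation rule above can be sketched as follows; the numeric thresholds and helper names are hypothetical, since the paper does not define what counts as a "good result" numerically.

    LEVELS = ["low", "medium", "hard"]

    def regrade_teacher(level: str, class_average: float,
                        good_threshold: float = 70.0,
                        poor_threshold: float = 50.0) -> tuple[str, bool]:
        """Return (new_level, retest_required) from the class average score.
        A good class result promotes the teacher one level; a poor one
        triggers an immediate re-test instead of an automatic demotion."""
        idx = LEVELS.index(level)
        if class_average >= good_threshold and idx < len(LEVELS) - 1:
            return LEVELS[idx + 1], False      # e.g. low -> medium (Fig. 12)
        if class_average < poor_threshold:
            return level, True                 # teacher re-takes the subject test (Fig. 13)
        return level, False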

Fig. 13 Degrading Teachers' Level
C. Administrator's Management
The administrator manages the overall process of MoRE-Learning by managing both the students and the teachers. The administrator maintains the details of the students and the teachers, clarifies the probes of the teachers, ensures that the teachers provide their resources on a regular basis and clear their students' queries or doubts, and determines the performance level of the set of students allotted to every teacher.
D. Algorithms Used
Breadth First Search: Used to find the resources in the database and provide them to the student. At first, the topic node is explored. Then the type node (Audio, Video or Kinaesthetic) and the level node (Easy, Medium or High) are explored. Finally the optimised resource is obtained and provided to the student.
Fig. 14 Breadth First Search Algorithm in MoRE
Planning graph: Used to determine the level of the slides to be allotted in each phase. The level of the slides can be Easy, Medium or Hard, and it is purely based on the marks scored by the student after every topic.
Fig. 15 Planning Graph
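A minimal sketch of the breadth-first lookup over the topic, type and level nodes described in section D; the dictionary layout, sample resource identifiers and function name are assumed for illustration and are not taken from the paper.

    from collections import deque

    # Hypothetical resource tree: topic -> type -> level -> resource id
    RESOURCES = {
        "loops": {"video": {"medium": "vid-loops-m", "easy": "vid-loops-e"},
                  "audio": {"medium": "aud-loops-m"}},
    }

    def find_resource(topic, wanted_type, wanted_level):
        """Breadth-first search: explore the topic node, then the type nodes,
        then the level nodes, and return the first matching resource."""
        queue = deque([(RESOURCES, [])])          # (subtree, path so far)
        target = [topic, wanted_type, wanted_level]
        while queue:
            node, path = queue.popleft()
            if path == target:
                return node                        # leaf: the resource id
            if isinstance(node, dict):
                for key, child in node.items():
                    if key == target[len(path)]:   # expand only branches on the wanted path
                        queue.append((child, path + [key]))
        return None

    print(find_resource("loops", "video", "medium"))   # -> vid-loops-m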

CONCLUSIONS

MoRE-Learning promotes self-learning skills and provides effective learning. It is personalised according to a student's requirements, enables anytime, anyplace learning and exhibits a more granular design. Its limitations are that it consumes the battery power of the mobile device and that the security issues arising during examination time could not be overcome. In future, MoRE-Learning can be enhanced by using cloud computing. As a result, MoRE-Learning transforms the process of learning from a complex one to a simple one by changing push type learning to pull type, thereby not forcing the student to learn but instead making the process of learning interesting and compatible to the students.

REFERENCES


[1] Vanja Garaj, "m-Learning in the Education of Multimedia Technologists and Designers at the University Level: A User Requirement Study", 2010.
[2] Nevena Mileva, Silviya Stoyanova-Petrova, Slavka Tzanova, "Mobile Technology Enhanced Learning", 2011.
[3] Jason W.P. Ng, Mohamed Jamal Zemerly, Omran Ahmed Al Hammadi, "Context-aware collaborative m-learning in an intelligent campus environment", 2011.



Cross Layer Optimization for Multiuser Video Streaming using Distributed Cross Layer Algorithm
Mr. R. Raju1, J. Vidhya@Shankardevi2, G. Sumathi3
1 Associate Professor, Sri Manakula Vinayagar Engineering College
2, 3 M.Tech (pursuing), Department of CSE, Sri Manakula Vinayagar Engineering College

ABSTRACT Multimedia and networking are inseparable. The integration of multimedia services into wireless communication networks is a major source of future technological advances. Due to the integration of multimedia services, the increasing energy consumption of a mobile unit is also becoming a dominant factor in the design of communication systems. Video streaming over wireless networks is must for many applications, ranging from home entertainment to surveillance to search-and-rescue operations. Due to the increased energy requirement of signal processing and wireless transmission, the limited battery capacity of mobile devices has become a major drawback. This paper proposes a cross layer optimization algorithm, which includes routing based on neighbor discovery and dual congestion control for improving QoS. It helps in minimizing the energy required in transmission of video packets. This in turn leads to green computing. Keywords ; cross layer; congestion control; Green computing; HCCA 1. INTRODUCTION This section provides a brief introduction about Wireless networks and multimedia 1.1 WIRELESS NETWORKS Wireless network is a network set up by using radio signal frequency to communicate among computers and other network devices. It referred to as WiFi. They are of three types Wide area networks (WAN)that the cellular carriers create, Wireless local area networks(WLAN), that you create, and Personal area networks(PAN), that create themselvesThe components of wireless network are Antennas, Transceivers, Integrated Circuits, Analogue-Digital Converters, LCD Screens and Batteries. Infrastructure of wireless network is Cell Towers, Base Stations (access points), Filters, Routers & Switches, Power Amplifiers and Edge Packets. Working of wireless networks is two computers each equipped with wireless adapter and wireless router. When the computer sends out the data, the binary data will be encoded to radio frequency and transmitted via wireless router. The receiving computer will then decode the signal back to binary data. IEEE 802.11 standards specify two operating modes: infrastructure mode and ad hoc mode. Infrastructure mode is used to connect computers with wireless network adapters, also known as wireless clients, to an existing wired network with the help from wireless router or access point. Ad hoc mode is used to connect wireless clients directly together, without the need for a wireless router or access point. An ad hoc network consists of up to 9 wireless clients, which send their data directly to each other. 1.2 MULTIMEDIA Multimedia means that computer information can be represented through audio, video, and animation in addition to traditional media (i.e., text, graphics drawings and images). Multimedia is the field concerned with the computercontrolled integration of text, graphics, drawings, still and moving images (Video), animation, audio, and any



other media where every type of information can be represented, stored, transmitted and processed digitally. A multimedia application is an application which uses a collection of multiple media sources, e.g. text, graphics, images, sound/audio, animation and/or video. Hypermedia can be considered one of the multimedia applications.
1.2.1 GENERAL CHARACTERISTICS OF A MULTIMEDIA SYSTEM

It is desirable to minimize the energy consumption of all users, including both video coding and wireless transmission, while satisfying the video quality requirement imposed by the end-user. Streaming can be more complex in a packet based network because they have strong and specific requirements. The QOS of the video streaming can specify some requirement which is the video data flows must be formatted, denoting that the latency between consecutive packets must be the same, the data bit rate has to be high and constant, and the video packet loss rate must be close to zero. Constant bit rate is needed to feed the decoder application in a proper way, and to see the video without interrupt. 3. RELATED WORKS The main areas, the previous work concentrated are Low power design and changing the configuration parameters to reduce the energy during transmission of video packets. 3.1 LOW POWER DESIGN ISSUES In the wireless network low power design issues have been addressed in the following four areas. 3.1.1 DEVICE LEVEL OPTIMIZATION Low power VLSI design [2] and low power RF circuitry [3] optimization are the main technologies for energy saving approaches. Dramatic reduction in power dissipation requires architectural, algorithmic, and circuit design optimization, which are limited by semiconductor and device technologies. 3.1.2 MEDIUM ACCESS PROTOCOL DESIGN CONTROL (MAC)

The four basic characteristics of multimedia systems are Multimedia systems must be computer controlled, Multimedia systems are integrated, the information they handle must be represented digitally, and the interface to the final presentation of media is usually interactive. 1.2.2 VIDEO STREAMING Streaming video is content sent in compressed form over the Internet and displayed by the viewer in real time. With streaming video or streaming media, a Web user does not have to wait to download a file to play it. Instead, the media is sent in a continuous stream of data and is played as it arrives. The user needs a player, which is a special program that uncompresses and sends video data to the display and audio data to speakers. A player can be either an integral part of a browser or downloaded from the software maker's Web site. Major streaming video and streaming media technologies include Real System G2 from Real Network, Microsoft Windows Media Technologies 2. OBJECTIVE OF THE PAPER This paper proposes an algorithm called distributed cross layer optimization and cross layer optimization in order to prevent more amount of energy used in forwarding the video packets across the network. When streaming live video across wireless links, two main sources of energy consumption are video coding and wireless transmission. In most of the state-of-the-art video encoders and wireless transmitters, there are configuration parameters which can be tuned based on varying channel conditions and/or video characteristics.



Energy-efficient MAC protocol design rests on three elements [4]. Packet structure: a packet is partitioned into two parts, a low-bit-rate part for control information and a high-bit-rate part for data; because each part has a different error-tolerance requirement, overall energy savings can be achieved. Awake/Doze mode: the system is put into sleep mode while it is not receiving or sending data. Error control design: the description of this design can be found in the next item.
3.1.3 COMMUNICATION SYSTEM LEVEL OPTIMIZATION

3.1.4 APPLICATION LEVEL DESIGN Engineers are developing low complexity software or hardware for multimedia processing algorithms [10], [11]. Usually low complexity algorithms yield low energy consumption, provided that less iterations or looping are involved in the computation. One of the reasons for not considering the power consumption issue in the resource management strategy has been the failure to acknowledge the importance of the interaction between the processing power and the associated transmission power. 3.2 HCF CONTROLLED CHANNEL ACCESS (HCCA) The HCF (hybrid coordination function) controlled channel access (HCCA) works a lot like PCF(Point Coordination Function) it contrast to PCF, in which the interval between two beacon frames is divided into two periods of CFP (Contention Free Period) and CP(Contention Period), the HCCA allows for CFPs being initiated at almost any time during a CP. This kind of CFP is called a Controlled Access Phase (CAP) in 802.11e. A CAP is initiated by the AP whenever it wants to send a frame to a station or receive a frame from a station in a contention-free manner. The other difference with the PCF is that Traffic Class (TC) and Traffic Streams (TS) are defined. This means that the HC (Hybrid Coordinator) is not limited to per-station queuing and can provide a kind of per-session service. Also, the HC can coordinate these streams or sessions in any fashion it chooses (not just round-robin). Moreover, the stations give info about the lengths of their queues for each Traffic Class (TC). The HC can use this info to give priority to one station over another, or better adjust its scheduling mechanism. 3.3 MULTIMODE ADAPTIVE POWER SAVING (MAPS) PROTOCOL In protocol assumptions to be made are, i) The cellular network has a feedback power control

A communication system-level optimization approach is devised called, global interference minimization [5]. Global interference minimization refers to the transmitter power control problem in cellular radio systems. This has provided an optimal solution in the sense that it minimizes interference (or outage) probability. The optimal solution for the power control problem involves solving eigen values of path gain matrices. This solution is computationally expensive and impractical in the real world. There comes [6], [7] a simplified distributed power control algorithm to tackle this problem. The distributed power control algorithm differs from the centralized power control problem in which each mobile adaptively adjusts its transmitter power according to the received interference. The distributed method releases the computational task performed by each base station. The CDMA power control strategy also provides a simple solution to the interference minimization problem. There are two types of power control algorithms, close-loop, and open-loop control [8], [9]. Close-loop control refers to the feedback made from the base station to mobile stations for adjusting mobiles transmitted power. Open-loop control refers to the self transmission power adjustment of mobile stations by comparing the received signal strength from the base station with a reference signal level.
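The distributed power control just described can be illustrated with the classic per-mobile update in which each terminal scales its own transmit power by the ratio of its target SIR to its currently measured SIR. The sketch below is written in the spirit of the distributed algorithms cited in the text; it is not code from those papers, and the variable names are assumptions.

    def distributed_power_update(p_current: float, sir_measured: float,
                                 sir_target: float, p_max: float) -> float:
        """One iteration of distributed power control: each mobile adjusts its own
        transmit power from locally measured interference, with no central solver,
        raising power when the SIR is below target and lowering it otherwise."""
        p_next = p_current * (sir_target / sir_measured)
        return min(p_next, p_max)          # respect the terminal's power limit

    # Example: measured SIR is half the target, so the mobile doubles its power (capped).
    print(distributed_power_update(p_current=0.1, sir_measured=4.0,
                                   sir_target=8.0, p_max=1.0))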



mechanism, i.e., the base station commands mobile users to increase or decrease transmitted power through the control channel at a period of time. ii) The system is set for multimode transmission where radio frequency (RF) signal-to-noise (SN) ratios are different from mode to mode.

3.4 MULTIRATE TRANSMISSION SCHEME: MULTISTAGE CODED MODULATION (MCM) To integrate a multimode coder into the power-saving system, an efficient transmission scheme has to be built. The beneficial transmission scheme is used here, therefore source/channel (S/C) rate optimized coding and multirate modulation (MM) are left as potential candidates.

Figure.1 Base station controlled adaptive method MAPS protocol can be summarized as, Mobile ready to send, exchanging information between the base station and the mobile for the initial transmission power level and modes (e.g., in H.263 video transmission, one can set initially mode0 with intra coding frame). After one frame is coded, the corresponding processing power is measured and sent to the base station, and it is stored in the processing power table. The remaining processing power levels of other modes are estimated by multiplying the pre-estimated inter mode factors. The base station estimates the required transmitted power level of the mobile through the power control mechanism and stores the result in the transmission power table. Power levels of other modes can be easily found by multiplying different RF SN ratios. Add the processing power table and the transmission power table to form a total power table. Find the minimum in the total power table and the corresponding mode. The base station sends power and mode updates to the mobile; the mobile uses the new mode for next frame coding. Go back to step 2. Figure.2 Multistage coded modulation It has the simple reconfigurable QAM scheme and FEC coding that guarantee a given quality while minimizing transmission power consumption [12]. To make the switch between higher level and lower level modulation settings simpler, we introduce so-called inserted MQAM. 3.5 ISSUES IN THE RELATED WORK Energy or power saving criteria is approached either from an information-theoretic perspective or from an implementation-specific viewpoint. Modulation strategies are derived for delay-bounded traffic. It is shown that when the transmit power and circuitry power are comparable, the transmission energy decreases with the product of bandwidth and transmit duration. They however only consider an idealized network restricted to a single flow with no medium access controller (MAC) or link layer retransmissions, and with ideal constellation sizes. Non-multimedia applications will experience degraded performance, cannot be universally applied to all network configuration.
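A compact sketch of the mode-selection step of the MAPS protocol summarised above: the base station adds the processing-power and transmission-power tables and picks the coding mode with the minimum total. The table values and scaling factors in this example are invented placeholders, not measurements from the cited work.

    def select_mode(processing_power: dict[int, float],
                    transmission_power: dict[int, float]) -> int:
        """Build the total power table and return the mode with minimum
        total (processing + transmission) power, as in the MAPS steps."""
        total = {m: processing_power[m] + transmission_power[m] for m in processing_power}
        return min(total, key=total.get)

    # Hypothetical per-mode tables (watts) for an H.263-style multimode coder.
    processing = {0: 0.50, 1: 0.36, 2: 0.30}     # estimated via inter-mode factors
    transmission = {0: 0.10, 1: 0.22, 2: 0.41}   # from the power-control feedback
    print(select_mode(processing, transmission))  # -> mode 1 for these numbers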



4. PROBLEM DEFINITION In the proposed scheduling algorithm, we include routing based on neighbor discovery and dual congestion control for improving QoS. After receiving the request the resource server can schedule the request .that is the form a table for which node needs which data. After scheduling the resource server can perform three steps which are Analyze the nearest requested node by use of shortest path, check the requested nodes are neighbors of each other, Form the index for avoiding neighboring collision If the requested nodes are neighbors of each other Resource server sent the RTS (ready to sent) and ID of the neighbor requested node and then Requested node can check the ID is a neighbors ID.If the ID is a neighbors ID the process move to congestion control otherwise the process move to routing component. If the requested nodes are not a neighbors of each other the process move to congestion control First Congestion Control Form the queue and sent RTS to first requested node and then Node can send the CTS (clear to sent) to the resource server Second Congestion Control Form the queue and sent requested data to that first sent CTS by the node and then Requested node can receive the data 4.1 DISTRIBUTED CROSS-LAYER ALGORITHM In the Distributed Cross-Layer Algorithms for the Optimal Control scheduler implements joint application and fabric layer optimization scheduling algorithm. Cross-layer design introduces interlayer coupling across the application layer and the fabric layer and allows the exchange of necessary information between the application layer and the fabric layer. Figure.3 Distributed cross layer algorithm
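A rough sketch of the resource-server side of the scheduling described above (queue the requests, send RTS with a neighbour check, then serve data to whichever nodes return CTS first); the class layout and method names are assumptions made for illustration, not the authors' implementation.

    from collections import deque

    class ResourceServer:
        """Toy model of the dual congestion control at the resource server."""

        def __init__(self, neighbour_map: dict[str, set[str]]):
            self.neighbour_map = neighbour_map
            self.request_queue: deque[tuple[str, str]] = deque()  # (node, data item)

        def schedule(self, requests: list[tuple[str, str]]) -> None:
            # Step 1: build the table of which node needs which data.
            self.request_queue.extend(requests)

        def first_congestion_control(self, send_rts) -> None:
            # Send RTS to the queued nodes; neighbouring nodes also get the
            # neighbour index/ID so that two neighbours are not served at once.
            for node, item in self.request_queue:
                send_rts(node, item, self.neighbour_map.get(node, set()))

        def second_congestion_control(self, cts_order: list[str], send_data) -> None:
            # Serve data in the order in which CTS replies arrived.
            for node in cts_order:
                for queued_node, item in list(self.request_queue):
                    if queued_node == node:
                        send_data(node, item)
                        self.request_queue.remove((queued_node, item))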


4.2 ADVANTAGES OF PROPOSED WORK This algorithm also helps in avoiding congestion control with the help of dual congestion control To reduce the run-time computation load, the fast greedy algorithm will be employed. Transfer the data packets. Each terminal finds the maximum quality factor for all its possible complexities, and the base station searches in the space of compression complexity. The



parameters to be adjusted are source coding bit rates, compression complexity, and transmitter power for all users. 5. CONCLUSION This proposed a cross-layer optimization scheme for multi-user video streaming using distributed cross layer optimization algorithm. The goal is to minimize total energy consumption of all users, including both video coding and wireless transmission energy, while satisfying video quality target. Source coding bit rates, and transmission power corresponding to the best compression complexity are taken together as the operating parameters. Since the energy consumption is reduced it achieves green computing. 6.Refernce [1] M. Dertouzos, V. Zue, J. Guttag, and A. Agarwal, Special Report: The Oxygen Project, Sci. Amer., Aug. 1999. [2] A. P. Chandrakasan and R.W. Brodersen, Minimizing power consumption in digital CMOS circuits, Proc. IEEE, vol. 83, p. 498, Apr. 1995. [3] A. A. Abidi, Low-power radio-frequency ICs for portable communications, Proc. IEEE, vol. 83, pp. 544569, Apr. 1995. [4] H.Woesner, J. Ebert, M. Schlager, and A.Wolisz, Power-saving mechanisms in emerging standards for wireless LANs: The MAC level perspective, IEEE Pers. Commun., pp. 4059, June 1998. [5] J. Zander, Performance of optimum transmitter power control in cellular radio systems, IEEE Trans. Veh. Technol., vol. 41, pp. 5762, Feb. 1992. [6] S. A. Grandhi, R. Vijayan, D. J. Goodman, and J. Zander, Centralized power control in cellular radio systems, IEEE Trans. Veh. Technol., vol. 42, pp. 466 468, Nov. 1993. [7] G. J. Foschini and Z. Miljanic, A simple distributed autonomous power control algorithm and its convergence, IEEE Trans. Veh. Technol., vol. 42, pp. 641646, Nov. 1993. [8] R. Kohno, R. Meidan, and L. B. Milstein, Spread spectrum access methods for wireless communications, IEEE Commun. Mag., Jan. 1995.

[9] R. Padovani, Reverse link performance of IS-95 based cellular systems, IEEE Pers. Commun., Third Qtr. 1994. [10] A. Said and W. Pearlman, A new fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243250, June 1996. [11] T. Lan and A. Tewfik, MultiGrid Embedding (MGE) image coding, in Proc. 1999 IEEE Int. Conf. Image Processing (ICIP99), Kobe, Japan, 1999. [12A unified framework of source/channel/modulation coding and low power multimedia wireless communications, in Proc. e IEEE Second Workshop on Multimedia Signal Processing, Redondo Beach, CA, Dec. 1998, pp. 597602. [13] ITU-T Recommendation H.263: Video Coding for Low Bit Rate Communication,ITU, 1998. [14] Lan and A. Tewfik, A resource management strategy in wireless multimedia communicationstotal power saving in mobile terminals with a guaranteed QoS, IEEE Trans. Multimedia, vol. 5, pp. 267-281, Jun. 2003. [15] X. Lu, E. Erkip, Y. Wang, and D. Goodman, Power efficient multimedia communication over wireless channels, IEEE Journal Selected areas in Comminications, vol. 21, pp. 1738-1751, Dec. 2003. [16]X. Lu, Y.Wang, E. Erkip, and D. Goodman, Minimize the total power consumption for multiuser video transmission over CDMA wireless network: a twostep approach, in Proc. ICASSP05, Philadelphia, PA, vol. 3, pp. 941-944, Mar 2005. [17] IEEE Std. 802.11e-2005, Part 11: Wireless LAN medium access control (MAC) and physical layer (PHY) specifications Amendment 8: medium access control (MAC) quality of service enhancements, Nov. 2005. [18] S. Pollin, R. Mangharam, B. Bougard, L. Van der Perre, R. Rajkumar, I. Moerman, and F. Catthoor, MEERA: Cross-layer methodology for energy efficient resource allocation for wireless networks, IEEE Trans. Wireless Commun., vol. 6, pp. 617-628, Feb. 2007. [19] A Cross-Layer Algorithm of Packet Scheduling and ResourceAllocation for Multi-User Wireless Video TransmissionPeng Li, Yilin Chang, Nina Feng,



Throughput Enhancement and Delay Reduction in P2P Network



Abstract

Deepthi K. Oommen1, R. Siva2
1, 2 KCG College of Technology, Chennai

Peer to peer network is a distributed network architecture.The participants in the network can be both resource providers and consumers. In peer-to-peer (P2P) live streaming, peers collaboratively organize themselves into an overlay and share their upload capacities to serve others. The scope of the work is to propose the sub-stream video packet scheduling for peer to peer live streaming.Packet scheduling is an important factor on overall playback delay. Hybrid pull push approach are used, video are divided into sub-streams and packets are pushed with low delay.This paper is focus on the transmission of packets from source to destination with low delay and high bandwidth utilization. Keywords--Hybrid pull push, sub-stream, packet delay, p2p network and bandwidth. I. INTRODUCTION Peer-to-peer (P2P) is an alternative network model of traditional client-server architecture. A peer plays the role of a client and a server at the same time. That is, the peer can initiate requests to other peers, and at the same time respond to incoming requests from other peers on the network. In P2P networks, clients provide resources, which may include bandwidth, storage space, and computing power. This property is one of the major advantages of using P2P networks because it makes the setup and running costs very small for the original content distributor. The decentralized nature of P2P networks increases robustness because it removes the single point of failure that can be inherent in a client-server based system [9]. Another important property of peer-to-peer systems is the lack of a system administrator. This leads to a network that is easier and faster to setup and keep running because full staffs are not required to ensure efficiency and stability. Another characteristic of a P2P network is its fault tolerance, when a peer goes down or disconnected from the network, the p2p application will continue by using other peers. Bit-torrent is a peer to peer file sharing protocol used for distributing large amounts of data over the Internet. Bit-Torrent bases its operation around the concept of a torrent file, a centralized tracker and an associated swarm of peers. The centralized tracker provides the different entities with an address list over available peers [6]. Later improvements tries to leave the weakness of a single point of failure (the tracker), by implementing a new distributed tracker solution. Bit Torrent is a platform on which p2p network is created. The p2p network with bit torrent protocol can be used in live streaming. Live streaming is nothing but sending video and audio signals real time over the internet. More specifically, it means taking the media and broadcasting it live over the Internet. For the transmission of packets:(a) mesh pull is often used, in mesh overlay network due to its simplicity and robustness and bandwidth utilization, but these benefits comes with cost of long delay because it uses buffer map method and pulling method for packet exchange between child and parents[1].(b) To provide robustness against peer churns and to meet the streaming bandwidth requirements, a mesh should be in an overlay manner, where each peer connects to some other peers are called parents. The child needs to retrieve packets from the parents and assemble a full stream to achieve stream continuity, but a child should determine which parents should deliver which packet. This is called scheduling. 
If scheduling is not designed properly it will cause some scheduling delay and reducing such delay is very critical [1]. In order to provide the efficient bandwidth utilization, each peer is connected to the parent and it shall be aggregated as child. It will stream the chunks



of packets with different content with heterogeneous bandwidth. This causes some delays in the control packet exchanges and also to the data. This delay makes an impact, which is very critical to the packet delivery ratio of packet transmission. In this paper the scope of work is to propose the sub-stream video packet scheduling for the peer to peer live streaming. In the peer to peer live streaming using unstructured mesh, packet scheduling is an important factor on the overall playback delay. This paper uses hybrid pull-push approach that has been recently proposed to reduce delay compared to classical pulling method 1]. In this approach packets are divided into sub-streams and are pushed with low delay.To avoid the delay in real time scenario, the proposed solution is to provide delay aware packet scheduling and hybrid pull-push approach using substream allocation on the peer to peer live streaming.

II. SYSTEM MODEL

System model figure: at the application layer the scheduler handles the sub-stream process, measures and updates delay, identifies sub-streams and allocates the peer list; at the network layer the parent peers carry out chunk allocation.

transferring variable length data sequences from a source host on one network to a destination host on a different network. The Network Layer performs network routing functions, and might also perform fragmentation and reassembly, and report delivery errors. Routers operate at this layer sending data throughout the extended network and making the Internet possible. The Application Layer is the OSI layer closest to the end user, which means that both the OSI application layer and the user interact directly with the software application. This layer interacts with software applications that implement a communicating component. Application layer functions include identifying communication partners, determining resource availability, and synchronizing communication [10]. It uses the IEE 802.11, which is used for wireless channel transmission system. This paper does a simulation analysis to show how the substream allocation method reduces the delay. For simulation analysis the work considers, 15 nodes and showing the packet transmission between nodes. Each peer has a unique identifier called peer-id. The id is randomly generated so the chance of generating two identical ids is extremely low. The parent peers hold the sub-streams and is send to the child peer when it request for, like source to destination transmission. The size of the packets is based on Ethernet packet size 1500 bytes. Packets are transmitted from source to destination through Ad hoc On Demand Routing Protocol where the bandwidth is used effectively. The simulation does an analysis of the parameters such as residual loss rate, packet delay, bandwidth dilation and graphs are plotted after the evaluation of parameters. A. Sub-Stream Allocation Transmission of packets through peer to peer network is by sub-stream allocation, i.e. dividing the packets in sub-stream for transmission. Sub-stream allocation process maintains the load balance and help in the successful transmission of packets. Each substream is of size 1500 bytes. It is same as Ethernet packet size. The transmission process is packet divided into sub-streams and each one is delivered to the child peer or the receiver through different peers also called as parent peers with low delay. Peers that have substreams are called parent peers. The parent peers send sub-streams to the child peer. It acts as a source to

The above figure explains the packet transmission.P2P network is a peer to peer network, which works with a bit-torrent platform where different peers are sending packets by a method called sub-stream allocation. It concentrates in decreasing the delay in live streaming and to increase throughput. It focuses on reducing packet loss rate. The work shows packet transmission between network layer and application layer with constant bit rate. The Network Layer provides the functional and procedural means of

Organized by: Department of Computer Science and Engineering, Anand Institute of Higher Technology, Chennai. www.iconic12.in E-mail: iconic.aiht@gmail.com[Type text] Page 414

DRDO Sponsored International Conference on Intelligence Computing (ICONIC12)

destination or sender to receiver transmission. For providing robustness against peer churns and to meet streaming bandwidth requirement, a mesh overlay is used, where each peers are connected to some others (called parent peers), child peer can retrieve packets from these peers. B. Push Stream Push Stream means pushing of sub-streams to the client or receiver. It uses hybrid pull push approach. It will give high bandwidth utilization and low delay. The hybrid pull-push approaches forms tree structures but in a simpler way. Instead of pulling one block, a peer pulls a number of blocks within one request. Once the request accepted, all the requested blocks are pushed automatically. This scheme avoids the problem of pull-based protocols which incur one round trip delay in each forwarding hop. A newly joined peer first selects sub-stream parents among its neighbours for all sub-streams and sends pull requests to every selected parent. When the requests received, the parents will start a new pushing service for the peer, i.e. sequentially push the interested blocks within the requested sub-streams. c. Performance Measurement It means measuring the performance statistics in the peer to peer data structures. Statistical value is to compare the performance variations of each transmission by the peer. Following are the performance metrics for the simulation analysis: Residual Loss Rate: It is the percentage of packets that cannot be successfully received at the peer. Packet Delay: It refers to the source peer delay of each packet for a peer. This shows the general delay performance of the packets Bandwidth Utilization: It is the wise use of the available bandwidth. It is also called as channel utilization
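A small sketch of the sub-stream idea described in sections A and B: the stream is split into k sub-streams (packet i belongs to sub-stream i mod k), each sub-stream is pulled once from a chosen parent and then pushed continuously. The round-robin parent-selection rule and the names here are assumptions made for illustration, not the scheduling algorithm of the cited work.

    def assign_substreams(parents: list[str], num_substreams: int) -> dict[int, str]:
        """Round-robin assignment of sub-streams to parent peers."""
        return {s: parents[s % len(parents)] for s in range(num_substreams)}

    def substream_of(packet_seq: int, num_substreams: int) -> int:
        """Packet i belongs to sub-stream i mod k."""
        return packet_seq % num_substreams

    # Hybrid pull-push: one pull request per sub-stream, then the parent keeps pushing.
    parents = ["peer-3", "peer-7", "peer-9"]
    assignment = assign_substreams(parents, num_substreams=4)
    for sub, parent in assignment.items():
        print(f"pull request for sub-stream {sub} sent to {parent}; "
              f"{parent} will push its packets without further requests")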

SIMULATION ANALYSIS
Simulation experiments are carried out in Network Simulator-2 with the Cygwin Unix environment. The field configuration consists of 15 nodes in different positions with a flat grid topology, and the data packet size is 1500 bytes; the simulation shows the packet transmission between nodes. The channel frequency is 2.472 GHz and the bandwidth is 11 Mbps. The performance metrics described below are used in the performance analysis. Delay Analysis: the delay caused by the data rate of the link, given by the formula DT = N/R, where DT is the transmission delay, N is the number of bits and R is the rate of transmission. Packet Loss Rate: the failure of one or more transmitted packets to arrive at the destination. Packet Delivery Ratio: the ratio of total packets successfully received to the total packets sent from the source. Bandwidth Usage Analysis: the amount of data transmitted and received by a particular computer or user.
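The metrics listed above, together with the utilisation formula quoted later in the results, can be sketched as follows; the example values are illustrative only.

    def transmission_delay(num_bits: int, rate_bps: float) -> float:
        """DT = N / R (seconds)."""
        return num_bits / rate_bps

    def packet_loss_rate(lost: int, received: int) -> float:
        """Lost packets over (lost + successfully received) packets."""
        return lost / (lost + received)

    def packet_delivery_ratio(received: int, sent: int) -> float:
        """Packets received at the destination over packets sent by the source."""
        return received / sent

    def utilisation_percent(data_bits: float, bandwidth_bps: float, interval_s: float) -> float:
        """Utilization % = (data bits x 100) / (bandwidth x interval)."""
        return data_bits * 100.0 / (bandwidth_bps * interval_s)

    # A 1500-byte packet on an 11 Mbps link: DT = 12000 / 11e6, about 1.09 ms.
    print(transmission_delay(1500 * 8, 11e6))
    # 70 Mbit transferred in one second on a 100 Mbit/s link gives 70 % utilisation.
    print(utilisation_percent(data_bits=70e6, bandwidth_bps=100e6, interval_s=1.0))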

TABLE 1 Parameters for simulation

NUMBER | PARAMETERS                   | VALUES
1      | Nodes                        | 15
2      | Data and control packet rate | 11 Mbps
3      | Topology                     | Flat Grid
4      | Packet size                  | 1500 bytes
5      | Transmit Power               | 0.0316227

Fig 2 Delay vs. number of nodes
Fig 3 Packet loss rate vs. number of nodes
Fig 2 gives the results of delay vs. number of nodes. In networks, transmission delay is the amount of time taken to push all of the packets onto the link; the result shows the delay incurred at each node during transmission. Fig 3 gives the results of packet loss rate vs. number of nodes. Packet loss is the ratio of the number of lost packets to the sum of lost packets and packets received successfully. Packet loss occurs because of channel congestion, corruption of packets, etc., so it is necessary to reduce the packet loss rate in order to increase the overall throughput. Here the result shows that the packet loss rates of nodes 2, 3 and 7 are high, but on average the loss ratio is low, which helps in increasing throughput.
Fig 4 PDR vs. number of nodes



Fig 5 Bandwidth vs. number of nodes The fig 4 gives the results of PDR vs. No of nodes.PDR stands for packet delivery ratio. It is defined as the ratio of total no of packets that have reached the destination node to the total number of packets originated at the source node. The maximum packet delivered per node is N, where is the channel capacity and N is the no of nodes. The fig 5 gives the result of bandwidth analysis with respect to number of nodes. The bandwidth analysis shows the utilization of bandwidth. For example, if the throughput is 70 Mbit/s in a 100 Mbit/s Ethernet connection, the channel efficiency is 70%. In this example, effective 70Mbits of data are transmitted every second. Utilization is the percentage of a network's bandwidth that is currently being consumed by network traffic. The purpose of knowing utilization is to know whether a link in the network is overloaded. Utilization is calculated by the formula: Utilization % = (data bits x 100) / (bandwidth x interval) CONCLUSION AND FUTURE WORK This paper presents a brief description about substream allocation in a p2p network using bit torrent protocol. It acts as a platform for p2p in which the packets are converted into sub-stream and pushed to the receiver by hybrid pull-push approach. This helps in reducing packet loss and maintains load balance in the network.ad hoc routing protocol is used for routing process. This work emphasis on reducing delay and increasing throughput of the system. A simulation analysis has been done where the graph is plotted for the delay, bandwidth, packet loss ratio and packet delivery rate. This system can be extended by improving the performance of the simulation parameters. REFERENCE

[1] K.-H. Kelvin Chan S.-H. Gary Chan, Optimizing Sub-stream Scheduling for Peer-to-Peer Live Streaming, in Proc. IEEE CCNC, 2010. [2] Hyunseok Chang , Sugih Jamin , Wenjie Wang, Live Streaming with Receiver-based Peer-division Multiplexing ,in Proc. IEEE CCNC, 2011. [3] Xinyan Zhang, Jiangchuan Liu, Bo Li, and Tak-Shing Peter Yum, Coolstreaming/DONet: a data-driven overlay network for peer-to-peer live media streaming, in Proc. IEEE INFOCOM, 2005, pp. 21022111. [4] Nazanin Magharei and Reza Rejaie, PRIME: Peer-to-peer receiverdriven MEsh-based streaming, in Proc. IEEE INFOCOM, 2007, pp. 14151423. [5] Bo Li, Susu Xie, Gabriel Y. Keung, Jiangchuan Liu, Ion Stoica, Hui Zhang, and Xinyan Zhang, An empirical study of the coolstreaming+ system, IEEE Journal on Selected Areas in Communications, vol. 25,no. 9, pp. 16271639, 2007. [6] Jahn Arne Johnsen , ,Lars Erik Karlsen ,Sebjrn Sther Birkeland, Peer-to-peer networking with BitTorrent, Department of Telematics, NTNU - December 2005. [7] http://www.hp.com/rnd/device_help/help/hpw nd/webhelp/ HPJ3298A/utilization.htm [8] YAGO,A hybrid pull-push peer-to-peer live streaming system, Thesis in masters project, Royal Institute of Technology (KTH)Stockholm, Sweden, 2010 [9] en.wikipedia.org/wiki/Peer-to-peer [10] en.wikipedia.org/wiki/OSI_model [11] Jiahong Wang, , Eiichiro Kodama, and Toyoo Takada,An Effective Approach to Improving Packet Delivery Fraction of Ad Hoc Network, in Proc of IMECS 2011 vol 1. Nazanin Magharei, Reza Rejaie, and Yang Guo, Mesh or multiple-tree



Cloud Computing Approach for Parallel Data Processing and Dynamic Resource Allocation
Lecturer,Department of Computer Science and Engineering Anand Institute of Higher Technology Abstract In recent years ad-hoc parallel data processing has emerged to be one of the killer applications for Infrastructure-as-a-Service (IAAS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this project the data processing framework to explicitly exploit the dynamic resource allocation offered by todays IAAS clouds for both, task scheduling and execution is constructed. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. As an application of this frame work a file is stored in cloud with dynamic environment. Due to dynamic behavior of cloud more than one client can access the resource, which other user can access the stored file hence the security problem. To overcome this password system is constructed by integrating sound signature in graphical system. A click-based graphical password scheme called Cued Click Points (CCP) is introduced. In this system a password consists of sequence of some images in which user can select one click-point per image with sound signature recalling. 1.Introduction For companies that only have to process large amounts of data occasionally running their own data center is obviously not an option. Instead, Cloud computing has emerged as a promising approach to rent a large IT infrastructure on a short-term pay-per-usage basis. Operators of so-called IaaS clouds, like Amazon EC2,[1] let their customers allocate, access, and control a set of virtual machines (VMs) which run inside their data centers and only charge them for the period of time the machines are allocated. The VMs are typically offered in different types, each type with its own characteristics (number of CPU cores, amount of main memory, etc.) and cost. Present system of computing is perfectly proprietary in terms of hardware. You have to pay for whatever resources you want to run your program. Nephele opens a new door towards the concepts of clouding hardware resources. The proposed model suggests the possibility of using a networked cluster of computers as a substitute for a supercomputer in terms of processing and memory power. Depending on the availability of the system resources, the server will distribute the work load among the networked computers to meet the required throughput. The individual systems will then process the fragmented work using its own free resources and in the end the server will club together all the processed fragmented outputs and will provide the user resultant output.Todays processing frameworks typically assume the resources they manage consist of a static set of homogeneous compute nodes. Although designed to deal with individual nodes failures, they consider the number of available machines to be
P.SuthanthiraDevi1 , K.AmsaValli2



constant, especially when scheduling the processing jobs execution. While IaaS clouds can certainly be used to create such cluster-like setups, much of their flexibility remains unused.One of an IaaS clouds key features is the provisioning of compute resources on demand. New VMs can be allocated at any time through a well-defined interface and become available in a matter of seconds. Machines which are no longer used can be terminated instantly and the cloud customer will be charged for them no more. Moreover, cloud operators like Amazon let their customers rent VMs of different types, i.e., with different computational power, different sizes of main memory, and storage. Hence, the compute resources available in a cloud are highly dynamic and possibly heterogeneous. This model deals with dynamic environment hence the user data is insecure. In order to provide security an integration of sound signature with graphical password authentication system is designed.This paper proposed to discuss the particular challenges and opportunities for efcient parallel data processing in clouds and present Nephele, a new processing framework explicitly designed for cloud environments.[2] Most notably, Nephele is the rst data processing frame-work to include the possibility of dynamically allocating deallocating different compute resources from a cloud in its scheduling and during job execution. It includes further details on scheduling strategies and extended experimental results. Module I -starts with analyzing the opportunities and challenges and derives some important design principles for our new framework. In Module-2 present Nepheles Characterstics and System Module.Module-3 present basic architecture and outline how jobs can be described and executed in the cloud. Module-4 Architecture design Module-5 describes Job Execution performance and security . 2. Challenges and Opportunites 2.1 Opportunites: With respect to parallel data processing, this exibility leads to a variety of new possibilities, particularly for

scheduling data processing jobs. The question a scheduler has to answer is no longer Given a set of compute resources, how to distribute the particular tasks of a job among them?, but rather Given a job, what compute resources match the tasks the job consists of best?.This new paradigm allows allocating compute resources dynamically and just for the time they are required in the processing workow. E.g., a framework exploiting the possibilities of a cloud could start with a single VM which analyzes an incoming job and then advises the cloud to directly start the required VMs according to the jobs processing phases. After each phase, the machines could be released and no longer contribute to the overall cost for the processing job. Facilitating such use cases imposes some requirements on the design of a processing framework and the way its jobs are described. First, the scheduler of such a frame-work must become aware of the cloud environment a job should be executed in.[3] It must know about the different types of available VMs as well as their cost and be able to allocate or destroy them on behalf of the cloud customer. Second, the paradigm used to describe jobs must be powerful enough to express dependencies between the different tasks the jobs consists of. The system must be aware of which tasks output is required as another tasks input. Otherwise the scheduler of the processing framework cannot decide at what point in time a particular VM is no longer needed and deallocate it. The Map Reduce pattern is a good example of an unsuitable paradigm here: Although at the end of a job only few reducer tasks may still be running, it is not possible to shut down the idle VMs, since it is unclear if they contain intermediate results which are still required. Finally, the scheduler of such a processing framework must be able to determine which task of a job should be executed on which type of VM and, possibly, how many of those. This information could be either provided externally, e.g. as an annotation to the job description, or deduced internally, e.g. from collected statistics, similarly to the way database systems try to optimize their execution schedule over



time. 2.2 Challenges In a cluster the compute nodes are typically interconnected through a physical high-performance network. The topology of the network, i.e. the way the compute nodes are physically wired to each other, is usually well-known and, what is more important, does not change over time. Current data processing frameworks offer to leverage this knowledge about the network hierarchy and attempt to schedule tasks on compute nodes so that data sent from one node to the other has to traverse as few network switches as possible. That way network bottlenecks can be avoided and the overall throughput of the cluster can be improved.In a cloud this topology information is typically not exposed to the customer. Since the nodes involved in processing a data intensive job often have to transfer tremendous amounts of data through the network, this drawback is particularly severe; parts of the network may become congested while others are essentially unutilized. It is unclear if these techniques are applicable to IaaS clouds. For security reasons clouds often incorporate network virtualization techniques which can hamper the inference process, in particular when based on latency measurements.Even if it was possible to determine the underlying network hierarchy in a cloud and use it for topology-aware scheduling, the obtained information would not necessarily remain valid for the entire processing time. VMs may be migrated for administrative purposes between different locations inside the data center without any notication, rendering any previous knowledge of the relevant network infrastructure obsolete. As a result, the only way to ensure locality between tasks of a processing job is currently to execute these tasks on the same VM in the cloud. This may involve allocating fewer, but more powerful VMs with multiple CPU cores. E.g., consider an aggregation task receiving data from seven generator tasks. Data locality can be ensured by scheduling these tasks to run on a VM with eight cores instead of eight distinct

single-core machines. However, currently no data processing framework includes such strategies in its scheduling algorithms.

3. Characteristics and System Model

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. The term "cloud" is used as a metaphor for the Internet, based on the cloud drawing used in the past to represent the telephone network, and later to depict the Internet in computer network diagrams as an abstraction of the underlying infrastructure it represents. Typical cloud computing providers deliver common business applications online that are accessed from another Web service or software like a Web browser, while the software and data are stored on servers.

Essential Characteristics: On-demand self-service: a consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed, automatically, without requiring human interaction with each service provider. Broad network access: capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and personal digital assistants (PDAs)). Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the subscriber generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, network bandwidth, and virtual


machines. Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time. Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts).[4] Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.

Service Models: Cloud Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a Web browser (e.g., Web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. Cloud Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or -acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly the application hosting environment configurations. Cloud Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has

control over the operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls)[5].

Deployment Models: Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise. Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on premise or off premise. Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services. Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

The processing frameworks in cloud environments are currently designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this work an efficient parallel data processing scheme for clouds is designed. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Due to the dynamic nature of the cloud, more than one user can access a resource at a time. Also, more users can store their documents in a cloud resource, hence security is an important problem. Hackers can steal the password if it is in textual format, so an alternative approach is used here to overcome this. A graphical password methodology is introduced to protect the file


from a third person. The recovery can be done by a sound system.[6]

4. System Models

I propose cloud computing as a solution for these ever-increasing resource needs. As per the proposed scenario, the resources in possession of a non-working or less loaded computer can be used from anywhere in the home network. Cloud computing describes computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Parallels to this concept can be drawn with the electricity grid, where end-users consume power resources without any necessary understanding of the component devices in the grid required to provide the service. All of the above can be done by the dynamic behavior of the cloud. Due to the dynamic nature of the cloud, more than one user can access a resource at a time. Also, more users can store their documents in a cloud resource, hence security is an important problem. Hackers can steal the password if it is in textual format, so an alternative approach is used here to overcome this. A graphical password methodology is introduced to protect the file from a third person. The recovery can be done by a sound system.

Design

Fig 2.1 Cloud Structure

A dedicated server is mandatory for the proposed system. The server is responsible for initializing, coordinating, splitting, joining, zipping, and unzipping all the Nephele workloads. The working of the Nephele server is much like that of a router. The server is responsible for maintaining and invoking all communication processes. The steps the server performs to keep the client list updated are: 1. Add New Client: the systems agreeing to perform the tasks instructed by the server are added to the client list. The systems are identified for communication by means of their corresponding IPs. 2. Delete Client: removes a particular client from the list of resource clients. The process includes deleting the system IP from the database and blocking communication channels to the respective system. 3. Edit Client: to edit client details such as the IP and port of a particular client. Since assigned IP addresses are dynamic in nature, it is possible for the IP of a system to change; in this scenario, the changes can be updated using this provision. 4. View Client: to view the list of available clients in the Nephele environment. The list shown will be that of all registered clients. Network: the communication channel required by both the clients and the server is maintained using this module. The module provides separate classes for sending files and strings. All communications are made using an IP and port combination. For the success of the project, a dedicated network of at least 20 Mbps connectivity is expected; the time advantage of the project depends solely on network efficiency. This includes Client Communication and Server Communication. Client Communication: includes sending the username and password to the server as a string for login and registration purposes. The other class defined in this module is maintained entirely for sending files. Server Communication: includes channels for maintaining communication with clients. The same method allows the system to detect if any client is not ready to take a task.
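The client-list maintenance described above can be pictured with a small sketch; the class and method names here are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch only; class and method names are assumptions, not the
# described implementation. The server keeps a registry of resource clients
# keyed by IP and supports the four maintenance operations listed above.
class ClientRegistry:
    def __init__(self):
        self.clients = {}  # ip -> {"username": ..., "port": ...}

    def add_client(self, ip, username, port):
        # A system agreeing to take tasks is registered and identified by IP.
        self.clients[ip] = {"username": username, "port": port}

    def delete_client(self, ip):
        # Remove the client and stop communicating with it.
        self.clients.pop(ip, None)

    def edit_client(self, old_ip, new_ip, port):
        # Assigned IPs are dynamic, so the stored details may need updating.
        entry = self.clients.pop(old_ip, None)
        if entry is not None:
            entry["port"] = port
            self.clients[new_ip] = entry

    def view_clients(self):
        # List all registered clients.
        return list(self.clients.items())

registry = ClientRegistry()
registry.add_client("192.168.1.10", "node1", 9000)
registry.edit_client("192.168.1.10", "192.168.1.42", 9000)
print(registry.view_clients())
```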


Client: Registration and Login: for the registration and login of client computers. Registration is done with data including the username, password, IP, and port number. The data will then be forwarded to the server, which in turn will perform the insertion and update operations on the database. Another submodule is used for invoking a task and for giving the Nephele client the output. The task can be compression or decompression, and the client can also choose to be a resource in the Nephele environment. The addition of the operation flag indication is also the sole responsibility of this module.

5. Architecture Design

Nephele's architecture follows a classic master-worker pattern. Before submitting a Nephele compute job, a user must start a VM in the cloud which runs the so-called Job Manager (JM). The Job Manager receives the client's jobs, is responsible for scheduling them, and coordinates their execution. It is capable of communicating with the interface the cloud operator provides to control the instantiation of VMs. We call this interface the Cloud Controller. By means of the Cloud Controller the Job Manager can allocate or deallocate VMs according to the current job execution phase. To comply with common cloud computing terminology, we refer to these VMs as instances. The term instance type will be used to differentiate between VMs with different hardware characteristics.
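A rough sketch of the interaction just described, with the Job Manager allocating instances through the Cloud Controller for one execution phase at a time and releasing them afterwards; the interfaces shown are assumptions made for illustration, not Nephele's actual API.

```python
# Illustrative sketch only, not Nephele's actual API: it mirrors the
# interaction described above between the Job Manager and the Cloud
# Controller interface provided by the cloud operator.
class CloudController:
    """Stand-in for the operator interface that instantiates VMs."""
    def allocate(self, instance_type, count):
        # In a real IaaS cloud this would start `count` VMs of `instance_type`.
        return [f"{instance_type}-{i}" for i in range(count)]

    def deallocate(self, instances):
        # Release the instances so they stop adding to the job's cost.
        print(f"released {instances}")

class JobManager:
    def __init__(self, controller):
        self.controller = controller

    def run(self, phases):
        # `phases` lists (instance_type, count, tasks) per job execution phase;
        # instances are held only for the phase that needs them.
        for instance_type, count, tasks in phases:
            instances = self.controller.allocate(instance_type, count)
            try:
                for i, task in enumerate(tasks):
                    print(f"running {task} on {instances[i % len(instances)]}")
            finally:
                self.controller.deallocate(instances)

JobManager(CloudController()).run([
    ("c1.xlarge", 2, ["sort-1", "sort-2", "sort-3"]),
    ("y1.small", 1, ["aggregate"]),
])
```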

Fig 2.2 Nephele Architecture

The actual execution of the tasks which a Nephele job consists of is carried out by a set of instances. Each instance runs a so-called Task Manager (TM). A Task Manager receives one or more tasks from the Job Manager at a time, executes them, and after that informs the Job Manager about their completion or possible errors. Upon job reception the Job Manager decides, depending on the job's particular tasks, how many and what type of instances the job should be executed on, and when the respective instances must be allocated or deallocated to ensure a continuous but cost-efficient processing. The newly allocated instances boot up with a previously compiled VM image. The image is configured to automatically start a Task Manager and register it with the Job Manager. Once all the necessary Task Managers have successfully contacted the Job Manager, it triggers the execution of the scheduled job. Jobs in Nephele are expressed as a directed acyclic graph (DAG).[6] Each vertex in the graph represents a task of the overall processing job, and the graph's edges define the communication flow between these tasks.

Integration of a Sound Signature in a Graphical Password Authentication System: here a graphical password system with a supportive sound signature to increase the memorability of the password is discussed. In the proposed work a click-based graphical password scheme called Cued Click Points (CCP) is presented. In this system a password consists of a sequence of images in which the user can select one click-point per image. In addition, the user is asked to select a sound signature corresponding to each click point; this sound signature will be used to help the user in recalling the click point on an image. The system showed very good performance in terms of speed, accuracy, and ease of use. Users preferred CCP to PassPoints, saying that selecting and remembering only


one point per image was easier and sound signature helps considerably in recalling the click points.

Image    Click point
I1       (123, 678)
I2       (176, 134)
I3       (450, 297)
I4       (761, 164)
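As an illustration of the idea, a small sketch of click-point verification with a per-image sound signature, using the sample points above; the tolerance value and the data layout are assumptions made for this example only.

```python
import math

# Illustrative sketch of Cued Click Points verification; the data layout and
# the tolerance value are assumptions for this example, not the exact scheme.
profile = [
    # (image, stored click point, chosen sound signature)
    ("I1", (123, 678), "chime.wav"),
    ("I2", (176, 134), "bell.wav"),
    ("I3", (450, 297), "drum.wav"),
    ("I4", (761, 164), "flute.wav"),
]

def verify(login_clicks, tolerance=10.0):
    # Accept only if every login click lies within `tolerance` pixels of the
    # stored point for the corresponding image.
    for (image, stored, _sound), clicked in zip(profile, login_clicks):
        if math.dist(stored, clicked) > tolerance:
            return False
    return True

print(verify([(121, 680), (178, 132), (452, 296), (760, 165)]))  # True
print(verify([(300, 300), (178, 132), (452, 296), (760, 165)]))  # False
```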

System Tolerance: After creation of the login vector, the system calculates the Euclidean distance between the login vector and the stored profile vectors. The Euclidean distance between two vectors p and q is given by:
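For n-dimensional vectors this is the standard

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

and, presumably, a login is accepted when this distance falls within the system tolerance.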

In a cloud environment, due to dynamic access to resources, security is an important issue, so a text-based password may not be efficient enough to provide security. Hence we need a graphical method to provide security, and it can be recovered by using sound in case it is lost.

6. Job Description

The DAGs allow tasks to have multiple input and multiple output edges. This simplifies the implementation of classic data combining functions like, e.g., join operations. The DAG's edges explicitly model the communication paths of the processing job. As long as the particular tasks only exchange data through these designated communication edges,

Nephele can always keep track of which instance might still require data from which other instances and which instance can potentially be shut down and deallocated. Defining a Nephele job comprises three mandatory steps: First, the user must write the program code for each task of his processing job or select it from an external library. Second, the task program must be assigned to a vertex. Third, the vertices must be connected by edges to define the communication paths of the job. Tasks are expected to contain sequential code and process so-called records, the primary data unit in Nephele. Programmers can define arbitrary types of records. From a programmer's perspective, records enter and leave the task program through input or output gates. Those input and output gates can be considered endpoints of the DAG's edges which are defined in the following step. Regular tasks (i.e. tasks which are later assigned to inner vertices of the DAG) must have at least one input and one output gate. Contrary to that, tasks which represent either the source or the sink of the data flow must not have input or output gates, respectively. After having specified the code for the particular tasks of the job, the user must define the DAG to connect these tasks. We call this DAG the Job Graph. The Job Graph maps each task to a vertex and determines the communication paths between them. The number of a vertex's incoming and outgoing edges must thereby comply with the number of input and output gates defined inside the tasks. In addition to the task to execute, input and output vertices (i.e. vertices with either no incoming or no outgoing edge) can be associated with a URL pointing to external storage facilities to read or write input or output data, respectively.[7]

Parallelization and Scheduling Strategies: As mentioned before, constructing an Execution Graph from a user's submitted Job Graph may leave different degrees of freedom to Nephele. Using this freedom to construct the most efficient Execution Graph (in terms of processing time or monetary cost) is currently a major focus of our research. Discussing this subject in detail would go beyond the scope of this paper.


However, we want to outline our basic approaches in this subsection. Unless the user provides a job annotation which contains more specific instructions, we currently pursue a simple default strategy: each vertex of the Job Graph is transformed into one Execution Vertex. The default channel types are network channels. Each Execution Vertex is by default assigned to its own Execution Instance unless the user's annotations or other scheduling restrictions (e.g. the usage of in-memory channels) prohibit it. The default instance type to be used is the one with the lowest price per time unit available in the IaaS cloud. One fundamental idea to refine the scheduling strategy for recurring jobs is to use feedback data. We developed a profiling subsystem for Nephele which can continuously monitor running tasks and the underlying instances. Based on the Java Management Extensions (JMX), the profiling subsystem is, among other things, capable of breaking down what percentage of its processing time a task thread actually spends processing user code and what percentage of time it has to wait for data. With the collected data Nephele is able to detect both computational as well as I/O bottlenecks. While computational bottlenecks suggest a higher degree of parallelization for the affected tasks, I/O bottlenecks provide hints to switch to faster channel types (like in-memory channels) and to reconsider the instance assignment. Since Nephele calculates a cryptographic signature for each task, recurring tasks can be identified and the previously recorded feedback data can be exploited. At the moment we only use the profiling data to detect these bottlenecks and help the user to choose reasonable annotations for his job. Figure 4 illustrates the graphical job viewer we have devised for that purpose. It provides immediate visual feedback about the current utilization of all tasks and cloud instances involved in the computation. A user can utilize this visual feedback to improve his job annotations for upcoming job executions. In more advanced versions of Nephele we envision the system to automatically adapt to detected bottlenecks, either between

consecutive executions of the same job or even during job execution at runtime. While the allocation time of cloud instances is determined by the start times of the assigned subtasks, there are different possible strategies for instance deallocation. In order to reflect the fact that most cloud providers charge their customers for instance usage by the hour, we integrated the possibility to reuse instances. Nephele can keep track of the instances' allocation times. An instance of a particular type which has become obsolete in the current Execution Stage is not immediately deallocated if an instance of the same type is required in an upcoming Execution Stage[8]. Instead, Nephele keeps the instance allocated until the end of its current lease period. If the next Execution Stage has begun before the end of that period, it is reassigned to an Execution Vertex of that stage; otherwise it is deallocated early enough not to cause any additional cost. Besides the use of feedback data, we recently complemented our efforts to provide reasonable job annotations automatically with a higher-level programming model layered on top of Nephele.

7. Evaluation

DAG and Nephele: In this experiment we were no longer bound to the MapReduce processing pattern. Instead, we implemented the sort/aggregation problem as a DAG and tried to exploit Nephele's ability to manage heterogeneous compute resources. Several powerful but expensive instances are used to determine the 2·10^8 smallest integer numbers in parallel, while, after that, a single inexpensive instance is utilized for the final aggregation. The graph contained five distinct tasks, again split into different groups of subtasks. However, in contrast to the previous experiment, this one also involved instances of different types. In order to feed the initial data from HDFS into Nephele, we reused the BigIntegerReader task. The records emitted by the BigIntegerReader subtasks were received by the second task, BigIntegerSorter, which attempted to buffer all incoming records in main memory. Once it had received all designated


records, it performed an in-memory quick sort and subsequently continued to emit the records in an order-preserving manner.[9] Since the BigIntegerSorter task requires large amounts of main memory, we split it into 146 subtasks and assigned these evenly to six instances of type c1.xlarge. The preceding BigIntegerReader task was also split into 146 subtasks and set up to emit records via in-memory channels. The third task, BigIntegerMerger, received records from multiple input channels. Once it has read a record from all available input channels, it sorts the records locally and always emits the smallest number. The BigIntegerMerger task occurred three times in a row in the Execution Graph. The first time it was split into six subtasks, one subtask assigned to each of the six c1.xlarge instances. This is currently the only way to ensure data locality between the sort and merge tasks. The second time the BigIntegerMerger task occurred in the Execution Graph, it was split into two subtasks. These two subtasks were assigned to two of the previously used c1.xlarge instances. The third occurrence of the task was assigned to a new instance of type y1.small. Since we abandoned the MapReduce processing pattern, we were able to better exploit Nephele's streaming pipelining characteristics in this experiment. Consequently, each of the merge subtasks was configured to stop execution after having emitted 2·10^8 records. The stop command was propagated to all preceding subtasks of the processing chain, which allowed the Execution Stage to be interrupted as soon as the final merge subtask had emitted the 2·10^8 smallest records. The fourth task, BigIntegerAggregater, read the incoming records from its input channels and summed them up. It was also assigned to the single y1.small instance. Since we no longer required the six c1.xlarge instances to run once the final merge subtask had determined the 2·10^8 smallest numbers, we changed the communication channel between the final BigIntegerMerger and BigIntegerAggregater subtasks to a file channel. That way Nephele pushed the aggregation into the next

Execution Stage and was able to deallocate the expensive instances. Finally, the fifth task, BigIntegerWriter, eventually received the calculated average of the 2·10^8 integer numbers and wrote the value back to HDFS.
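The merge behaviour described for the BigIntegerMerger task is essentially a k-way merge of sorted streams; the following is a compact illustrative sketch of that behaviour, not the authors' implementation.

```python
import heapq

# Compact sketch of the k-way merge behaviour described for BigIntegerMerger:
# read one record per input channel, then repeatedly emit the smallest value
# and refill from the channel it came from. Not the authors' implementation.
def merge_channels(channels):
    """channels: list of iterators, each yielding integers in ascending order."""
    heap = []
    for idx, ch in enumerate(channels):
        first = next(ch, None)
        if first is not None:
            heapq.heappush(heap, (first, idx))
    while heap:
        value, idx = heapq.heappop(heap)
        yield value                      # always emit the current smallest
        nxt = next(channels[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))

sorted_streams = [iter([1, 4, 9]), iter([2, 3, 8]), iter([5, 6, 7])]
print(list(merge_channels(sorted_streams)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```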

8. Conclusion and Future Enhancement


Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system. The dynamic allocation of resources provides high-performance processing power and allows the user to obtain the resources dynamically. In the second phase I would like to provide security through the integration of a sound signature in the graphical password authentication system, which is performed in the cloud. Future: Before submitting a Nephele compute job, a user must start a VM in the cloud which runs the so-called Job Manager (JM). After having received a valid Job Graph from the user, Nephele's Job Manager transforms it into a so-called Execution Graph. An Execution Graph is Nephele's primary data structure for scheduling and monitoring the execution of a Nephele job. In contrast to the Job Graph, an Execution Graph is no longer a pure DAG. The DAG constructed in the present work is without a feedback loop, so in future work the DAG can be constructed with a feedback loop. Also, in providing security, future systems may use other patterns for recall purposes, such as touch or smell.

References
[1] Amazon Web Services LLC. Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/, 2009.


[2] Amazon Web Services LLC. Amazon Elastic MapReduce. http://aws.amazon.com/elasticmapreduce/, 2009.
[3] Amazon Web Services LLC. Amazon Simple Storage Service. http://aws.amazon.com/s3/, 2009.
[4] D. Battre, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke. Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing. In SoCC '10: Proceedings of the ACM Symposium on Cloud Computing 2010, pages 119-130, New York, NY, USA, 2010. ACM.
[5] R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proc. VLDB Endow., 1(2):1265-1276, 2008.
[6] H. chih Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 1029-1040, New York, NY, USA, 2007. ACM.
[7] M. Coates, R. Castro, R. Nowak, M. Gadhiok, R. King, and Y. Tsang. Maximum Likelihood Network Topology Identification from Edge-Based Unicast Measurements. SIGMETRICS Perform. Eval. Rev., 30(1):11-20, 2002.
[8] R. Davoli. VDE: Virtual Distributed Ethernet. In Testbeds and Research Infrastructures for the Development of Networks & Communities, International Conference on, pages 213-220, 2005.
[9] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI '04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, pages 10-10, Berkeley, CA, USA, 2004. USENIX Association.
[10] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz. Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems. Sci. Program., 13(3):219-237, 2005.
[11] T. Dornemann, E. Juhnke, and B. Freisleben. On-Demand Resource Provisioning for BPEL Workflows Using Amazon's Elastic Compute Cloud. In CCGRID '09: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pages 140-147, Washington, DC, USA, 2009. IEEE Computer Society.
[12] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. Intl. Journal of Supercomputer Applications, 11(2):115-128, 1997.


TO A SECURED AND RELIABLE STORAGE SERVICES IN CLOUD COMPUTING


P. Savaridassan1, G. Gnanasambantham, I. Nesamani, V. Vinothkanna2
1Asst. Professor, 2Final Year B.Tech (IT Dept), Sri Manakula Vinayagar Engineering College, Madagadipet

Abstract: Cloud storage enables users to remotely store their data and enjoy on-demand high-quality cloud applications without the burden of local hardware and software management. Though the benefits are clear, such a service also relinquishes users' physical possession of their outsourced data, which inevitably poses new security risks towards the correctness of the data in the cloud. In order to address this new problem and further achieve a secure and dependable cloud storage service, we propose in this paper a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. The auditing result not only ensures strong cloud storage correctness guarantees, but also simultaneously achieves fast data error localization, i.e., the identification of the misbehaving server. Considering that cloud data are dynamic in nature, the proposed design further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append. Analysis shows that the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.

1. Introduction

Several trends are opening up the era of Cloud Computing, which is an Internet-based development and use of computer technology. The ever cheaper and more powerful processors, together with the software as a service (SaaS) computing architecture, are transforming data centers into pools of computing service on a huge scale. The increasing network bandwidth and reliable yet flexible network connections make it even possible that users can now subscribe to high-quality services from data and software that reside solely on remote data centers. Moving data into the cloud offers great convenience to users since they don't have to care about the complexities of direct hardware management. The pioneers among Cloud Computing vendors, Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), are both well-known examples. While these Internet-based online services do provide huge amounts of storage space and customizable computing resources, this computing

platform shift, however, is eliminating the responsibility of local machines for data maintenance at the same time. As a result, users are at the mercy of their cloud service providers for the availability and integrity of their data. In steganography, a transform-domain technique called DCT is used to conceal messages in considerable areas of the cover image. On the one hand, although the cloud infrastructures are much more powerful and reliable than personal computing devices, a broad range of both internal and external threats to data integrity still exists. Examples of outages and data loss incidents of noteworthy cloud storage services appear from time to time. On the other hand, since users may not retain a local copy of outsourced data, there exist various incentives for cloud service providers (CSP) to behave unfaithfully towards the cloud users regarding the status of their outsourced data. For example, to increase the profit margin by reducing cost, it is possible for the CSP to discard rarely accessed data without being detected in a timely fashion.


Similarly, the CSP may even attempt to hide data loss incidents so as to maintain a reputation. Therefore, although outsourcing data into the cloud is economically attractive for the cost and complexity of long-term large-scale data storage, the lack of strong assurance of data integrity and availability may impede its wide adoption by both enterprise and individual cloud users. In order to achieve the assurances of cloud data integrity and availability and enforce the quality of cloud storage service, efficient methods that enable on-demand data correctness verification on behalf of cloud users have to be designed. However, the fact that users no longer have physical possession of data in the cloud prohibits the direct adoption of traditional cryptographic primitives for the purpose of data integrity protection. Hence, the verification of cloud storage correctness must be conducted without explicit knowledge of the whole data files. Meanwhile, cloud storage is not just a third-party data warehouse. In this paper, we propose an effective and flexible distributed storage verification scheme with explicit dynamic data support to ensure the correctness and availability of users' data in the cloud. We rely on an erasure-correcting code in the file distribution preparation to provide redundancies and guarantee data dependability against Byzantine servers, where a storage server may fail in arbitrary ways. This construction drastically reduces the communication and storage overhead as compared to the traditional replication-based file distribution techniques. By utilizing the homomorphic token with distributed verification of erasure-coded data, our scheme achieves storage correctness insurance as well as data error localization: whenever data corruption has been detected during the storage correctness verification, our scheme can almost guarantee the simultaneous localization of data errors, i.e., the identification of the misbehaving server(s). In order to strike a good balance between error resilience and data dynamics, we further explore the algebraic property of our token computation and erasure-coded data, and demonstrate how to efficiently support dynamic operation on data blocks, while maintaining the same level of storage correctness assurance. In order to save time, computation resources, and even the related

online burden of users, we also provide the extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors and are worry-free to use the cloud storage services. Our work is among the first few in this field to consider distributed data storage security in Cloud Computing. Our contribution can be summarized in the following three aspects: 1) Compared to many of its predecessors, which only provide binary results about the storage status across the distributed servers, the proposed scheme achieves the integration of storage correctness insurance and data error localization, i.e., the identification of misbehaving server(s). 2) Unlike most prior works for ensuring remote data integrity, the new scheme further supports secure and efficient dynamic operations on data blocks, including update, delete and append. 3) The experimental results demonstrate that the proposed scheme is highly efficient. Extensive security analysis shows our scheme is resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.

2. PROBLEM DEFINITION

2.1 System Model
Representative network architecture for the cloud storage service is illustrated in Figure 1. Three different network entities can be identified as follows: 1. User: an entity, who has data to be stored in the cloud and relies on the cloud for data storage and computation, can be either an enterprise or an individual customer. 2. Cloud Server (CS): an entity, which is managed by the cloud service provider (CSP) to provide data storage service and has significant storage space and computation resources (we will not differentiate CS and CSP hereafter). 3. Third Party Auditor (TPA): an optional TPA, who has expertise and capabilities that users may not have, is trusted to assess and expose the risk of cloud storage services on behalf of the users upon request. In cloud data storage, a user stores his data through a CSP into a set of cloud servers, which are


running in a simultaneous, cooperative and distributed manner. Data redundancy can be employed with the technique of erasure-correcting code to further tolerate faults or server crashes as users' data grows in size and importance. Thereafter, for application purposes, the user interacts with the cloud servers via the CSP to access or retrieve his data. In some cases, the user may need to perform block-level operations on his data. The most general forms of these operations we are considering are block update, delete, insert and append. Note that in this paper, we put more focus on the support of file-oriented cloud applications rather than non-file application data, such as social networking data. In other words, the cloud data we are considering is not expected to be rapidly changing in a relatively short period. As users no longer possess their data locally, it is of critical importance to assure users that their data are being correctly stored and maintained. That is, users should be equipped with security means so that they can make continuous correctness assurance (to enforce the cloud storage service-level agreement) of their stored data even without the existence of local copies. In case users do not necessarily have the time, feasibility or resources to monitor their data online, they can delegate the data auditing tasks to an optional trusted TPA of their respective choices. However, to securely introduce such a TPA, any possible leakage of users' outsourced data towards the TPA through the auditing protocol should be prohibited. In our model, we assume that the point-to-point communication channels between each cloud server and the user are authenticated and reliable, which can be achieved in practice with little overhead.

2.2 Adversary Model
From the users' perspective, the adversary model has to capture all kinds of threats towards their cloud data integrity. Because cloud data do not reside at the user's local site but at the CSP's address domain, these threats can come from two different sources: internal and external attacks. For internal attacks, a CSP can be self-interested, untrusted and possibly malicious. Not only does it desire to move data that has not been or is rarely accessed to a lower tier of storage than agreed for monetary reasons, but it may also attempt to hide a data loss incident due to management errors,

Byzantine failures and so on. For external attacks, data integrity threats may come from outsiders who are beyond the control domain of the CSP, for example, the economically motivated attackers. They may compromise a number of cloud data storage servers in different time intervals and subsequently be able to modify or delete users' data while remaining undetected by the CSP.

2.3 Design Goals
To ensure the security and dependability of cloud data storage under the aforementioned adversary model, we aim to design efficient mechanisms for dynamic data verification and operation and achieve the following goals: (1) Storage correctness: to ensure users that their data are indeed stored appropriately and kept intact all the time in the cloud. (2) Fast localization of data error: to effectively locate the malfunctioning server when data corruption has been detected. (3) Dynamic data support: to maintain the same level of storage correctness assurance even if users modify, delete or append their data files in the cloud. (4) Dependability: to enhance data availability against Byzantine failures, malicious data modification and server colluding attacks, i.e. minimizing the effect brought by data errors or server failures. (5) Lightweight: to enable users to perform storage correctness checks with minimum overhead.

Proposed Architecture

Fig. 1: Cloud storage service architecture


3. ENSURING CLOUD DATA STORAGE

3.1 File Distribution Preparation
It is well known that erasure-correcting codes may be used to tolerate multiple failures in distributed storage systems. In cloud data storage, we rely on this technique to disperse the data file F redundantly across a set of n = m + k distributed servers. An (m, k) Reed-Solomon erasure-correcting code is used to create k redundancy parity vectors from m data vectors in such a way that the original m data vectors can be reconstructed from any m out of the m + k data and parity vectors. By placing each of the m + k vectors on a different server, the original data file can survive the failure of any k of the m + k servers without any data loss, with a space overhead of k/m. To support efficient sequential I/O to the original file, our file layout is systematic, i.e., the unmodified m data file vectors together with the k parity vectors are distributed across m + k different servers.

3.2 Challenge Token Pre-computation
In order to achieve assurance of data storage correctness and data error localization simultaneously, our scheme entirely relies on pre-computed verification tokens. The main idea is as follows: before file distribution, the user pre-computes a certain number of short verification tokens on each individual vector G(j) (j in {1, ..., n}), each token covering a random subset of data blocks. Later, when the user wants to make sure of the storage correctness of the data in the cloud, he challenges the cloud servers with a set of randomly generated block indices. Upon receiving the challenge, each cloud server computes a short signature over the specified blocks and returns it to the user. The values of these signatures should match the corresponding tokens pre-computed by the user. Meanwhile, as all servers operate over the same subset of indices, the requested response values for the integrity check must also be a valid codeword determined by the secret matrix P.
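A heavily simplified sketch of the challenge/spot-check idea from Section 3.2: pre-compute a token over a random subset of block indices and later ask the server to recompute it over the same challenge. It uses an ordinary keyed hash purely for illustration, not the homomorphic token construction of the actual scheme.

```python
import hashlib, hmac, random

# Heavily simplified illustration of pre-computed verification tokens; it uses
# an ordinary keyed hash instead of the homomorphic tokens of the real scheme.
def make_token(key, blocks, indices):
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for i in indices:
        mac.update(blocks[i])
    return mac.digest()

def spot_check(key, stored_blocks, indices, expected_token):
    # The "server" recomputes the token over the challenged block indices;
    # a mismatch localizes misbehaviour to this server.
    return hmac.compare_digest(make_token(key, stored_blocks, indices), expected_token)

key = b"user-secret"
blocks = [f"block-{i}".encode() for i in range(100)]
challenge = random.sample(range(100), 5)          # random subset of block indices
token = make_token(key, blocks, challenge)        # pre-computed before outsourcing

print(spot_check(key, blocks, challenge, token))  # True: data intact
blocks[challenge[0]] = b"corrupted"
print(spot_check(key, blocks, challenge, token))  # False: corruption detected
```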

3.4 File Retrieval and Error Recovery
Since our layout of the file matrix is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return the correct response values. Notice that our verification scheme is based on random spot-checking, so the storage correctness assurance is a probabilistic one. However, by choosing system parameters appropriately and conducting enough rounds of verification, we can guarantee successful file retrieval with high probability. On the other hand, whenever data corruption is detected, the comparison of the pre-computed tokens and the received response values can guarantee the identification of the misbehaving server(s).

3.5 Towards Third Party Auditing
As discussed in our architecture, in case the user does not have the time, feasibility or resources to perform the storage correctness verification, he can optionally delegate this task to an independent third-party auditor, making the cloud storage publicly verifiable. However, as pointed out by recent work, to securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities towards user data privacy. Namely, the TPA should not learn users' data content through the delegated data auditing. Now we show that with only slight modification, our protocol can support privacy-preserving third-party auditing. The new design is based on the observation of the linear property of the parity vector blinding process. Recall that the reason for the blinding process is the protection of the secret matrix P against cloud servers. However, this can be achieved either by blinding the parity vector or by blinding the data vector (we assume k < m). Thus, if we blind the data vector before file distribution encoding, then the storage verification task can be successfully delegated to third-party auditing in a privacy-preserving manner.

4. References
[1] C. Wang, Q. Wang, K. Ren, and W. Lou. Ensuring Data Storage Security in Cloud Computing. In Proc. of IWQoS '09, July 2009, pp. 1-9.
[2] Amazon.com. Amazon Web Services (AWS). Online at http://aws.amazon.com/, 2009.

[3] Sun Microsystems, Inc. Building Customer Trust in Cloud Computing with Transparent Security. Online at https://www.sun.com/offers/details/sun_transparency.xml, November 2009.


[4] M. Arrington. Gmail Disaster: Reports of Mass Email Deletions. Online at http://www.techcrunch.com/2006/12/28/gmaildisasterreports-of-mass-email-deletions/, December 2006.
[5] J. Kincaid. MediaMax/TheLinkup Closes Its Doors. Online at http://www.techcrunch.com/2008/07/10/mediamaxthelinkup-closes-its-doors/, July 2008.
[6] S. Wilson. AppEngine Outage. Online at http://www.cio-weblog.com/50226711/appengine_outage.php, June 2008.


IDENTIFICATION AND DETECTION OF PHARMING ATTACK USING BAYESIAN APPROACH WITH SMS ALERT
Mr. K. Karnavel, A. Akilan1, M. Gobi2, J. Aravind Raj3
1,2,3 Department of CSE, Anand Institute of Higher Technology, Chennai.

Abstract: A pharming attack is a sophisticated version of a phishing attack that aims to steal users' credentials by redirecting them to a fraudulent website using DNS-based techniques. To prevent pharming attacks, a text classifier and a visual classifier are applied to differentiate the actual webpage from the suspicious one. Pharming attacks can be performed at the client side, and such attacks are often imperceptible to the user. To detect phishing sites effectively, both URL validation and webpage content analysis are performed at the client side. URL validation involves comparing the IP addresses obtained from the default and third-party DNS servers. The domain name from the URL is checked for registration in the Top Level Domain database using the WHOIS protocol. Web content analysis involves analyzing both the text content and the visual content of the webpage. The Naive Bayes rule is used in the text classifier, which compares the text content of the webpage with a spam word repository, and the image classifier uses the whole image of the webpage to do the process. Although phishing sites can be detected, it is suggested to use session key authentication implemented on the server side to provide more security for the user's sensitive information.

[Figure: overall system flow - enter URL; URL validation by WHOIS and IP address check against the default and third-party DNS; text classifier; image classifier; threshold validation; visiting the web page; sending a session key to the user's mobile; session key verification; user authentication.]

Keywords: Phishing, Classifiers, Session key, DCT, Naive Bayes.

I. INTRODUCTION
Phishing attacks are a major concern for preserving Internet users' privacy. By combining social engineering and website forgery techniques, phishing attacks spoof the identity of a company to trick Internet users into revealing confidential information such as login, password and credit card number. The perfect phishing attack creates a website very similar to the legitimate one by using the same logos, images, structure, etc. However, if the user examines attentively the URL displayed in the address bar of the web browser, he should notice that the URL (especially the domain name) is not the usual one. Other kinds of phishing attacks, i.e. the pharming attacks, are much more complex to detect because both the visited URL and the website are similar to the legitimate site. Pharming attacks aim to corrupt DNS information to redirect users to a fraudulent website under the control of the attacker. DNS vulnerabilities can be exploited at the client side by corrupting the user/company computer or the border router, or at the server side by intercepting, modifying or spoofing DNS exchanges as well as using content-injection code techniques. As the DNSSEC protocol is not fully deployed today over the whole Internet infrastructure to provide end-to-end secured DNS exchanges, we can hardly protect the user from DNS corruptions, especially for the attacks that occur in his own network. This framework is used to detect pharming attacks at the client side.

III. PROPOSED SYSTEM
Pharming attacks can be prevented by implementing the verification of the Domain Name, IP Address, WHOIS standards and Web Content. Every website should have registered such information, which can be found using a WHOIS server. Our approach combines both an IP address check and a webpage content analysis using the information provided by multiple DNS servers. Text content analysis looks for spam words in the web page.
Visual content analysis measures the visual similarity between the suspicious and the legitimate site.
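A sketch of the client-side checks just described; the third-party DNS lookup and the WHOIS query are left as hypothetical helper functions (a real implementation would need an external DNS/WHOIS library), so only the control flow is illustrated.

```python
from urllib.parse import urlparse
import socket

# Sketch of the client-side pharming check described above. The third-party
# lookup and WHOIS query are hypothetical placeholders; only the control flow
# of the proposed validation pipeline is illustrated here.
def resolve_default(domain):
    return socket.gethostbyname(domain)        # default (possibly corrupted) resolver

def resolve_third_party(domain, nameserver="8.8.8.8"):
    raise NotImplementedError("query `nameserver` directly, e.g. with a DNS library")

def whois_registered(domain):
    raise NotImplementedError("check the TLD registry via the WHOIS protocol")

def check_url(url):
    domain = urlparse(url).hostname
    if resolve_default(domain) != resolve_third_party(domain):
        return "phishing suspected: IP addresses differ"
    if not whois_registered(domain):
        return "phishing suspected: domain not registered"
    return "proceed to web content analysis"
```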


In addition, a session key is sent to the user's registered mobile number, which cannot be used by the phishers even though they have the matching username and password. If the user complains that he received a session key but never logged in, the corresponding IP address which used the customer credentials is added to the phishers database. Further requests from that IP will be blocked.

IV. IMPLEMENTATION
A. Validation of the Website by WHOIS, IP Address and Domain
Domain and IP check: The domain name is extracted from the URL (example: google from http://www.google.com). The IP address for the corresponding domain name is retrieved from the default DNS server and also from a third-party DNS server. The webpage obtained from the IP address of the third-party DNS is considered the legitimate one. These IP addresses are compared. If they are equal, the application will proceed to the WHOIS validation; otherwise it will alert the user that it is a phishing site.
Validating the site using WHOIS: WHOIS is the standard or protocol that allows users to access various Top Level Domain databases (example: com, net, org). WHOIS is all about the website registration: the name to whom the web site is registered, along with the company details. Using WHOIS, the domain name is checked to see whether it is registered under its corresponding top-level domain or not. If it is registered, web content analysis will take place; otherwise the user will be notified that it is a phishing site.

Figure 2. Validating the site using WHOIS

B. Web Content Analysis
Text Classification: HTML tags are removed from the obtained web page and the remaining text contents are stored in a text file. The content of this text file is then compared with the repository of spam words that a phisher would use to attract user attention (example: free, earn a million dollars in a week). The Naive Bayes rule (Bayesian approach) is applied to measure the probability of how likely the site is to be phishing.
Computing the probability: Let's suppose the suspected webpage contains the word "Free". We know that this webpage is likely to be spam, more precisely a proposal to sell counterfeit copies of well-known brands of watches. The spam detection software, however, does not "know" such facts; all it can do is compute probabilities. The formula used by the software to determine that is derived from Bayes' theorem:
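In its standard form, with the quantities defined below, it reads:

\Pr(S \mid W) = \frac{\Pr(W \mid S)\,\Pr(S)}{\Pr(W \mid S)\,\Pr(S) + \Pr(W \mid H)\,\Pr(H)}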

where: Pr(S|W) is the probability that the webpage is phishing, knowing that the word "Free" is in it; Pr(S) is the overall probability that any obtained webpage is phishing; Pr(W|S) is the probability that the word "Free" appears in phishing webpages; Pr(H) is the overall probability that any obtained webpage is not phishing; Pr(W|H) is the probability that the word "Free" appears in non-phishing webpages.
The spamicity of a word: Recent statistics show that the current probability of any webpage being phishing is 80%, at the very least: Pr(S) = 0.8; Pr(H) = 0.2. However, most Bayesian spam detection software makes the assumption that there is no a priori


reason for the obtained webpage to be phishing rather than legitimate, and considers both cases to have equal probabilities of 50%. The resulting probability Pr(S|W) is called the "spamicity" (or "spaminess") of the word "Free", and can be computed. The number Pr(W|S) used in this formula is approximated by the frequency of webpages containing "Free" among the webpages identified as phishing during the learning phase. Similarly, Pr(W|H) is approximated by the frequency of webpages containing "Free" among the webpages identified as legitimate during the learning phase. For these approximations to make sense, the set of learned webpages needs to be big and representative enough. It is also advisable that the learned set of webpages conforms to the 50% hypothesis about the repartition between phishing and non-phishing. Of course, determining whether a webpage is phishing or non-phishing based only on the presence of the word "Free" is error-prone, which is why Bayesian spam software tries to consider several words and combine their spamicities to determine a webpage's overall probability of being phishing.

Image Classification: Visual content refers to the characteristics with respect to the overall style, the layout, and the block regions including the logos, images, and forms. The web page is considered at the pixel level, i.e., as an image that enables the total representation of the visual content of the web page, and then it is compared for visual similarity. The images are converted to greyscale images to reduce the computation and then normalized to a low-resolution image using the Lanczos algorithm for robustness. The low-resolution images are compared using the frequencies generated by the Discrete Cosine Transform (DCT). This comparison returns a probability value of either 1 (one) or 0 (zero) to indicate whether the suspicious and legitimate webpages are exactly similar or not.

Discrete Cosine Transform: In general, neighboring pixels within an image tend to be highly correlated. As such, it is desirable to use an invertible transform to concentrate randomness into fewer, decorrelated parameters. The Discrete Cosine Transform (DCT) has been shown to be near optimal for a large class of images in energy concentration and decorrelation. It has been adopted in the JPEG and MPEG coding standards. The DCT decomposes the signal into underlying spatial frequencies, which then allow further processing techniques to reduce the precision of the DCT coefficients consistent with the Human Visual System (HVS) model. The DCT coefficients of an image lend themselves to use as a new feature, which has the ability to represent the regularity, complexity and some texture features of an image, and it can be applied directly to image data in the compressed domain. This may be a way to solve the large storage space problem and the computational complexity of existing methods. The two-dimensional DCT can be written in terms of the pixel values f(i, j), for i, j = 0, 1, ..., N-1, and the frequency-domain transform coefficients F(u, v); in its standard form,

F(u,v) = c(u)\,c(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i,j) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right], with c(0) = \sqrt{1/N} and c(k) = \sqrt{2/N} for k > 0.

The DCT tends to concentrate information, making it useful for image compression applications and also helping to minimize the feature vector size in CBIR. For the full two-dimensional DCT of an NxN image, the number of multiplications required is N^2(2N) and the number of additions required is N^2(2N-2).

Figure 3. Image Classification
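To make the comparison concrete, here is a small illustrative sketch that evaluates the 2-D DCT formula above directly and compares two equal-sized grayscale images by their low-frequency coefficients. It is not the paper's implementation; the image size, the number of retained coefficients and the threshold are arbitrary assumptions.

```python
import math
import numpy as np

# Direct (unoptimised) evaluation of the 2-D DCT formula given above, followed
# by a comparison of low-frequency coefficients. Illustrative only: the image
# size, coefficient count and threshold are arbitrary assumptions.
def dct2(f):
    n = f.shape[0]
    F = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += f[i, j] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) \
                                 * math.cos((2 * j + 1) * v * math.pi / (2 * n))
            F[u, v] = cu * cv * s
    return F

def visually_similar(img_a, img_b, keep=8, threshold=1e-6):
    # Compare only the top-left (low-frequency) DCT coefficients, which carry
    # most of the image's energy; identical pages give a distance of ~0.
    fa, fb = dct2(img_a.astype(float)), dct2(img_b.astype(float))
    diff = np.abs(fa[:keep, :keep] - fb[:keep, :keep]).mean()
    return 1 if diff < threshold else 0

page = np.random.rand(16, 16) * 255         # stand-in for a normalised grayscale page
print(visually_similar(page, page))          # 1: exactly similar
print(visually_similar(page, 255 - page))    # 0: different
```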

Threshold Validation: The probability values obtained from the above process are used to determine whether the obtained site is phishing or not. For a phishing web site the text classifier should return a value greater than 0.7 and the image classifier should return 0.

C. Session Key Authentication using SMS
Though the phishing sites can be detected, to be more secure, session key authentication is to be implemented on the server side to prevent phishers from stealing users' personal details. The user


should enter the matching username and password so that he/she will receive a session key on the registered mobile number. To proceed further, the session key must be entered. If the user reports to the administrator that a session key was received even though he/she never logged in, then the corresponding IP address is tracked and added to the phishers' database. When the phisher logs in again from the same IP, the request is blocked.
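The server-side flow just described can be summarized in a short sketch: the bank server issues a random session key after a successful password check, delivers it out of band (SMS), and blocks the source IP once a user reports an unsolicited key. This is an illustrative sketch only; the class and method names, the key length, and the in-memory stores are assumptions, not the paper's implementation.

```python
import secrets

class SessionKeyAuthenticator:
    """Sketch of SMS-based session key authentication with IP blocking."""

    def __init__(self, send_sms):
        self.send_sms = send_sms          # callback: send_sms(mobile_number, text)
        self.pending = {}                 # session_key -> (username, source_ip)
        self.blocked_ips = set()          # IPs reported as phishers

    def login(self, username, password, mobile_number, source_ip, check_password):
        if source_ip in self.blocked_ips:
            return None                   # request from a known phisher IP
        if not check_password(username, password):
            return None
        key = secrets.token_hex(4)        # short random session key
        self.pending[key] = (username, source_ip)
        self.send_sms(mobile_number, "Your session key is " + key)
        return key

    def verify(self, username, key):
        entry = self.pending.pop(key, None)
        return entry is not None and entry[0] == username

    def report_unsolicited_key(self, key):
        """User says: 'I received this key but never logged in.'"""
        entry = self.pending.pop(key, None)
        if entry:
            self.blocked_ips.add(entry[1])  # block the phisher's IP
```

A real deployment would additionally persist the blocked-IP list and expire pending session keys after a short timeout.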

Figure 4. Session key generation on the server side
Figure 5. Phishers cannot do anything without the session key
CONCLUSION
A new content-based anti-phishing system has been developed. Based on the textual content, the text classifier classifies a given web page as phishing or normal. The image classifier, which relies on the DCT, efficiently calculates the visual similarity between the given web page and the protected web page. A threshold is set to confirm whether the site is phishing or normal. Session key authentication adds server-side security to protect the privacy of users. The IP address of each user is stored in the database. In case a complaint is received by the administrator explaining that the user received a session key without logging in to the site, the IP address recorded for the corresponding session key is blocked from any future access to the site.
FUTURE ENHANCEMENTS
Various machine learning techniques can be merged with this model to detect phishing sites more efficiently, such as Bayesian Additive Regression Trees, Support Vector Machines, Logistic Regression, and Random Forests. Web crawlers can be used to obtain more promising results.


A Routing Algorithm based on Fuzzy logic in MANET


Udaya D1, Arulmozhi E2
1,2Department of CSE, Dr. Pauls Engineering College, Pulichapallam

Abstract
In the last decade, the usage of mobile devices has grown rapidly. As the usage of mobile services and the underlying technology increases, the issues related to routing also increase, and there is a constant challenge to provide a reliable, high-quality routing algorithm among these devices. For mobile ad hoc networks, fuzzy logic has been proposed as a basis for reliable networking. This paper proposes a reliable routing algorithm based on fuzzy logic for finding a reliable path in mobile ad hoc networks. It uses a quantifiable reliability value, i.e., the lifetime of the routes between nodes, which is calculated from two parameters: the trust value and the energy value. The path with the most stable reliability value is selected as the stable route from source to destination. The findings show that when the proposed algorithm is utilized, the overall performance of the ad hoc network is significantly improved.
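To make the reliability computation concrete, the sketch below combines a node's trust value and energy value with simple fuzzy membership functions and a min/max rule base, then picks the route whose weakest link has the highest reliability. The membership breakpoints, rule outputs, and route representation are assumptions for illustration, not the algorithm as published.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b on [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def node_reliability(trust, energy):
    """Fuzzy estimate of a node's reliability from trust and energy in [0, 1]."""
    low_t, high_t = tri(trust, -0.5, 0.0, 0.6), tri(trust, 0.4, 1.0, 1.5)
    low_e, high_e = tri(energy, -0.5, 0.0, 0.6), tri(energy, 0.4, 1.0, 1.5)
    # Rule base: reliability is high only when both trust and energy are high.
    rules = [
        (min(high_t, high_e), 1.0),   # IF trust high AND energy high THEN reliable
        (min(high_t, low_e), 0.5),    # IF trust high AND energy low  THEN medium
        (min(low_t, high_e), 0.4),    # IF trust low  AND energy high THEN medium-low
        (min(low_t, low_e), 0.0),     # IF trust low  AND energy low  THEN unreliable
    ]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0

def select_stable_route(routes):
    """A route is as reliable as its weakest intermediate node (bottleneck)."""
    def route_reliability(route):
        return min(node_reliability(t, e) for t, e in route)
    return max(routes, key=route_reliability)

if __name__ == "__main__":
    # Each route is a list of (trust, energy) pairs for its intermediate nodes.
    r1 = [(0.9, 0.8), (0.7, 0.9)]
    r2 = [(0.9, 0.9), (0.2, 0.9)]
    print(select_stable_route([r1, r2]) is r1)   # True: r2 contains an untrusted node
```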


Transmission Power Controlled MAC Protocol for Ad Hoc Networks


C. Dhivyalakshmi
Dr. Pauls Engineering College, Pulichapallam, Villupuram Dist.
ABSTRACT
In mobile ad hoc networks (MANETs), every node overhears every data transmission occurring in its vicinity and thus consumes energy unnecessarily. Transmission power control (TPC) has been extensively used not only to save energy, but also to improve the network throughput. In this paper, we propose an enhanced transmission power controlled protocol, ETPMAC, which can improve the network throughput significantly using a single channel and a single transceiver. ETPMAC can enable several concurrent transmissions without interfering with each other by controlling the transmission power. Moreover, it does not introduce any additional control overhead. We show by simulation that ETPMAC can improve the network throughput by up to 71% compared to IEEE 802.11 in a random topology.
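The abstract's core idea, choosing a per-packet transmission power that is just high enough for the intended receiver so that other nearby pairs can transmit concurrently, can be illustrated with a simple path-loss calculation. The path-loss exponent, reference values, and margin below are assumptions for illustration and are not parameters of ETPMAC itself.

```python
import math

def min_tx_power_dbm(distance_m, rx_sensitivity_dbm=-85.0,
                     path_loss_exponent=3.0, ref_loss_db=40.0, margin_db=6.0):
    """Smallest transmission power (dBm) that still reaches the receiver.

    Log-distance path-loss model: PL(d) = ref_loss_db + 10*n*log10(d / 1 m).
    A safety margin keeps the received signal above the receive sensitivity.
    """
    path_loss_db = ref_loss_db + 10.0 * path_loss_exponent * math.log10(max(distance_m, 1.0))
    return rx_sensitivity_dbm + path_loss_db + margin_db

if __name__ == "__main__":
    # A close receiver needs far less power than a distant one, which is what
    # leaves room for concurrent transmissions elsewhere in the network.
    print(round(min_tx_power_dbm(10), 1))    # short link: low power suffices
    print(round(min_tx_power_dbm(100), 1))   # long link: much more power needed
```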


A novel model for efficient data management in Electronic Toll Plaza with a centralized fixed machine for VANETs
N. Arul Kumar1, M. Govindaraj2
1PG Student (M.Tech., IT), 2Assistant Professor, Dept. of Computer Science, Bharathidasan University, Trichy-23

Abstract

A network that communicates over a wireless medium is referred to as a wireless network. The two main wireless communication approaches are the existing cellular network and the ad-hoc network. An ad-hoc network is formed by nodes with no pre-established infrastructure, for a temporary need. The system that lets vehicles (nodes) communicate with each other is the Vehicular Ad-hoc Network (VANET). A routing protocol is used to route information from one node to another so that the data travels in a safe and efficient manner. In a VANET, vehicles can send data to others by using GPS, but there is no way to find out which node has sent the data, and a false message results if a node sends malicious data. The aim of this paper is to present a novel model for efficient data management at an electronic toll plaza with a centralized fixed machine for Vehicular Ad-hoc Networks, which provides secure and trustworthy data transfer between communicating nodes and overcomes the drawbacks of existing models.

1. INTRODUCTION
The VANET is a growing technology that has recently attracted the attention of commercial and research institutions. Vehicular Communication (VC) is advancing research aimed at enhancing the security and effectiveness of communication systems, for example, reporting the traffic status on the road. The main characteristic of a VANET is that it is an infrastructure-less network, without the access points or base stations found in existing networks. Each node can act as a router or as an end node to take part in the communication.
2. PROBLEM DEFINITION
The nodes in a VANET are mobile, so they are easy to deploy and need no infrastructure. However, we are not able to find out which vehicle has sent a piece of information to another vehicle or to the GPS devices. If a vehicle sends false information to a GPS device, the device works with that false message and provides erroneous information to other vehicles.
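One way to address the problem just described is to make every message verifiable by a credential issued at a central point; the paper later calls this Digital Certificate Confirmation by a Ticket Granting Machine. The sketch below is only illustrative: it uses a shared-secret HMAC as a stand-in for a real digital certificate, and the registry and validator names are assumptions.

```python
import hmac, hashlib, secrets

class TicketGrantingMachine:
    """Issues a per-vehicle secret ('certificate') and validates signed messages."""

    def __init__(self):
        self.registry = {}                      # vehicle_id -> secret key

    def register_vehicle(self, vehicle_id):
        key = secrets.token_bytes(16)
        self.registry[vehicle_id] = key
        return key                              # handed to the vehicle's on-board unit

    def sign(self, key, vehicle_id, payload):
        mac = hmac.new(key, (vehicle_id + "|" + payload).encode(), hashlib.sha256)
        return mac.hexdigest()

    def validate(self, vehicle_id, payload, tag):
        """Toll-plaza check: was this message really produced by vehicle_id?"""
        key = self.registry.get(vehicle_id)
        if key is None:
            return False                        # unregistered (unauthorized) node
        expected = self.sign(key, vehicle_id, payload)
        return hmac.compare_digest(expected, tag)

if __name__ == "__main__":
    tgm = TicketGrantingMachine()
    key = tgm.register_vehicle("TN-45-AB-1234")
    payload = "accident ahead at km 12"
    tag = tgm.sign(key, "TN-45-AB-1234", payload)
    print(tgm.validate("TN-45-AB-1234", payload, tag))        # True: genuine sender
    print(tgm.validate("TN-45-AB-1234", payload, "f" * 64))   # False: forged message
```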

Figure 1: Nodes in an Ad-hoc Network
In this diagram, the outermost nodes are not within transmitter range of each other. The middle node is used to forward packets between the outermost nodes; it acts as a router, and the three nodes together form an ad-hoc network. The proposed system is a novel model for efficient data management at an electronic toll plaza with a centralized fixed machine for VANETs, which provides safe and reliable data transfer between communication points and overcomes the drawbacks of other routing protocols.
3. REQUIREMENTS
In general, the simulation components are the nodes, agents, and links. The nodes are the participating objects within the simulation environment. Figure 2 shows the general requirements for designing the model. Agents rely on nodes for specifying the traffic type. Links are used to specify the medium of connection, i.e., wired or wireless, between


the participating nodes. Figure 3 shows the general framework design, which is discussed in this paper.

Figure 2: General Requirements
The vehicle database has the complete details of the vehicles. The router is used to route the information to the destination. The traffic status on the route, permission granting (authorization), and the weather forecast are stored inside the routing protocol [1]. The computer database is used to maintain the backend information.

Figure 3: Framework Design
3.1 System Design and Direction-finding
The network is formed by using the nodes. Information such as the channel, the type and size of the data, and the routing protocol is implemented inside each node [2]. The routing information is stored inside each packet header so that efficient routing is achieved.
3.2 AODV Routing Execution
In ad-hoc networks, Ad-hoc On-demand Distance Vector (AODV) is used as the routing protocol. AODV is used to find the correct destination and the efficient route to reach it. The routing information is not stored in all the nodes; it is stored inside each data packet header. When a data packet moves from one node to another, each node reads the data header and redirects the data in the correct direction. The distance and the next-hop information are also stored inside the node. The source node which is ready to send data makes a route request towards the destination node. The routing response is received based on the routing request; the responding node may be either the destination node or any node in the network that helps the data travel.
3.3 Digital Certificate Confirmation
The Digital Certificate Confirmation (DCC) is done by the SIT (Secure Information Protocol) protocol. The certificates are given to the nodes after their information is updated in the database. The best path is found by implementing the proposed on-demand routing protocol. The Ticket Granting Machine validates each node so that an unauthorized node cannot enter or transfer data [3]. Any node which transfers data holds the information it received in the form of a data packet. When the data packet is transferred, the digital certificate is validated. The validation is also used to find the complete details of the nodes which are used to transfer the data. The information is transferred by using the data service from one node to another.
3.4 The Node E-Care Center
Regular Node Discovery (RND) is the task of finding the nodes which are subject to the E-Care Center (ECC). It finds out whether the participating nodes have the digital certificate [4]. The RND reduces the time delay by collecting and updating the information regularly, and it does not make the node stop at the ECC.
3.5 Data Analysis Based on Routing
The data analysis can be done based on the results obtained from the simulation environment. The network is traced to find the complete details. Graphs are plotted for the following parameters with their corresponding values: Loss of Packets (LOP), Sending Node (SN), Receiving Node (RN), and Consumption of Energy (CE) [5]. The movement of the nodes inside the network is allowed in any direction without any restriction.
4. CONCLUSION
The proposed system could be used as an efficient model for VANETs. It uses the concept of DCC to issue the digital certificate by updating the


information in their database. The TGM is used for validation purposes. The RND is used to find out the nodes which are subject to the ECC; the RND concept reduces the traffic delay and the link failures which happen in the network. The data shared between the nodes could be lost due to node disconnections from the network, so a control mechanism could be implemented in order to handle the disconnected data. Transmission errors could be reduced in future work, and the performance of the designed protocol could be improved to reduce the time delay [6]. The DCC could be used in the future in order to find a more correct and efficient route between the nodes.
REFERENCES
[1] Eichler, Stephan, Ostermaier, Benedikt, Schroth, Christoph, Kosch, Timo, "Simulation of Car-to-Car Messaging: Analyzing the Impact on Road Traffic", IEEE Computer Society, 2005.
[2] Yue Liu, Jun Bi, Ju Yang, "Research on Vehicular Ad Hoc Networks", Chinese Control and Decision Conference (CCDC), 2009.
[3] Sun Xi, Li Xia-miao, "The Study of the Feasibility of VANET and its Routing Protocol", IEEE, 2008.
[4] Jie Luo, Xinxing Gu, Tong Zhao, Wei Yan, "A Mobile Infrastructure Based VANET Routing Protocol in the Urban Environment", IEEE, 2010.
[5] Gongjun Yan, Nathalie Mitton, Xu Li, "Reliable Routing in Vehicular Ad hoc Networks", IEEE, 2010.
[6] Jing Zuo, Yuhao Wang, Yuan Liu, Yan Zhang, "Performance Evaluation of Routing Protocol in VANET with Vehicle-node Density", IEEE, 2010.


A STUDY ON QOS-AWARE SERVICE SELECTION IN SOA
V. Varadharassu1, J. Ilanchezhian2, A. Ranjeeth3, K. Vignesh4
1,2,3,4PG Students, Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry

Abstract
Service-Oriented Architecture (SOA) has become a new software development paradigm because it provides a flexible framework that can help reduce development cost and time. In order to maximize these benefits, many researchers have focused on providing quality of service (QoS) to consumers in a dynamic environment. In this paper, we discuss the earlier service selection algorithms and compare them with a tree-based algorithm that provides optimal services to consumers efficiently even when the status of the selected services has changed. As a result, consumers can always use the optimal services of high quality.
Keywords: Service Selection, Service-Oriented Architecture (SOA), Binary tree algorithm.
INTRODUCTION
Service-Oriented Architecture (SOA) is a loosely coupled architecture designed to meet the business needs of an organization. It is becoming a trend for system development and integration, where systems group functionality around business processes. The fundamental building block of a service-oriented architecture is a service. Services are software components that do not require developers to use a specific underlying technology. Three types of components can be involved in an SOA system: (1) the Service Provider, (2) the Service Consumer, and (3) the Service Registry. The Service Provider is the component that makes a service available. Subsequently, the service has to be published in the network to become available to other services, using a description file encoded according to WSDL. The service provider sends to the Service Registry, also named Universal Description Discovery and Integration (UDDI), its own useful information to be published. Accordingly, the Service Registry maintains some information about every service, such as the URL, how the service can be invoked, and all the functionalities that the service is able to provide. When a Service Consumer wishes to use a service, it gets the relevant information from the Service Registry, which it uses to know how to communicate with the Service Provider. Providing good quality of service efficiently to service consumers through service composition is one of the most important issues. Many approaches have been proposed, such as using different selection algorithms [5][6]. However, these approaches turn out to be inefficient when the status of the selected services has changed, which might occur frequently in a dynamic environment like the Web. For example, if a selected service becomes unavailable or an incorrect estimated QoS is provided, each approach has to stop the service execution, identify the workflow slice still to be executed and perform re-binding on that portion, rather than re-binding only the stopped service [7]. In addition, since these approaches consider the set of selected services as one solution, a single service will not be exchanged even when a better service has been discovered in the registry. The rest of the paper is organized as follows: Section 2 describes the importance of this survey; Section 3 describes QoS criteria for a service; Section 4 describes


the related works on service selection; Section 5 describes service selection based on a binary tree algorithm; and Section 6 discusses the future opportunities of service selection in SOA.
IMPORTANCE OF THIS SURVEY
QoS is a set of non-functional parameters of a service, such as service time, service price, success rate, reliability, and security. With the wide deployment of web services on the Internet, many service providers register a large number of service components with similar or identical functions but different levels of quality of service (QoS). With the increasing popularity of service-oriented applications, measuring the quality of services becomes an imperative concern for service consumers and providers. The level of QoS has great influence on the usability and utility of a service, both of which influence its popularity (Mani & Nagarajan, 2002). A web service with superior QoS can bring high competitive advantage to web service providers and also carry social welfare to consumers. Currently, in many cases, such as flight booking and digital music download, there exist a number of available services providing similar or identical functional characteristics, but they exhibit divergent quality of service (QoS). This multiplicity might lead to a complex service selection problem. How to differentiate their quality based on QoS criteria at the web service level (WSL) is a challenging issue. The QoS of web services comprises both functional and non-functional properties; functional properties can be measured in terms of throughput, latency, and response time; non-functional properties address various issues including integrity, reliability, availability and security of web services (Zhou, Chia, & Lee, 2005) [1].
QOS CRITERIA FOR A SERVICE
QoS criteria for different domains may be different. To be more generic and precise, we consider six criteria: performance, cost, reliability, availability, reputation and fidelity.
Performance: The performance is the time duration (turnaround time) from a request being sent to when the results are received.

Cost: It refers to the amount of money that the consumer pays for using the service.

Reliability: The reliability is the probability that the requested service works without failure within a specified time frame [10].
Availability: The availability is the quality aspect of whether the service is present or ready for immediate use [10].
Reputation: The reputation is the criterion measuring the overall trustworthiness of a service.
Fidelity: The fidelity is the average mark given by different consumers to the same QoS criterion.
RELATED WORKS ON SERVICE SELECTION
Many research papers have proposed QoS-aware service compositions and introduced approaches such as the Genetic Algorithm (GA) approach, the Pisinger Algorithm (PA) approach and the Integer Programming (IP) approach to help select the optimal services. These approaches are briefly described below.
G. Canfora et al. [5] proposed a service composition approach based on a genetic algorithm (GA). The genetic algorithm is used to find a solution to the service composition while considering global constraints. Once the GA finds services that are requested in a service composition flow, a fitness function is applied. The fitness function computes the QoS value of the selected services and compares the results of the fitness function with the global constraints. If the QoS is satisfied and the stop criteria of the GA are met, the service composition stops with the found solution. However, if a problem occurs while executing the found solution, such as the selected service not being available, or the estimated QoS differing from the actual QoS, the re-composition procedure is performed for the unexecuted path.
L. Zeng et al. [6] proposed a service composition approach using the Integer Programming (IP) problem. The IP is used to select an optimal execution plan without generating all possible execution plans. The problem of selecting an execution plan is mapped into an IP problem, allowing the IP to find the appropriate execution plan with concrete services. This approach requires re-planning when the QoS is updated, or if a service fails during the execution


of services, because the approach computes an overall score for the execution plan. This implies that if one service is replaced, the overall score could possibly violate the global constraints. In a dynamic environment like the Web, variations in service qualities such as reliability and availability are inevitable because of several factors; for instance, a loss of the Internet connection on the service provider's side might be one of them. Thus, the service composition approach should consider these circumstances. However, the related works do not consider these issues much. First, the above-mentioned approaches perform service re-composition for the remaining unexecuted services when the status of services changes, for example when the selected service becomes unavailable or the estimated QoS values differ from the actual QoS values. This happens because the above-mentioned approaches cannot simply target the failed service, as they are designed to consider the global QoS values. In addition, the existing approaches cannot react to a new service efficiently: when they face a better-quality service compared to the existing one, all of the unexecuted services are re-selected to ensure that the set of selected services does not deviate from the global QoS constraints.
SERVICE SELECTION BASED ON BINARY TREE ALGORITHM
This section describes a service selection scenario using a tree-based algorithmic approach. In this scenario, a new service consumer and a service registry that are different from the normal consumer and registry are introduced. The new service registry contains another repository that keeps information about which services are being used by which consumers. This information is maintained for two reasons: (1) to retrieve the consumers' service trees and enable other service consumers to share them; and (2) to keep information consistent between the service consumers and the service registry. The new service consumer keeps service trees in its cache in case of service failure, changes in QoS values, or the discovery of a new service. Therefore, the service consumer reselects the optimal service node so as to use the service consistently and dynamically [4].
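A minimal sketch of the service-tree idea is given below: candidate services are scored from weighted, normalized QoS values and kept in a binary search tree ordered by score, so the optimal service is simply the rightmost node. The scoring weights, the QoS fields, and the tree API are assumptions for illustration, not the algorithm of [4].

```python
class ServiceNode:
    def __init__(self, name, score):
        self.name, self.score = name, score
        self.left = self.right = None

class ServiceTree:
    """Binary search tree keyed by QoS score; the rightmost node is optimal."""

    def __init__(self):
        self.root = None

    def insert(self, name, score):
        def _insert(node, new):
            if node is None:
                return new
            if new.score < node.score:
                node.left = _insert(node.left, new)
            else:
                node.right = _insert(node.right, new)
            return node
        self.root = _insert(self.root, ServiceNode(name, score))

    def optimal(self):
        node = self.root
        while node and node.right:
            node = node.right          # the greatest score sits on the right side
        return node.name if node else None

def qos_score(qos, weights):
    """Weighted sum of normalized QoS values (higher is better)."""
    return sum(weights[k] * qos[k] for k in weights)

if __name__ == "__main__":
    weights = {"availability": 0.5, "reliability": 0.3, "cheapness": 0.2}
    tree = ServiceTree()
    tree.insert("FlightSvcA",
                qos_score({"availability": 0.99, "reliability": 0.9, "cheapness": 0.4}, weights))
    tree.insert("FlightSvcB",
                qos_score({"availability": 0.95, "reliability": 0.8, "cheapness": 0.9}, weights))
    print(tree.optimal())   # FlightSvcB: the highest weighted score
```

When a service fails or its QoS is updated, only its node needs to be removed and re-inserted with a new score, which is what lets the consumer reselect without re-planning the whole composition.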

Fig 2: Service selection scenario
With the new service consumer and service registry, service selection can be performed according to the following process:
1) Service providers register their own services with the service registry. Each registered service in the service registry has information about its capability, QoS data, interfaces, and supported communication protocols.
2) The service consumer requests services based on the functionalities and the QoS information.
3) The service registry searches its repository for consumers who use the requested services.
3-1) If there is a match, the service registry copies the service trees from that consumer and sends them to the requester.
3-2) If there is no match, the service registry discovers the requested services and sends the set of discovered services to the requester.
4) After the requester receives service trees, or constructs service trees from the received set of services, the rightmost service node in the tree is selected as the optimal node, because the BST always places the greatest value on that side of the tree.


5) After the services are executed, the service consumer provides feedback on the service usage, such as QoS information.
6) Next, the service registry identifies the service consumers who use the service that has been updated recently. Once they are identified, the updated information is sent to them.
7) Finally, the service consumers who receive changes from the service registry update their service trees and reselect the optimal services if necessary.
As described in this scenario, the service consumer contains service trees, which are generated according to the scores of the services. The score can be measured from the normalized QoS values with weights that are decided by the consumers. The structure of the service tree is similar to a binary search tree (BST). This structure is used because it provides an efficient method of data storage and organization [8][9].
FUTURE OPPORTUNITIES
Ping Wang presents a fuzzy decision model to solve the selection of QoS-aware web services provisioning. His model has the following features. 1. His proposed approach not only deals with the decision makers' imprecise perceptions under incomplete information, but also objectively determines the importance weights of the QoS criteria. The weightings are based on the group preferences of a group of participants and realistically attain a QoS-based ranking of a list of web services. 2. His approach enables decision makers to select QoS-aware services from the marketplace. In decision-making applications, his approach complements the works of De et al. (2001) and Szmidt and Kacprzyk (1996). Future work will focus on the comparison of fuzzy reasoning using different decision rules such as maxmin, maximax and minimax in order to recognize results for different decision makers' attitudes such as pessimistic, moderate and optimistic [1].
Min Liu proposed a QoS vector model and a QoS evaluation method for the service composition process; a flexible constraint satisfaction framework is developed to model the QoS-aware service selection problem with a utility function as its objective function, and a branch-and-bound-based heuristic algorithm called BB4EPS is proposed and implemented to solve

the QoS-aware service selection problem. Theoretical analysis and experimental results show that the BB4EPS algorithm can find an optimal solution using heuristic information in the large-scale service composition domain. Their future work focuses on three directions: (1) analyzing the constraint parameters to improve the convergence velocity of the composition algorithm; (2) optimizing the BB4EPS algorithm by determining the best number of quality levels at run time based on trusted QoS information; (3) applying the BB4EPS algorithm to enterprise business process collaboration [2].

Lei Yang, Yu Dai and Bin Zhang propose an approach for trustworthiness-QoS-driven service selection in the context of the environment. The proposed QoS model considers the characteristics of the environment in which the service operates, which gives a personalized estimated QoS and makes the evaluation adaptable to a dynamically changing environmental context. The paper then presents how to apply the proposed model to the service selection problem, and the experiments show the better performance of the approach. In future work they focus on: (1) optimizing the proposed QoS estimation method by the use of machine learning and data mining, to make the estimated QoS much closer to the actual one; and (2) extending the proposed QoS model, especially by considering more environment factors, to support real applications [3].
CONCLUSION
QoS-aware service selection for service composition is an active research area in Service-Oriented Architecture (SOA). The increasing popularity of employing web services for distributed systems contributes to the significance of service discovery. However, the duplicated and similar functional features existing among services require service consumers to include additional aspects to evaluate the services. Generally, service consumers have different views on the quality of service (QoS) of service attributes. How to select the best service among the available web service (WS) candidates for consumers is an interesting practical issue. This paper has briefly discussed QoS-aware service selection models, their importance and future works. The main objective is to identify their dissimilarity on service alternatives and assist service consumers in selecting the most suitable


services with consideration of their expectations and preferences.
REFERENCES
[1] Ping Wang, "QoS-aware web services selection with intuitionistic fuzzy set under consumer's vague perception", www.elsevier.com/locate/eswa, 2009.
[2] Min Liu, "A quality of service (QoS)-aware execution plan selection approach for a service composition process", www.elsevier.com/locate/fgcs, 2011.
[3] Lei Yang, Yu Dai, Bin Zhang, "Trustworthiness QoS Driven Service Selection in the Context of Environment", Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science, 2010.
[4] Minhyuk Oh, "An Efficient Approach for QoS-Aware Service Selection Based on A Tree-Based Algorithm", Seventh IEEE/ACIS International Conference on Computer and Information Science, 2008.
[5] Canfora, G., Di Penta, M., Esposito, R., and Villani, M. L., "An Approach for QoS-aware Service Composition based on Genetic Algorithms", Proc. of the 2005 Conf. on Genetic and Evolutionary Computation, ACM Press, New York, 2005.
[6] Zeng, L., Boualem, B., Anne H. H. Ngu, Marlon, D., Jayant, K., and Chang, H., "QoS-Aware Middleware for Web Services Composition", IEEE Transactions on Software Engineering, 2004, Vol. 30, pp. 311-327.
[7] Canfora, G., Di Penta, M., Esposito, R. and Villani, M. L., "QoS-Aware Replanning of Composite Web Services", ICWS'05, 2005.
[8] Haq, E., Cheng, Y., Iyengar, S. S., "New algorithms for balancing binary search trees", IEEE Conf. Proc., pp. 378-382.
[9] J. Nievergelt, "Binary Search Trees and File Organization", CSUR, ACM Press, September 1974, Vol. 6, pp. 195-207.
[10] Kalepu, S., Krishnaswamy, S. and Loke, S. W., "Verity: A QoS Metric for Selecting Web Services and Providers", WISEW'03, 2004, pp. 131-139.


Framework for Data Management in Mobile Location Based Services


R. Gobi1, Dr. E. Kirubakaran2, Dr. E. George Dharma Prakash Raj3
1Research Scholar, 2Additional General Manager, 3Assistant Professor
1,3School of Computer Science and Engineering, 2Outsourcing Department, Bharathidasan University, Trichy-23
Abstract
Recent research in mobile computing has shown that Location Based Services (LBS) are the next big thing in mobile. The term refers to services provided to the user based on location. LBS have become the most promising services of mobile computing besides voice and data services, and they are growing in popularity due to the ubiquity of smart phone users. Location-based data is real-time and highly dynamic, so it involves location management as well as data management concurrently. Because of this dynamic nature, traditional information management techniques are not well suited for data management in LBS. To answer this challenge, we first analyze the dynamic nature of the problems with respect to data management and, based on this analysis, we provide a framework that is flexible enough to provide dynamic access in LBS. Our framework includes a set of dynamic data management techniques, namely caching, pushing and replication, to improve service response under changing user mobility and dynamic access patterns. The analysis of this framework leads to future research by providing significant optimism for future growth in mobile business models.
1. INTRODUCTION
The advances in portable devices and wireless communication technologies enable a new form of services named location based services, which deliver location-dependent and context-sensitive information to mobile users. Typical examples of such services include area maps, local weather, traffic conditions, tour guides, shopping information, etc. With the proliferation of mobile computing technologies, location based services have been identified as one of the most promising target applications. Location based services can be defined as services that integrate a mobile device's location or position with other information so as to provide added value to a user. Approximately 30% of current mobile network operator income is already based on data services. To grow the data business further, operators need to invest in new technologies, especially in mobile messaging and mobile Internet, and look for ways to optimize the user experience of this new product domain. User location is an important dimension in this new data-service world: not only does it allow companies to conceive completely new service concepts, but it also has the potential to make many messaging and mobile Internet services more relevant to customers as information is adjusted to context. In addition, location information can considerably improve service usability. As a result of these multidimensional benefits of location information, operators are coming to consider it as their third asset besides voice and data transmission. Important investments are being made to extract, use, and market it. The rest of the paper is organized as follows. Section 2 provides a survey of related issues and research work. Section 3 analyzes the dynamic nature of the data management problem. Section 4 presents our framework and system architecture for LBS to answer the challenge. Section 5 presents the design of a set of dynamic data management strategies as well as the analysis of the system behavior. Section 6 concludes the paper.


2. RELATED WORK
A key characteristic of location based services is that the same service request may need to be answered with completely different results as the user changes his/her location or the targets move. Because of the highly dynamic nature of the problem, traditional information management techniques are not well suited for location based services. Developing proper infrastructure, location management, and data management strategies for such services has been a major challenge to both wireless service providers and application developers. Data management in mobile computing environments is especially challenging because of the need to process information on the move, to cope with resource limitations, and to deal with heterogeneity. Among the applications of mobile data management, LBS have been identified as one of the most promising areas of research and development. The main components of the underlying database of such applications include stationary and moving objects. The so-called Moving Objects Databases (MODs) are ubiquitous. As the number of mobile commerce services or, in general, mobile services increases rapidly every day, robust management systems for location data, as well as the analysis of user movements, are vital. Much of the previous work on LBS treated location as an additional attribute of the data tables; in this way, location based service queries can be processed like ordinary queries, except with additional constraints on the location attribute. The providers of these products and services in the value chain want to succeed by focusing on their strengths and by building something once and selling it many times, rather than building everything differently for every customer. The most critical aspects of context are location and identity. Location-aware computing systems respond to a user's location, either spontaneously or when activated by a user request.
3. DYNAMIC NATURE
In mobile environments, the user population changes dynamically and the access patterns can shift rapidly as well. The major challenge of mobile caching is therefore how to cope with mobility and dynamism. Semantic caching techniques employ semantic descriptions of cached items to facilitate better cache admission and replacement decisions that are responsive to user movement. Mobility is one of the essential characteristics: a change of position will influence and even trigger some events. If a user makes the time, route, velocity and constraints available to the system, the services can intelligently compute, predict and decide for the user considering all relevant information (e.g., weather, traffic, etc.). The issues related to dynamic data management in mobile environments have also been discussed in the context of replicated database systems. Research efforts on moving objects databases are also related to our work in the need to process data in a highly dynamic environment. The challenge is to design a flexible service architecture and dynamic data management strategies that provide highly responsive and location-dependent information access in resource-constrained and rapidly changing environments for the application domains discussed above. We call this the dynamic data management problem for location based services in mobile environments. The dynamic nature of the problem is strongly emphasized here since both the clients and the targets may change locations at any time. The dynamic nature of our approach facilitates high adaptivity and timely responses to rapid changes in mobile environments.


4. FRAMEWORK
The central server is used to model a service site for centralized data, such as the New York Stock Exchange. In addition to a central information database, a push unit is included to facilitate server-initiated pushing strategies that proactively send selected data items toward the clients. Since the downlink bandwidth is usually much larger and cheaper than the uplink connection, pushing techniques turn out to be efficient and valuable tools with little extra cost. The local server is the data manager and wireless information server for a single cell. Each cell is assumed to have a unique local server which provides wireless access for all the clients in its cell and at the same time acts as a bridge between the central server and the client devices. It is the center for managing local information as well as the key player in providing location based services. All local servers are connected to the Internet via a fixed network and can therefore send information to each other with almost negligible delay in comparison with wireless access. This is an important factor, since neighboring local servers must work closely together to provide efficient location-based services. The cache and data manager is responsible for maintaining the information received from the central server as well as from other local servers. A local server is also equipped with a map database for answering location-dependent map queries. The client is any end-user device that is capable of wireless communication as well as user interface services. This is probably the branch of wireless technologies that evolves at the fastest pace, therefore we do not presume the computing power and storage capacity of a client device. We only assume that it can send out information requests via a wireless link, can do some local processing if required, and has a client cache for keeping frequently accessed data items. The most distinctive feature of a client is that it moves: a client can change its position at will, in and out of a cell, from one cell to another, without any obligation to notify a server in advance. Since a client can issue a query at any time and anywhere, this is especially challenging for information service providers. In our architectural design, a client always sends requests to the local server of the cell where the client resides. The target objects may be available right at the client cache, at the local server of the same cell, at the local servers of other cells, or at central servers.
5. ANALYSIS
Caching on demand: For comparison purposes, we describe the basic on-demand caching strategy in the location based service environment and analyze its access cost. The strategy is to use three levels of storage (i.e., client, local server, and central server) in a way similar to the memory system of a modern computer. More specifically, whenever a client request is issued, the client cache is first examined to see if a hit can be found. If a client-cached copy of the requested object is found, the request is immediately satisfied without further communication. Otherwise, the request is forwarded to the local server through the wireless link. The local server then checks the local server cache to see if the requested object


can be found there. A local server cache hit results in the transmission of the desired object to the client through the wireless link. Otherwise, the request must be forwarded again to the central server that owns the object. The central server can then send the object back through the local server to the client.
Judicious caching: As discussed in Section 4, both local servers and clients have caches to retain downloaded information. If certain types of objects published by a particular central server are accessed frequently by the clients of a cell, a simple idea is to maintain cached copies of all such objects at the local server and keep them always up to date. In this way, the clients in the cell can always access fresh copies of the objects directly from the local server without further delay. We term this judicious caching, since the target objects to be cached can range from the entire central server to certain categories or to a particular class of objects. However, any caching strategy brings maintenance cost as well: to keep the cached copies up to date, a local server must spend the extra cost of getting new copies from the central server whenever updates occur.
Proactive pushing: The judicious caching strategy is basically a pull-based approach. A natural alternative is a push-based approach in which a certain percentage of hot data (i.e., frequently accessed data) is pushed toward the local server in an attempt to minimize the access cost and response time experienced by the clients. We call this proactive pushing, since it is the central server that maintains the necessary statistics and proactively pushes selected objects to the local server. By pushing hot data, we also pay the maintenance cost of keeping the pushed data up to date.
Replication: Both the judicious caching and the proactive pushing strategies mainly target location-independent data objects, mostly from remote central servers. For location based range queries, the target objects no longer come from central servers but from the local servers of the current and neighboring cells within the specified range, so data management strategies must be tailored accordingly. In general, different range queries may overlap with each other. If similar range queries, or those that largely overlap, are repeatedly issued by the clients in a cell, then it is beneficial to replicate the data objects from the most frequently accessed neighboring cells to the local server. We name this idea neighborhood replication, for obvious reasons. The set of replicated cells is called the replication range of the strategy. The main issues are then the determination of the replication condition, the selection of a proper replication range, and the dynamic range adjustment strategies.
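The three-level lookup that underlies the on-demand strategy described above can be sketched as a simple cache hierarchy; per-object hit counting then gives the statistics that judicious caching and proactive pushing would rely on. The class names, the access-cost numbers, and the eviction-free dictionaries are illustrative assumptions, not the paper's cost model.

```python
class LocalServer:
    """On-demand caching: client cache -> local server cache -> central server."""

    def __init__(self, central_store):
        self.central_store = central_store    # authoritative copies (central server)
        self.cache = {}                       # local server cache
        self.hit_count = {}                   # per-object statistics for push/cache decisions

    def fetch(self, key, client_cache, cost=None):
        cost = cost if cost is not None else {"client": 0, "wireless": 5, "fixed": 20}
        self.hit_count[key] = self.hit_count.get(key, 0) + 1
        if key in client_cache:                      # level 1: client cache hit
            return client_cache[key], cost["client"]
        if key in self.cache:                        # level 2: local server cache hit
            client_cache[key] = self.cache[key]
            return self.cache[key], cost["wireless"]
        value = self.central_store[key]              # level 3: ask the central server
        self.cache[key] = value                      # keep a copy for later requests
        client_cache[key] = value
        return value, cost["wireless"] + cost["fixed"]

if __name__ == "__main__":
    local = LocalServer({"stock:IBM": 182.0})
    my_cache = {}
    print(local.fetch("stock:IBM", my_cache))   # miss everywhere: wireless + fixed cost
    print(local.fetch("stock:IBM", my_cache))   # now a client cache hit: zero cost
```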

6. CONCLUSION
We have proposed a service framework, a system architecture, simple but effective cost models, and dynamic data management strategies for location based services in mobile computing environments. The most distinctive feature of the proposed strategies is their capability to respond dynamically to changes in mobility and/or access patterns. Dynamic data management is therefore essential for location based services. Even for location-independent data, dynamic strategies


are still required, since the clients can move from one place to another. By characterizing the data sources and access trends, we have designed effective strategies that successfully balance the saving in access cost against the increase in maintenance cost. We plan to further extend our system framework and cost model to handle information services on moving data sources, not just moving clients. We are also evaluating the potential of employing mobile agent technologies to support continuous location based service queries.


Incorporating Pre Audit Method in Multilevel Secure Composite Web Services


Venkatesh. P1, S. Rajesh M.Tech, (MBA)2, D. Jagadiswary, M.Tech3
1UG Final Year CSE, 2,3Assistant Professor CSE, Dr. Pauls Engineering College

ABSTRACT

E-commerce, e-banking, and stock markets are important domains; they face strong competition in their markets and must provide better services to the customer in a secure and efficient manner. Web services play a vital role in delivering better services to service consumers, who can access the information through the Internet. An application based on the Service-Oriented Architecture (SOA) consists of an assembly of services, which is referred to as a composite service and is used by banks, military and defense systems. A composite service can be implemented from other composite services, and hence the application can have a recursive structure. Incorporating a security policy for a composite service is not easy, because the policy should be consistent with the policies of the external services invoked in the composite process. Web services use the XACML language to provide security, but XACML still suffers from some limitations in supporting the requirements of open Web-based systems. Our approach incorporates access control modularity (a pre-auditing method) for each user as defined in XACML, leading to the EEXACML specification.

Introduction
Web services technologies change the software industry drastically by developing and integrating enterprise web services and applications in order to enable users to access them. Web service composition is the merging of web services from different service providers to create a more sophisticated, value-added web service for the users. Researchers have put in efforts to bring out the state of the art in web services technologies and in security technologies for web services. Securing an SOA application is an important requirement for all the domains, besides the functional requirements.
POLICY OF COMPOSITE SERVICES
Composite Service Definition
This paper starts with a simplified composite service application to explain our approach rather than the WS-I SCM application. The travel reservation service shown in Fig. 1 is a composite service that consists of a process invoking the airline reservation service and a hotel reservation service, which are external services. These external services are invoked in parallel and symmetrically in the process of making the reservation for a trip. The external services may also be composite services that invoke other external services. A composite service process that invokes external services can be represented using BPEL [1]. Listing 1 in Fig. 2 shows a simple composite process expressed in BPEL and a corresponding diagram generated by a tool such as Eclipse for BPEL [6] or WebSphere Integration Developer [7].

Access Control Policy
The Access Control Policy (ACP) restricts who can access a service. For example, only travel agency employees should invoke the travel reservation service. This requirement applies to a service operation itself. Therefore, an ACP is a pair of an operation name and a list of roles that are allowed to access that operation. The ACP can be defined as follows: ACP: (operation name, role list). The ACP for the operation getReservation of the travel reservation service can be defined as (getReservation, [agencyEmp]), where agencyEmp is the role for a travel agency employee. There are several XML formats for ACP representation, such as XACML [5] and WS-Policy. XACML is a typical specification


for an access control policy, or we could use WS-Policy for the ACP. However, both specify only a framework for policy representation, and hence we need to add some extensions to express the ACP for a service itself. There is no standardized expression of the ACP for Web Services, whereas WS-SecurityPolicy is the standard for the MPP policy, so the representation of the ACP is not discussed here; defining how to express the ACP is not the focus of this study.
MPP Transformation
MPP predicates are transformed from the WS-SecurityPolicy representation. The WS-SecurityPolicy specification defines many security policy assertions to specify security requirements. However, considering the security requirements specified by WS-Security, the assertions of WS-SecurityPolicy can be classified into three types: for integrity, for confidentiality, and for optional requirements. We define the MPP predicates that correspond to these types. The predicate for an MPP is mpp, which is applied to a message m and has ID lists for the three security requirements and the tokens. The integrity requirement is specified by a signature predicate, where sigId is the ID of this signature on a variable in a variable list var, and tokenId is the ID of the security token used for this signature, using the canonicalization algorithm calgo, the signature algorithm salgo, the transform algorithm talgo, and the digest algorithm dalgo. The last variable, protectToken, of the signature predicate specifies a Boolean value that flags a security token as self-signed if the token is used for the message signature. This property is specified by the ProtectTokens assertion and is a property unique to signatures.
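To make the ACP definition from the previous subsection concrete, the sketch below represents an ACP as the (operation name, role list) pair described there and checks whether a requesting user's role may invoke an operation. It is an illustrative sketch, not the paper's XACML/EEXACML encoding; the data structures and function names are assumptions.

```python
# ACP: operation name -> list of roles allowed to invoke that operation
acp = {
    "getReservation": ["agencyEmp"],          # only travel agency employees
    "getFlightQuote": ["agencyEmp", "customer"],
}

def access_allowed(operation, user_roles, policy=acp):
    """True if any of the user's roles appears in the operation's role list."""
    allowed_roles = policy.get(operation, [])
    return any(role in allowed_roles for role in user_roles)

if __name__ == "__main__":
    print(access_allowed("getReservation", ["agencyEmp"]))   # True
    print(access_allowed("getReservation", ["customer"]))    # False: role not listed
```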

BPEL Transformation
An element in BPEL is also transformed into a predicate whose name is the same as the element name. A sequence element of the BPEL process is transformed into a sequence predicate. The sequence predicate consists of flows and actions. A flow predicate specifies a linkage of BPEL actions. We define some actions as predicates: the receive, invoke, reply, and assign predicates are transformed from the corresponding actions. A variable assignment in the composite process is specified by an assign predicate, where the specified variable assignment is from the fromvar variable of the from message to the tovar variable of the to message. The variables in these predicates correspond to the attributes of an XML element for each action.
ACP Composition Rule
The ACP can be regarded as a data property in the same way as the MPP, so the composite ACP will be valid when the data for a composite operation and the data for an atomic operation are identical and their properties are consistent. Listing 4 in Fig. 7 shows the predicates for the ACP composition rules. The predicate for the ACP is quite similar to the MPP predicate because both rules are based on the same basic idea. The requestACPInconsistents predicate is a constraint on the consistency of the roles allowed to access an operation at the request side. This predicate returns false if a request-side security policy named cPolName for

the composite service has no inconsistent roles. True is returned if a validated policy violates the ACP consistency rules for a variable in the composite service cVar, and the predicate provides information about the invalid requirement eCode and solution hints sol to resolve the inconsistency, just as the MPP predicate does. The requestRoleInconsistent predicate returns false if there are no invalid roles in a security policy cPolName for a composite operation cOpe. The predicate requestRole signifies that the atomic service operation aOpe is allowed for the roles specified in aRoles. The composite and atomic ACPs are consistent if the roles allowed by aOpe are included in the roles allowed by cOpe. The predicate inconsistentRole returns true when all of the roles in aRoles are included in the allowed roles cRoles. The predicate requestACPInconsistents returns false if the composite and atomic ACPs are consistent. When true is returned, we can redefine a consistent ACP by referring to a counterexample. We define the constraints on policy consistency as rules to validate the composite MPP and ACP. These rules return false when a composite policy satisfies the valid composition rules; hence we can pinpoint the inconsistent portions of a composite policy when the rules are violated and true is returned. Thanks to the way inference works, our engine can support both the top-down and bottom-up policy composition approaches: consistent composite policies are inferred using the bottom-up approach, and composite policies can be verified using the top-down policy definition approach. Also, we can compose policies using the same predicates even if the external services are themselves composite services rather than atomic services.
Solutions for Policy Inconsistencies
Our policy composition engine finds where policy inconsistencies exist according to the declarative rules defined in Section 4.3. However, the inference engine cannot determine how to fix the inconsistencies. Not only finding inconsistency problems but also providing solutions is quite important in practice. Since multiple solutions may be possible for one inconsistency problem, we list the invalid cases that cause policy inconsistencies and provide typical solutions for each invalid case. We first clarify the definition of policy consistency used in this paper: if all requirements of an atomic policy are included in a composite policy, the composite policy is consistent with the atomic policy. Fig. 8 illustrates the relationship between atomic and composite policies. Our definition of composite policy consistency is shown by Fig. 8a. Hence, we should fulfill the relationship shown in Fig. 8a by changing the atomic or the composite policy whenever we find inconsistent requirements through inconsistency inference. There are two approaches to fulfilling policy consistency: changing the composite policy, or changing the atomic policy. This paper takes the first approach, i.e., changing the composite policy, because the objective of this study is to create a consistent composite policy from the atomic policies applied to the atomic services that constitute a composite service. We therefore assume that atomic policies are predetermined and hard to revise, and hence we take the approach of changing the composite policy in this work. There are three types of reasons for the inconsistencies we assume, as shown in Fig. 8: 1) a composite policy lacks some requirements included in an atomic policy (Fig. 8b); 2) an atomic policy has a larger number of requirements than the composite policy, but the composite policy also has additional requirements which are not included in the atomic policy (Fig. 8c); and 3) a composite policy has a larger number of requirements than the atomic policy, and the atomic policy also has additional requirements which are not included in the composite policy (Fig. 8d). Our composition engine provides solution hints and suggests how to fix inconsistencies between atomic and composite policies; that is, the invalid relations in Figs. 8b-8d are changed to the valid relation in Fig. 8a by the provided solutions, for example by adding missing requirements. The following sections discuss solutions for creating a consistent composite policy.
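To make the consistency definition and the three invalid cases of Fig. 8 concrete, here is a minimal sketch in which policies are modelled, purely as an assumption of ours, as sets of requirement identifiers; the classification mirrors Figs. 8a-8d and the solution hint is the set of atomic requirements missing from the composite policy.

```python
# Hedged sketch: policies modelled as sets of requirement identifiers (an assumption
# made only to illustrate the Fig. 8 cases and the "add what is missing" solution hint).

def classify(atomic: set, composite: set) -> str:
    """Return which Fig. 8 relation holds between an atomic and a composite policy."""
    if atomic <= composite:
        return "consistent (Fig. 8a)"
    if composite <= atomic:
        return "composite lacks some atomic requirements (Fig. 8b)"
    # partial overlap: each side has requirements the other lacks
    if len(atomic) >= len(composite):
        return "partial overlap, atomic policy larger (Fig. 8c)"
    return "partial overlap, composite policy larger (Fig. 8d)"

def solution_hint(atomic: set, composite: set) -> set:
    """Requirements to add to the composite policy to restore the Fig. 8a relation."""
    return atomic - composite

atomic = {"sign-body", "encrypt-body", "x509-token"}
composite = {"sign-body", "username-token"}
print(classify(atomic, composite))               # partial overlap, atomic policy larger (Fig. 8c)
print(sorted(solution_hint(atomic, composite)))  # ['encrypt-body', 'x509-token']
```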

Solutions for MPP Inconsistencies
The requestMPPInconsistent predicate becomes true when one of the three following predicates is true: requestIntegrityInconsistent, requestConfidentialityInconsistent, or requestOptReqInconsistent. Here we discuss the solutions for the integrity inconsistencies only, because space is limited. The requestIntegrityInconsistent predicate has three possible invalid cases: 1) algorithm inconsistency, 2) token inconsistency, and 3) ProtectTokens inconsistency. Solutions for these inconsistencies are discussed in the following, and we represent these solutions logically.
Algorithm inconsistency. If an algorithm for atomic variables and an algorithm for composite variables are inconsistent, the algorithm for the composite variables needs to be changed to another algorithm that is consistent with the atomic algorithm. If the following inconsistent predicate returns true, the two algorithms specified by the variables are inconsistent, hence the algorithm for the composite service needs to be changed to one consistent with that of the atomic service. The changeCompAlgo predicate means that an algorithm cCalgo for a composite service is changed to a new one, nCalgo, that is consistent with an algorithm aCalgo of the atomic service, which is represented by the consistent predicate. A signature requires four algorithms: a canonicalization algorithm, a signature algorithm, a transformation algorithm, and a digest algorithm. The following inconsistent predicates correspond to the canonicalization algorithms; the predicates for the other three algorithms are omitted due to space limitations.
Token inconsistency. If the signing security token types are inconsistent, a new signature whose signing token type is aTtype needs to be added to the composite policy. There are two cases of token inconsistency, as follows. The inconsistent predicate returns true when the token type aTtype for the atomic service and the token type cTtype for the composite service are inconsistent, and the exist predicate returns true if the composite policy cPolName specifies a security token whose type is aTtype. When both predicates are true, a signature that uses a token whose type is aTtype needs to be added, as represented by the addSignature predicate. In the second case, the inconsistent predicate returns true but the exist predicate is false. Therefore, a new security token whose type is aTtype needs to be added, as represented by the addToken predicate, and then a new signature which uses the new aTtype token is also added.
inconsistent(aTtype:TokenType, cTtype:TokenType), exist(cPolName:String, aTtype:TokenType) -> addSignature(cPolName:String, aTtype:TokenType)
inconsistent(aTtype:TokenType, cTtype:TokenType), not(exist(cPolName:String, aTtype:TokenType)) -> addToken(cPolName:String, aTtype:TokenType), addSignature(cPolName:String, aTtype:TokenType)
Top-Down Policy Composition
In the top-down approach, the security policies for a composite service are also necessary as input to the policy composition engine. The following is a representation of the security requirements of a composite service, i.e., the Travel Reservation Service.
- MPP for TR
signature(agencyMpp, agp:sigID1, [soapBODY], agp:x509ID, exc14n, hmacsha1, exc14n, sha1, true).
encryption(agencyMpp, agp:encID1, [soapBODY], agp:x509ID, kwrsa15, aes256).
token(agencyMpp, agp:x509ID, x509V3).
supportingToken(agencyMpp, agp:unID, username).
- ACP for TR
available(agencyAcp, agp:getReservation).
acp(agencyAcp, [agent, airlineEmployee]).
These predicates represent the security requirements on a request message of the travel reservation service. This security policy requires both a signature and encryption on the SOAP Body. Here, the soapBODY keyword means that the SOAP Body of the message should be covered by the signature or encryption. Both the signature and the encryption require an X.509v3 security token. Additionally, another security token is
required: a username token as a supporting security token.
Bottom-Up Policy Composition
In the case of the bottom-up approach, the atomic services have attached security policies but no security policy for the composite service is defined. Therefore, we apply a template of the composite policy in the logic representation, with variables, as shown below, instead of the policy for the travel reservation service given above.
- MPP template
signature(agencyMpp, SigID, SignedPartsList, SigTokenID, C14NM, SigM, TransM, DigM, PToken).
encryption(agencyMpp, EncID, EncryptedPartsList, EncTokenID, KEncM, DEncM).
token(agencyMpp, SigTokenID, STType).
token(agencyMpp, EncTokenID, ETType).
supportingToken(agencyMpp, AddTokenID, ATType).
- ACP template
available(agencyAcp, Operation).
acp(agencyAcp, RoleList).
role(RoleName).
To infer values for the variables in the policy templates, we execute requestMPPInconsistent and requestACPInconsistent with the composite policy name agencyMpp. Multiple values which satisfy the policy consistency rules are obtained. The following is one example of the results for the MPP; it means that the composite policy should have a signature using the exc14n algorithm for canonicalization:
CompOpe = agp:getReservation, CompVar = agp:hotelInfo, AtomOpe = hpi:reserveRoom, AtomVar = hpi:hotelInfo, TCODE = 1, ECODE = 1, Sol = [C14NM, exc14n]
We need to create the concrete composite policy which includes all of the inferred values; we propose a supporting method for policy generation, discussed in a later section.
PERFORMANCE EVALUATION
We apply our policy composition to the WS-I SCM application explained in Section 2. The Retailer service can be regarded as a composite service that invokes the Warehouse A, B, and C services. The Warehouse A, B, and C services in turn invoke the Manufacturer and Warehouse Callback services, so they are also regarded as composite services. The WS-I defines the MPPs for each service in the SCM application, so we evaluate our approach by applying it to the application in the top-down way. We examined the execution time of the requestMPPInconsistencies predicate that infers all inconsistent requirements, the number of logical inferences, and the average number of lips (logical inferences per second). The environment of this experiment is as follows: the operating system is Windows XP Professional with Service Pack 2, running on an Intel Core2 2.00 GHz with 3 GB of memory. We found six inconsistencies for the SCM application, and the results of the experiment are: execution time: 50 milliseconds; number of inferences: 116,718; average number of lips: 2,489,984.

To compare with the policy composition in our approach, we tried to find the inconsistencies of the SCM application manually according to our consistency rules. We needed to check multiple input files in XML, i.e., 22 files related to WSDL, 24 files for BPEL, and 6 files for security requirements, and it was quite hard even just to find the relations created by assign actions. An assign action in BPEL can be found quickly; however, we need to check the corresponding WSDL and security requirements to extract the data properties. In fact, we needed more than one hour to find the data properties from the XML representations, and it took about an additional half hour to check the consistency of the data properties. Based on these experimental results, our policy composition framework shows a significant improvement in terms of reduction of time and cost, so it is of great help in finding policy inconsistencies.

CONCLUSION
The proposed system validates user access control to enforce the goals of information security. XACML will be more robust when the pre-audit method is incorporated into the XACML specifications. It reduces the false alarm rate and is efficient in avoiding application-layer attacks. The advantage of our approach is that the policy consistency rules do not depend on any specific processes, and two composition approaches can be supported, top-down and bottom-up policy composition, providing solution hints to resolve policy inconsistencies. We also discussed how to create a consistent policy semi-automatically from the solution hints.
REFERENCES
[1] Web Services Business Process Execution Language Version 2.0, http://docs.oasisopen.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html, 2011.
[2] C. Tziviskou and E.D. Nitto, "Logic-Based Management of Security in Web Services," Proc. IEEE Int'l Conf. Service Computing (SCC 07), pp. 228-235, 2007.
[3] A.J. Lee, J.P. Boyer, L.E. Olson, and C.A. Gunter, "Defeasible Security Policy Composition for Web Services," Proc. Fourth ACM Workshop Formal Methods in Security (FMSE 06), pp. 45-54, 2006.
[4] Web Services Interoperability Organization (WS-I), http://www.ws-i.org, 2011.
[5] WS-I, Supply Chain Management, http://www.ws-i.org/deliverables, 2011.
[6] Eclipse BPEL Project, http://www.eclipse.org/bpel, 2011.
[7] WebSphere Integration Developer, http://www.ibm.com/software/integration/wid, 2011.
[8] Web Services Security: SOAP Message Security 1.1, http://www.oasisopen.org/committees/download.php/16790/wss-v1.1-spec-os-SOAPMessageSecurity.pdf, 2011.
[9] WS-SecurityPolicy 1.2, http://www.oasisopen.org/committees/download.php/23821/ws-securitypolicy-1.2-spec-cs.pdf, 2011.

WEB INTRUSION AND ANOMALY DETECTION BASED ON DATA CLUSTERING AND ADAM
R. Sylviya1, K. Indumathi2, M. Vanitha3, A. Meiappane4
Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry 605 107.
Abstract
Internet services have grown so drastically nowadays that it has become almost impossible to work without them. The Web is one of the Internet services used around the globe, and as a result web servers and web-based applications have become the targets of attackers. Web-based security therefore cannot be ignored given the drastic growth of the web-based economy, and the protection of such servers has become mandatory. We propose two phases for detecting intrusions and anomalies in HTTP requests and defending against them. The intrusions and anomalies are found using clustering and a feature matrix. The first approach has two phases: the first phase performs web layer log file matching, and the second phase is based on the packet arrival factor (af) of HTTP requests, followed by clustering. Defending against HTTP attacks is necessary because spammers tend to flood HTTP requests at port 80 or port 443 and cause denial of service at the web server. Anomaly detection is done using either small-scale or artificially generated attack data. The log files are taken into account and security analysis is applied to them. This promotes careful, balanced and coordinated scrutiny of several aspects of the data in a feature matrix. To assist this, an interactive and easy-to-use tool named ADAM is used. This is followed by visualization of the intrusions and blocking of them.

Keywords: web intrusions, web layer log files, HTTP requests, data clustering, attack labeling, ADAM
INTRODUCTION
The servers that are accessed nowadays at such a high rate are unfortunately also visited by intruders and hackers who want to exploit the confidentiality of websites and databases. They try to attack the infrastructure of the Internet in order to disturb its functionality by taking advantage of Internet services and protocols. The attacks that they mount, such as denial of service (DoS) attacks, SQL injection attacks and e-mail spam against web services, are the top threats to be recognized. These attacks cannot be underestimated, as web servers and applications must be preserved safely with high reliability, efficiency and confidentiality. In our proposed approach the intrusions and anomalies are detected in two phases. The first phase performs web host based intrusion detection by matching the input features of web layer log files. The access logs and error logs are taken from the web server and matched with the input features. The second phase uses the packet arrival factor. In the next approach, suspected requests are automated and visualized using ADAM, which enables clear and easy study. The rest of the paper is organized as follows. In Section 2, we present related work and observations. In Section 3, we give the background of our proposed scheme. In Section 4, the algorithm and technique are presented. In Section 5, we conclude with future work.
RELATED WORKS
Once an attack is detected, based on a supervised clustering technique, the system administrator is informed and can take corrective measures [7]. Authors developed an anomaly-based system that learns the profiles of
the normal database access performed by web-based applications [3] using a number of different models. In [1] a detection system correlates the server-side programs referenced by client queries with the parameters contained in these queries. The system analyzes HTTP requests and builds a data model based on the attribute length of requests, attribute character distribution, structural inference and attribute order. In [5] the logs of a web server are analyzed to look for security violations; however, only the specific information available there is taken into account, which is not portable. Based on our limited survey, the following observations have been made:
- Detection of SQL injection attacks and HTTP buffer overflow attacks are challenging issues.
- There is a possibility of attack due to poor and careless web application coding by programmers.
There have been several attempts to detect anomalous and suspicious activities in web requests. Snort [9], a widely used open-source tool, has more than 1,300 signatures of known attacks stored in more than 50 rule sets. However, not all the rules remain effective in typical web server configurations. When applied to IIS web servers, for example, only about 11% of the rules are applicable. More importantly, unless web attacks are analyzed and attack patterns coded as rules, Snort is essentially useless in defending servers from the threat of unknown attacks. Kruegel [2] investigated how anomaly detection techniques can be applied to web logs. For example, Kruegel developed anomaly detection models based on features such as attribute length, character distribution, or absence of attribute variables and their sequences. While various features can be assigned different weights to optimize performance, it is difficult to determine the right parameter values. As operation heavily relies on heuristics, simple computation (e.g., average of anomaly scores) may not detect sophisticated attacks. Session anomaly detection (SAD), developed by Cho et al. [8], divides a sequence of web requests into sessions. Session characteristics (e.g., page sequences) are compared against those of previous sessions initiated by the same IP, and an anomaly score is computed based on the assumption that sessions would exhibit similar patterns. While probably true in a static IP environment, such an assumption may not hold in environments where a Web Proxy or Network Address Translation (NAT) is used. In addition, accurate identification of web sessions may prove difficult in some environments.
PROPOSED APPROACH
According to a study, the most frequently attacked web site in the world is the United States Department of Defense, and the second most frequently attacked server is Microsoft's web server. In such a condition, security to prevent such attacks is mandatory, and hence we have generated some web layer logs, which are examined by producing clusters for representative samples of anomalous logs. The clusters are matched with the standardized features of anomalies and intrusions. The result is represented in the anomaly feature matrix and automated using ADAM. The attacks may be HTTP attacks or SQL injection. HTTP attacks use port 80 or port 443 to perform attacks; they are activated by spammers flooding HTTP requests at particular ports, which causes denial of service at the server. The SQL injection attack is conducted by a spammer for unauthorized web service access, breaking the authentication seal and violating the integrity of the data storage. It mostly happens in poor-quality code written in PHP. We have proposed two techniques for such intrusion and anomaly detection. The first technique has two phases: the first phase is based on web layer log file matching, and in the second phase the value of the packet arrival factor (af) of the requests is clustered and labeled as normal or abnormal. The matching of the web layer log files is done by analyzing the characteristics of each and every request so as to find the intrusions and anomalies in the log and block them. The analysis is automated using ADAM.

ALGORITHM AND TECHNIQUES
The concept of intrusion and anomaly detection is organized into two approaches, in which the data logs are matched, data clusters are formed, and the results are represented in the feature matrix. The matching algorithm is as follows. Let the input rows of string values P and the web layer log text file T be drawn from some alphabet, with P of size m x m and T of size n x n, where m and n are the numbers of rows and columns respectively and n >= m. The two-dimensional character matching problem is to locate the input rows of P as a subset of the rows of T. The match is said to be exact if relation P is included in relation T as a sub-relation, i.e., P U T = T, and is defined as approximate if, for some tuples X of relation T, 2D-dist(P, X) is minimal or 2D-dist(P, X) <= k. (2D-dist [3] is a function defining a two-dimensional distance between two relations.) 2D approximate string matching using 2D edit distance k, k in N0, means finding all attribute-value occurrences of P in T with at most k errors. Let rowi(a) denote the ith row of relation a and coli(a) denote the ith column of a. Given two rows of string values with the same number of attributes, their KS 2D-distance [3] is the sum of the edit distances of the corresponding rows or columns. The KS edit distance computed over the columns of P and T can be described by the formula given below:
H(T, P) = H(T1, P1) + H(T2, P2) + ... + H(Tn, Pn), where H(Ti, Pi) = edL(coli(T), coli(P)).
In the case of web layer log file matching, if it is found that the approximate edit distance of input P is more than the input threshold K, then an attack is detected. For such an attack, a temporary role is generated to restrict the user from updating the database. In the case of attack detection based on the value of af, clusters are formed and the attack clusters are labeled; the packets corresponding to the attack clusters are then blocked.
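As a small worked illustration of the column-wise distance above (our own sketch; the helper names and the use of the standard Levenshtein edit distance for edL are assumptions), the snippet below sums the per-column edit distances between a log row and an input feature row.

```python
# Hedged illustration of H(T_i, P_i) = edL(col_i(T), col_i(P)) summed over the columns.

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein edit distance, used here as edL."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ks_distance(row_t, row_p):
    """Sum of per-column edit distances between two rows of attribute values."""
    return sum(edit_distance(t, p) for t, p in zip(row_t, row_p))

# Example: a log row vs. an input feature row (values are made up).
log_row     = ("192.168.1.7", "GET", "/login.php", "200")
feature_row = ("192.168.1.9", "GET", "/login.php", "404")
print(ks_distance(log_row, feature_row))   # 1 + 0 + 0 + 2 = 3
```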

A. Algorithm for attack detection based on web log file matching:
INPUT: Web layer log text T; input feature text P;
OUTPUT: Unmatched text M(), u_match_count;
a. [i] Let T be expressed as T = {T1, T2, ..., Tn}, where {T1, T2, ..., Tn} is the set of rows and each Ti = {s1, s2, ..., sn}, where each si is a string of characters; /* n = number of rows in T */
   [ii] Let the input P be expressed as P = {P1, P2, ..., Pm}, where each Pi = {s'1, s'2, ..., s'n}; /* m = number of rows in P */
   [iii] Initialize: matched_row list M() = empty; unmatched_row list M() = empty; match_count = 0; u_match_count = 0;
b. for i = 1 to m do {                 /* for all rows in P */
c.   for j = 1 to n do {               /* for all rows in T */
d.     if (Pi == Tj) then { match_count = match_count + 1; M() = M() + Tj; } }
e.   if (match_count == 0) then { M() = M() + Pi; u_match_count = u_match_count + 1; }
f.   match_count = 0; }

B. Approximate matching:
INPUT: M(), T, u_match_count, threshold K;
OUTPUT: aprox_match_count, attack_alarm;
a. for i = 1 to u_match_count do {     /* for every unmatched row of M() */
b.   for i' = 1 to n do {              /* for all rows in T */
c.     for j = 1 to n' do {            /* for all strings of the row */
d.       len = str_len(s'j);           /* s'j is the jth string of the ith unmatched row of M() */
e.       Um[j] = 0;
f.       for q = 1 to len do {
g.         c = getchar(sj); c' = getchar(s'j);
h.         if (c != c') then { c = c'; Um[j] = Um[j] + 1; } } }
i.     HeD[i'] = MIN(Um[j]);           /* HeD[i'] is the edit distance to the i'th row of T */
j.     if (HeD[i'] <= K) then aprox_match_count = aprox_match_count + 1; }
k.   if (aprox_match_count < u_match_count) then generate a generic attack alarm; }
A sample shot of the log files of the incoming requests is shown in the figure below.

C. Attack detection based on arrival factor:
INPUT: Threshold value;
OUTPUT: Attack alarm;
[a] Compare the value of the arrival factor (af) of the incoming HTTP service requests for some instances I = (I1, I2, ..., In).
[b] Generate an attack alarm if the value of af exceeds the threshold.
[c] Stop.
The alarm is raised on the basis of the threshold value of the arrival factor: the instances of the input requests are analyzed, and if the arrival factor is greater than the threshold value then an attack alarm is generated.
FIGURES AND TABLES
The mechanism for the intrusion and detection rate of the incoming requests is represented in Figure 1.
Figure 1: Anomaly Feature Matrix (user-based, content-based and page-based analysis)
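The following minimal Python sketch restates the three procedures above in executable form: exact matching of feature rows against the log, approximate matching of the unmatched rows against a threshold K, and the arrival-factor check. The data layout and helper names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the detection phases (assumed data layout and helper names).

def per_row_distance(row_a, row_b):
    """Character-difference count between corresponding strings of two rows,
    a simple stand-in for the per-column edit distance used above."""
    dist = 0
    for s, t in zip(row_a, row_b):
        dist += sum(1 for c1, c2 in zip(s, t) if c1 != c2) + abs(len(s) - len(t))
    return dist

def match_log(P, T, K):
    """Algorithms A and B: exact matching of feature rows P against log rows T,
    then approximate matching of the unmatched rows with threshold K."""
    unmatched = [p for p in P if p not in T]                      # exact phase
    approx_matched = sum(
        1 for p in unmatched if min(per_row_distance(p, t) for t in T) <= K
    )
    return approx_matched < len(unmatched)                        # generic attack alarm

def detect_by_arrival_factor(arrival_factors, threshold):
    """Algorithm C: flag every request instance whose arrival factor exceeds the threshold."""
    return [i for i, af in enumerate(arrival_factors) if af > threshold]

# Tiny usage example with made-up log rows, feature rows and arrival factors.
T = [("GET", "/index.html", "200"), ("GET", "/login.php", "200")]
P = [("GET", "/login.php", "200"), ("GET", "/admin.php", "500")]
print(match_log(P, T, K=3))                                       # True: alarm raised
print(detect_by_arrival_factor([0.2, 0.9, 5.4], threshold=2.0))   # [2]
```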

Elements included in the feature matrix allow investigation from diverse (e.g., user-, content- and page-based) analyses. For example, when analyzing the frequency of an IP or the interval between successive requests, each IP (i.e., user) must be reviewed in isolation; when analyzing the validity of a user agent or query, conclusions can be derived based on the content itself without having to compare against past requests. Likewise, some security analysis must be applied to each HTML page. For example, an anomaly in the time taken to serve requests or in the number of bytes transferred is meaningful only when the values are analyzed in the context of the average value associated with the page. The effectiveness of the AFM as a general framework for characterizing web attacks becomes apparent only when several anomaly feature elements are combined, as shown in Figure 1. Figure 2 shows the overall workflow. Firstly, the HTTP request is taken as input from the web layer log file. The requested input contains a login attempt to access the database. Before login rights are granted, the client that is requesting further access is analyzed by extracting its features. The features are all the characteristic properties of the requests, which are predefined and considered standardized. These features are then matched with the web log signatures which are already stored as defaults for the sole purpose of security. These features are extracted from the web servers and the databases. If they match, the requests are granted; if they do not match, alarms are generated, the anomalies are characterized and visualization using ADAM is done. Such requests are blocked and access to the databases is not allowed.
Figure 2: Flow diagram of the working phase
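As a small illustration of the page-based analysis just described (an assumption of ours, not the ADAM implementation), the sketch below judges the bytes transferred for a request against the average for that page only and flags strong deviations.

```python
# Hedged sketch of page-context analysis for the anomaly feature matrix: a request's
# byte count is judged against the average for that page only (illustrative names).
from collections import defaultdict

def page_averages(log_entries):
    """log_entries: iterable of (page, bytes_sent) pairs."""
    totals, counts = defaultdict(int), defaultdict(int)
    for page, size in log_entries:
        totals[page] += size
        counts[page] += 1
    return {page: totals[page] / counts[page] for page in totals}

def flag_anomalies(log_entries, factor=3.0):
    """Flag entries whose size exceeds `factor` times their page's own average."""
    avg = page_averages(log_entries)
    return [(page, size) for page, size in log_entries if size > factor * avg[page]]

entries = [("/index.html", 1200), ("/index.html", 1300), ("/index.html", 1250),
           ("/index.html", 90000), ("/report.php", 48000), ("/report.php", 51000)]
print(flag_anomalies(entries))   # [('/index.html', 90000)]
```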

VI. CONCLUSION AND FUTURE WORK
We conclude that we have presented an intrusion detection mechanism in which web layer log files are generated, clusters are formed, and the labeled data is set as either normal or anomalous. Attacks such as denial of service, SQL injection and directory traversal attacks are found by matching the predefined features of anomalous and intrusive requests. We then visualize the analysis, because visualization is essential in effectively combating the sheer complexity and volume of the logs. ADAM is a useful tool to automate AFM-based anomaly analysis, and it provides a powerful visual display capability. This is followed by blocking techniques, where the anomalies and intrusions are traced and the request which originated them is stopped from accessing the database. In the near future, our attempt would be to analyze and detect newly emerging
attacks which are not familiar and are complicated for the user. We may also try to find the exact location from which the anomalous request originated.
REFERENCES
1. C. Kruegel, G. Vigna, "Anomaly Detection of Web-based Attacks", Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS'03), 2003, pp. 251-261.
2. C. Kruegel, G. Vigna, W. Robertson, "A multi-model approach to the detection of web-based attacks", Computer Networks, vol. 48, no. 5, pp. 717-738, 2005.
3. F. Valeur, D. Mutz, G. Vigna, "A Learning-Based Approach to the Detection of SQL Attacks", Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA), Austria, 2005.
4. K. Krithivasan and R. Sitalakshmi, "Efficient Two-Dimensional Pattern Matching in the Presence of Errors", Information Sciences, Vol. 43, 1987, pp. 169-184.
5. M. Almgren, H. Debar, M. Dacier, "A lightweight tool for detecting web server attacks", In Proceedings of the ISOC Symposium on Network and Distributed Systems Security, 2000.
6. Qiang Wang and Vasileios Megalooikonomou, "A Clustering Algorithm for Intrusion Detection", DEnLab, Temple University.
7. Rebecca Bace, Peter Mell, NIST Special Publication on Intrusion Detection Systems, 16th August 2001.
8. Sanghyun Cho, Sungdeok Cha, "SAD: Web Session Anomaly Detection based on Parameter Estimation", Computers & Security.
9. Snort, http://www.snort.org/
10. Stefan Axelsson, "Combining a Bayesian classifier with visualisation: Understanding the IDS", Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, pages 99-108, 2004.
11. IIS W3C Extended Log Format, http://www.loganalyzer.net/log-analyzer/w3c-extended.html

Enabling Agile Testing: End-to-End Continuous Integration to Completely Eliminate the Blind Spot
K. Karnavel1, J. Santhosh Kumar2, S. Audhavan3, M. Thamarai Selvan4
1Lecturer, 2,3,4P.G. Student, Department of CSE, Anand Institute of Higher Technology.

Abstract
A Continuous Integration system is often considered one of the key elements involved in supporting an agile software development and testing environment. As a traditional software tester transitioning to an agile development environment, it became clear to me that I would need to put this essential infrastructure in place and promote improved development practices in order to make the transition to agile testing possible. This experience report discusses a continuous integration implementation I led last year. The initial motivations for implementing continuous integration are discussed, and a pre- and post-assessment using Kent Beck and Martin Fowler's "Practices of Continuous Integration" is provided along with the technical specifics of the implementation. The report concludes with a discussion of my experiences implementing and promoting continuous integration within the context of agile testing.
1. Introduction
Agile Methodology
Agile Software Development is the latest emerging trend in the software industry. More and more organizations are using Agile Software Development to develop their software and applications. Agile software development refers to a group of software development methodologies based on iterative development, where requirements and solutions evolve through collaboration between self-organizing cross-functional teams. The main features of Agile Development are:
1. Incremental, Iterative, Adaptive: Agile Development follows a descriptive approach and builds the system gradually. Typically, it has iterations of two weeks, which include requirements, development and testing. Thus it has multiple checkpoints during a project.
2. Regularly delivers business value: Work is broken into stories, also known as use cases, and each of them is defined with some acceptance criteria.
3. Collaborative: Agile Development also allows members to work on different modules and does not require specialized knowledge.
4. No Backsliding: Agile development automatically includes unit testing and continuous integration testing in a test-driven development method.
We identified the practices to implement in order to start using agile testing techniques. The most significant practices identified are listed below:
1. Define and execute just-enough acceptance tests [1] - This practice allows the customer to define external quality for the team and gives everyone confidence that user stories are complete and functional at the end of the sprint.
2. Automate as close to 100% of the acceptance tests as possible [2] - This practice prevents accumulation of technical testing debt
in the form of an ever-growing manual regression test set that requires the team to stop and run the tests.
3. Automate acceptance tests using a subcutaneous test approach with an xUnit test framework [2] - Using an xUnit-type framework and our software's Application Programmer Interface (API) to automate acceptance tests allows for less-brittle test creation and easier development and maintenance of an automated regression suite, compared to Graphical User Interface (GUI) test automation applications.
4. Run all acceptance tests in the regression test suite with the build, daily (at a minimum) [4] - This practice provides rapid feedback to the team if existing functionality has regressed due to new code development changes.
5. Develop unit tests for all new code during a sprint [5] - This practice raises the internal quality of the software and permits the just-enough acceptance testing described in number 1 above.
6. Run all unit tests with every build [4] - This practice provides rapid feedback to the team if regressions at the unit level occur with any code changes.
7. Run multiple builds per day [4] - This practice allows testing and exercising the latest code and changes throughout the day. It also allows for more frequent integration of developer code and thus quicker feedback on potential integration issues.
Again, these practices are what my team initially decided we needed to adopt in order to test early and test often, enabling us to find bugs in line with development. This would also allow us to fix bugs at a cheaper cost before the developers moved on to another development task during the course of a sprint or project. But we had an immediate problem to address.
2. The Problem
The main problem was that our team didn't have an automation framework of any kind in place to implement several of the practices that would allow us to test in an agile way. Further, as we identified and discussed specific practices, we were reminded of other areas of technical debt our team carried, such as our manual build process. This process required three passes to build without error, and no unit or functional test automation existed. These were problems we needed to contend with in order to begin implementing the practices we had identified.
3. The Solution
I discovered, through conversations with other agile test practitioners and additional research, a common element among teams already successfully implementing agile testing techniques: continuous integration. A continuous integration implementation seemed to be the solution to our lack of an automation framework. Now we needed to find out more about continuous integration so we could build our system.
4. What is Continuous Integration?
Continuous Integration describes a set of software engineering practices that speed up the delivery of software by decreasing integration times. It emerged in the Extreme Programming (XP) community, and XP advocates Martin Fowler and Kent Beck first wrote about continuous integration eight years ago [3]. Martin Fowler defines continuous integration as "a software development practice where members of a team integrate their work frequently; usually each person integrates at least daily, leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible" [4]. What does a continuous integration implementation look like? Typically a continuous integration framework provides for automated source repository change detection. When changes to the repository are detected (e.g. when developers check in new code) a
potential chain of events is put into motion. A typical first step in this chain of events is to get the latest source code and compile it (or, for interpreted code, perform some other checks like applying a form of lint). If compilation does not fail, then unit tests are executed. If unit testing does not fail, the application is deployed to a test environment where automated acceptance tests can be executed. If automated acceptance tests do not fail, the build is published to a public location for the team. The team is then notified (e.g. via email or RSS) and a report is generated for the activities, including what and how many tests were run, the build number, links to results and the build, etc.
5. Continuous Integration Implementation
Our software development environment consisted basically of Windows .NET C# applications. Thus several of our choices for implementing continuous integration were heavily influenced by what would work in this environment.
5.1. Layout
The basic layout of our continuous integration implementation consisted of two physical machines and two virtual machines (VMs). One physical machine hosted our virtual build machines. The other physical machine hosted our virtual test machines. There is no reason that, given enough memory and CPU power on a single virtual host, we could not have chosen to host both build VMs and test VMs on the same physical machine. However, we decided to separate the build VMs and test VMs to simplify management and because of restrictions in computing resources; our build VM host couldn't support the additional load of hosting our test VMs.
5.2. Software Tools Used
This is a list of the software tools used to implement our continuous integration.
Automated Build Studio - Automated Build Studio (ABS) by AutomatedQA is an automated build and continuous integration Windows application. A license had been purchased for this application and it was already being used to run manual builds on a few projects. Thus, while open-source alternatives could have been used, we chose to continue using this tool as it was easy to configure and meant less tool-shift for our teams.
Software Test Automation Framework (STAF) - STAF provides a service to send and receive remote commands between build and test machines, such as copying files, executing programs, getting directory listings, etc. This tool was chosen as tests showed it to be a fairly robust way to handle communication between machines. The fact that it is a mature (seven-years-old) open-source project developed by an IBM group helped us make the decision.
Visual Studio 2008 - The Microsoft Visual Studio 2008 development IDE includes a compiler that was already being used to build our applications.
Surround - The Seapine Surround SCM repository and versioning system is a source control system that was already in place.
VBScript - Microsoft VBScript was used to write a helper script to reset the test VM just before each new application install and test iteration. VBScript is natively supported on Windows platforms and thus is a good bootstrap language for Windows.
C# custom helper applications - C# was used to create a few tools used to distribute, execute, and collect unit and acceptance test results. C# was chosen because we were most familiar with it (we were using it to develop most of our products), it is very powerful, and the .NET environment required to run C# applications was already installed on the build machine where they would need to run. The custom helper applications we wrote and used were:
Test Distributor - discovers all NUnit project files in a specified product directory tree, and then copies all necessary .dlls and other files used by the project over to the product
installation directory on the test VM for later execution.
Test Runner - runs each NUnit project in the product installation directory of the test VM.
Test Results Processor - processes the XML test results for each NUnit project test run, aggregates the results (summary, list of failures, failure details) and writes them to HTML for later inclusion in the build email.
NUnit - NUnit is a test framework for all .NET languages used to execute unit and acceptance tests. We chose NUnit because the developers were already using it to develop unit tests, and we also found that we could use it to develop and execute acceptance tests as well.
5.3. Tool Mapping
Here is how our tool set mapped to the layout described earlier:
1. Build VM: a. Automated Build Studio; b. STAF; c. Visual Studio 2008; d. Surround; e. NUnit; f. VBScript; g. C# helper apps
2. Test VM Host: a. STAF; b. Virtual Server 2005; c. VBScript
3. Test VM: a. STAF; b. NUnit
5.4. Putting It All Together
Our continuous integration implementation works like this:
1. The ABS service runs, polling the Surround source repository for changes/check-ins.
2. If a change is detected, a new build is started, performing the following actions:
a. Refresh the source code on the build machine for the project/solution to be built
b. Build the application (ABS using VS2008)
c. Prep the test VM for product installation and testing; reset the virtual machine and ping it until it comes back online (VBScript helper script called via STAF)
d. Copy the installation files to the test VM (STAF)
e. Install the application-under-test on the test VM (STAF)
f. Discover and copy all unit and acceptance tests to the test VM (custom C# helper application, uses STAF)
g. Execute the tests on the test VM (custom C# helper application, uses STAF)
h. Copy the test result .xml file from the test VM to the build machine (STAF)
i. Process the test results into a results email
j. Send the email with test results (PASS or FAIL with details of failures), a link to the location of the build, and the build logs
6. Assessing the Team's Continuous Integration Practices
It is worth noting that Martin Fowler has put forth 10 Practices of Continuous Integration [4]. These practices help make continuous integration implementations go smoothly. Put another way, trying to implement a continuous integration system without these practices could prove to be a rocky experience. The following list compares these practices with where our team was before and after we implemented continuous integration, providing a pre- and post-assessment of the practices.
1. Maintain a Single Source Repository
Before CI: Seapine Surround SCM source repository
After CI: No change
2. Automate the Build
Before CI: Some partial automation; still very manual. Heterogeneous build environments
After CI: Yes, with Automated Build Studio (ABS)
3. Make Your Build Self-Testing

Before CI: Not self-testing
After CI: NUnit framework unit and acceptance tests
4. Everyone Commits Every Day
Before CI: Unknown, probably varied
After CI: We can only hope
5. Every Commit Should Build the Mainline on an Integration Machine
Before CI: No, was not happening
After CI: Yes, ABS's Continuous Integration tasks helped us do this
6. Keep the Build Fast
Before CI: Multi-pass builds, unordered dependencies
After CI: 10-15 min; refactoring needed sooner rather than later
7. Test in a Clone of the Production Environment
Before CI: Yes, but not automated
After CI: Using clean virtual machine test clients to install and test
8. Make it Easy for Anyone to Get the Latest Executable
Before CI: Not all projects used the common build repository. Some private file share locations for production code
After CI: All products now building to a common location. Build mail contains a link to the new build location
9. Everyone Can See What's Happening
Before CI: Limited to ad-hoc emails, no web, no reporting, different projects worked differently
After CI: Use ABS's web interface to see the progress of builds, and email build and test status
10. Automate Deployment
Before CI: Not being done
After CI: Yes, automatically deploy the build, then test it
In summary, all continuous integration practices were either maintained, if existing, or improved upon.
7. Assessing Our New Agile Testing Practices and Agile Testing Capabilities
With our continuous integration system in place, our team was now positioned to adopt all of the agile development techniques discussed earlier. And while not every practice relied on the continuous integration system, a few important ones did. These are identified in Table 3, Agile Practice Assessment after Implementing Continuous Integration. Let's do a check-up on where our team was now in regard to these agile practices. The practices marked below were enabled using our new continuous integration system:
1. Define and execute just-enough acceptance tests - We made acceptance test definition a required task during sprint planning. The developers and customer both grew to like this, as it gave them visibility into our test coverage and confidence that we were testing the right things.
2. Automate as close to 100% of the acceptance tests as possible - We now tried to automate as close to 100% of the acceptance tests as possible. But this process takes time to fully implement. There were still lots of legacy manual tests around, but the awareness and commitment to automate them going forward were what we focused on. The net result was that we dramatically slowed our accumulation of technical test debt in the form of manual test cases.
3. Automate acceptance tests using a subcutaneous test approach with an xUnit test framework - We now used the NUnit test framework to automate our acceptance/functional testing using the API of the application. A nice side benefit is that our new tests going forward were all code, could be versioned in the repository, and could run in an automated fashion with the build.

4. Run all acceptance tests in the regression test suite with the build, daily (at a minimum) - Our NUnit acceptance tests now ran with a daily build.
5. Develop unit tests for all new code during a sprint - Developers started putting more emphasis on developing unit tests in NUnit for new code during the same sprint. This was a gradually improving process, encouraged by the ease with which unit tests could now be run with the build.
6. Run all unit tests with every build - Our NUnit unit tests now ran with every build.
7. Run multiple builds per day - Builds were now started automatically when changes to the source repository were checked in. Manual builds could also be initiated.
As you can see in the list above, at least at a base level, all of the agile practices we set out to adopt were now in place. These practices started paying off immediately in terms of our ability to develop and test in parallel. For example, now that unit tests were run with every build, the team saw immediately when new code broke existing code. Moreover, I was able to immediately begin working with developers to automate acceptance tests as they were coding the stories. While the practices were still pretty new to us, it felt like a big win to have an environment that would allow us to improve how we worked together. We began finding bugs in the APIs I was using to automate the acceptance tests. We also began finding other technical debt we needed to pay in order to keep going forward, like the lack of a command-line installation for our applications or a programmatic way to call data validation on inputs that lived outside of the application GUI.
8. Retrospective
The expedition from recognizing we needed to test in a different way to accommodate agile development, to implementing the agile practices and continuous integration that would support it, has been educational and rewarding. First,
agile testing, let alone agile development. I, like many other agile testers, believe it makes much more sense and costs less to push testing and quality into the development cycle rather than to add it afterward. It adds value immediately. As I move to new teams and begin working with them, my first step in implementing agile testing approaches to accommodate agile development will be to consider implementing a continuous integration system to support it.
9. References
[1] eXtreme Rules of the Road: How a tester can steer an eXtreme Programming project toward success, Lisa Crispin, STQE, Jul/Aug 2001
[2] Testing Extreme Programming, Lisa Crispin and Tip House, 2003, Addison Wesley
[3] Continuous integration, http://en.wikipedia.org/wiki/Continuous_Integration
[4] Continuous Integration, Martin Fowler, http://www.martinfowler.com/articles/continuousIntegration.html
[5] Code the Unit Test First, http://www.extremeprogramming.org/rules/testfirst.html

Publishing Search Logs: A Comparative Study of Privacy Guarantees
S. Krishna Narayanan1, N. Selvameena2
1,2Department of CSE, Anand Institute of Higher Technology

Abstract

Search engine companies collect the database of intentions, the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper we analyse algorithms for publishing frequent keywords, queries and clicks of a search log. We first show how methods that achieve variants of k-anonymity are vulnerable to active attacks. We then demonstrate that the stronger guarantee ensured by epsilon-differential privacy unfortunately does not provide any utility for this problem. We then propose a novel algorithm ZEALOUS and show how to set its parameters to achieve (epsilon, delta)-probabilistic privacy. We also contrast our analysis of ZEALOUS with an analysis by Korolova et al. [17] that achieves (epsilon, delta)-indistinguishability. Our paper concludes with a large experimental study using real applications where we compare ZEALOUS and previous work that achieves k-anonymity in search log publishing. Our results show that ZEALOUS yields comparable utility to k-anonymity while at the same time achieving much stronger privacy guarantees.
Keywords: H.2.0.a Security, integrity, and protection < H.2.0 General < H.2 Database Management < H Information Technology and Systems; H.3.0.a Web Search < H.3.0 General < H.3 Information Storage and Retrieval < H Information Technology and Systems
VI. INTRODUCTION

INTRODUCTION Search engines play a crucial role in the navigationthrough the vastness of the Web. Todays search engines do not just collect and index webpages, they also collect and mine information about their users. They store the queries, clicks, IP-addresses, and other information about the interactions with users in what is called a search log. Search logs contain valuable information that search engines use to tailor their services better to their users needs. They enable the discovery of trends, pat terns, and anomalies in the search behavior of users, and they can be used in the development and testing of new algorithms to improve search performance and quality. Scientists all around the world would like to tap this gold mine for their own research; search engine companies, however, do not release them because they contain sensitive information about their users, for example searches for diseases, lifestyle choices, personal tastes, and political afliations. The only release of a search log

happened in 2006 by AOL, and it went into the annals of tech history as one of the great debacles in the search industry. AOL published three months of search logs of 650,000 users. The only measure to protect user privacy was the replacement of user-ids with random numbers, utterly insufficient protection, as the New York Times showed by identifying a user from Lilburn, Georgia [4], whose search queries not only contained identifying information but also sensitive information about her friends' ailments. The AOL search log release shows that simply replacing user-ids with random numbers does not prevent information disclosure. Other ad hoc methods have been studied and found to be similarly insufficient, such as the removal of names, age, zip codes and other identifiers and the replacement of keywords in search queries by random numbers. In this paper, we compare formal methods of limiting disclosure when publishing frequent keywords, queries, and clicks of a search log. The methods we study vary in the guarantee of disclosure limitations


they provide and in the amount of useful information they retain. We first describe two negative results. We show that existing proposals to achieve k-anonymity in search logs [1] are insufficient in the light of attackers who can actively influence the search log. We then turn to differential privacy [9], a much stronger privacy guarantee; however, we show that it is impossible to achieve good utility with differential privacy. We then describe Algorithm ZEALOUS, developed independently by Korolova et al. and us with the goal of achieving relaxations of differential privacy. Korolova et al. showed how to set the parameters of ZEALOUS to guarantee (ε, δ)-indistinguishability [8], and we here offer a new analysis that shows how to set the parameters of ZEALOUS to guarantee (ε, δ)-probabilistic differential privacy, a much stronger privacy guarantee, as our analytical comparison shows. Our paper concludes with an extensive experimental evaluation, where we compare the utility of various algorithms that guarantee anonymity or privacy in search log publishing. Our evaluation includes applications that use search logs for improving both search experience and search performance, and our results show that ZEALOUS' output is sufficient for these applications while achieving strong formal privacy guarantees. We believe that the results of this research enable search engine companies to make their search logs available to researchers without disclosing their users' sensitive information: search engine companies can apply our algorithm to generate statistics that are (ε, δ)-probabilistic differentially private while retaining good utility for the two applications we have tested. Beyond publishing search logs, we believe that our findings are of interest when publishing frequent itemsets, as ZEALOUS protects privacy against much stronger attackers than those considered in existing work on privacy-preserving publishing of frequent items/itemsets.
VII. PUBLISHING SEARCH LOGS

A simple type of disclosure is the identification of a particular user's search history (or parts of the history) in the published search log. The concept of k-anonymity has been introduced to avoid such identifications. Definition 1 (k-anonymity [23]): A search log is k-anonymous if the search history of every individual is indistinguishable from the history of at least k other individuals in the published search log. There are several proposals in the literature to achieve different variants of k-anonymity for search logs. Adar proposes to partition the search log into sessions and then to discard queries that are associated with fewer than k different user-ids; in each session the user-id is then replaced by a random number [1]. We call the output of Adar's algorithm a k-query anonymous search log. Motwani and Nabar add or delete keywords from sessions until each session contains the same keywords as at least k other sessions in the search log, followed by a replacement of the user-id by a random number. We call the output of this algorithm a k-session anonymous search log. He and Naughton generalize keywords by taking their prefix until each keyword is part of at least k search histories and publish a histogram of the partially generalized keywords. We call the output a k-keyword anonymous search log. Efficient ways to anonymize a search log are also discussed in work by Yuan et al. Stronger disclosure limitations try to limit what an attacker can learn about a user. Differential privacy guarantees that an attacker learns roughly the same information about a user whether or not the search history of that user was included in the search log [9]. Differential privacy has previously been applied to contingency tables [3], learning problems [5], synthetic data generation of commuting patterns, and more. Definition 2 (ε-differential privacy [9]): An algorithm A is ε-differentially private if for all search logs S and S' differing in the search history of a single user and for all output search logs O: Pr[A(S) = O] ≤ e^ε · Pr[A(S') = O].


This definition ensures that the output of the algorithm is insensitive to changing or omitting the complete search history of a single user. We will refer to search logs that only differ in the search history of a single user as neighboring search logs. Note that, similar to the variants of k-anonymity, we could also define variants of differential privacy by looking at neighboring search logs that differ only in the content of one session, one query, or one keyword. However, we chose to focus on the strongest definition, in which an attacker learns roughly the same about a user even if that user's whole search history was omitted. Differential privacy is a very strong guarantee, and in some cases it can be too strong to be practically achievable. We will review two relaxations that have been proposed in the literature. Machanavajjhala et al. proposed the following probabilistic version of differential privacy.

VIII. ACHIEVING PRIVACY

We introduce a search log publishing algorithm called ZEALOUS that has been independently developed by Korolova et al. ZEALOUS ensures probabilistic differential privacy, and it follows a simple two-phase framework. In the first phase, ZEALOUS generates a histogram of items in the input search log and then removes from the histogram the items with frequencies below a threshold. In the second phase, ZEALOUS adds noise to the histogram counts and eliminates the items whose noisy frequencies are smaller than another threshold. The resulting histogram (referred to as the sanitized histogram) is then returned as the output. Figure 1 depicts the steps of ZEALOUS.

Algorithm ZEALOUS for Publishing Frequent Items of a Search Log
Input: Search log S, positive numbers m, λ, τ, τ'
1. For each user u select a set s_u of up to m distinct items from u's search history in S.
2. Based on the selected items, create a histogram consisting of pairs (k, c_k), where k denotes an item and c_k denotes the number of users u that have k in their search history s_u. We call this histogram the original histogram.
3. Delete from the histogram the pairs (k, c_k) with count c_k smaller than τ.
4. For each pair (k, c_k) in the histogram, sample a random number η_k from the Laplace distribution Lap(λ), and add η_k to the count c_k, resulting in a noisy count: c_k ← c_k + η_k.
5. Delete from the histogram the pairs (k, c_k) with noisy counts below τ'.
6. Publish the remaining items and their noisy counts.

Fig. 1. Privacy-Preserving Algorithm ZEALOUS.

To understand the purpose of the various steps, one has to keep in mind the privacy guarantee we would like to achieve. Steps 1, 2 and 4 of the algorithm are fairly standard: it is known that adding Laplacian noise to histogram counts achieves ε-differential privacy [9]. However, the previous section explained that these steps alone result in poor utility, because for large domains many infrequent items will have high noisy counts. To deal better with large domains, we restrict the histogram to items with counts of at least τ in Step 3. This restriction leaks information, and thus the output after Step 4 is not differentially private; one can show that it is not even (ε, δ)-probabilistic differentially private (for δ < 1/2). Step 5 disguises the information leaked in Step 3 in order to achieve probabilistic differential privacy. In what follows, we investigate the theoretical performance of ZEALOUS in terms of both privacy and utility, i.e., the privacy guarantees of ZEALOUS with respect to (ε, δ)-indistinguishability and (ε, δ)-probabilistic differential privacy, respectively.
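For concreteness, the following minimal Python sketch (our own illustration, not code from the paper; the input format, a mapping from each user id to the list of items in that user's history, and the parameter names are assumptions) mirrors the six steps of ZEALOUS described above:

import numpy as np
from collections import Counter

def zealous(search_log, m, tau, lam, tau_prime):
    # search_log: dict user_id -> list of items (keywords, queries, or clicks)
    # m: max distinct items kept per user (Step 1)
    # tau: count threshold applied before adding noise (Step 3)
    # lam: scale of the Laplace noise Lap(lam) (Step 4)
    # tau_prime: threshold applied to the noisy counts (Step 5)
    histogram = Counter()
    for user, items in search_log.items():
        selected = list(dict.fromkeys(items))[:m]    # Step 1: up to m distinct items
        histogram.update(selected)                   # Step 2: per-item user counts
    sanitized = {}
    for item, count in histogram.items():
        if count < tau:                              # Step 3: drop infrequent items
            continue
        noisy = count + np.random.laplace(0.0, lam)  # Step 4: add Laplace noise
        if noisy >= tau_prime:                       # Step 5: drop small noisy counts
            sanitized[item] = noisy                  # Step 6: publish the rest
    return sanitized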
IX. GENERAL STATISTICS

We explore different statistics that measure the difference of the sanitized histograms to the histograms computed using the original search log. We analyze the histograms of keywords, queries, and query pairs for both sanitization methods. For clicks we only consider ZEALOUS histograms, since a k-query anonymous search log is not designed to publish click data. In our first experiment we compare the distribution of the counts in the histograms. Note that a k-query anonymous search log will never have query and keyword counts below k, and similarly a ZEALOUS histogram will never have counts below τ'. We choose ε, δ and m for which the resulting threshold is τ'; therefore we deliberately set k such that k = τ'. Figure 2 shows the distribution of the counts in the histograms on a log-log scale. Recall that the k-query anonymous search log does not contain any click data, and thus it does not appear in Figure 2b. We see that the power-law shape of the distribution is well preserved. However, the total frequencies are lower for the sanitized search logs than the frequencies in the original histogram because the algorithms filter out a large number of items.

We also see the cutoffs created by k and τ'. We observe that as the domain increases from keywords to clicks and query pairs, the number of items that are not frequent in the original search log increases. For example, the number of clicks with count equal to one is an order of magnitude larger than the number of keywords with count equal to one. While the shape of the count distribution is well preserved, we would also like to know whether the counts of frequent keywords, queries, query pairs, and clicks are also preserved, and what impact the privacy parameters and the anonymity parameter k have. Figure 2 shows the average differences to the counts in the original histogram. We scaled up the counts in the sanitized histograms by a common factor so that the total counts were equal to the total counts of the original histogram; then we calculated the average difference between the counts. The average is taken over all keywords that have a non-zero count in the original search log. As such, this metric takes both coverage and precision into account. As expected, with increasing ε the average difference decreases, since the noise added to each count decreases. Similarly, by decreasing k the accuracy increases because more queries pass the threshold. Figure 2 shows that the average difference is comparable for the k-anonymous histogram and the output of ZEALOUS. Note that the output of ZEALOUS for keywords is more accurate than a k-anonymous histogram across the range of ε we considered. For queries we obtain roughly the same average difference for comparable k and ε. For query pairs, the k-query anonymous histogram provides better utility. We also computed other metrics, such as the root-mean-square value of the differences and the total variation difference; they all reveal similar qualitative trends. Despite the fact that ZEALOUS disregards many search log records (by throwing out all but m contributions per user and by throwing out infrequent items), ZEALOUS is able to


preserve the overall distribution well.

Index Caching. In the index caching problem, we aim to cache in memory a set of posting lists that maximizes the hit probability over all keywords. In our experiments, we use an improved version of the algorithm developed by Baeza-Yates to decide which posting lists should be kept in memory [2]. Our algorithm first assigns each keyword a score, which equals its frequency in the search log divided by the number of documents that contain the keyword. Keywords are chosen using a greedy bin-packing strategy where we sequentially add posting lists from the keywords with the highest score until the memory is filled. In our experiments we fixed the memory size to be 1 GB and each document posting to be 8 bytes (other parameters give comparable results). Our inverted index stores the document posting list for each keyword sorted according to relevance, which allows us to retrieve the documents in the order of their relevance. We truncate this list in memory to contain at most 200,000 documents. Hence, for an incoming query the search engine retrieves the posting list for each keyword in the query either from memory or from disk. If the intersection of the posting lists happens to be empty, then less relevant documents are retrieved from disk for those keywords for which only the truncated posting list is kept in memory.

Fig. 2. Hit probability.

Figure 3(a) shows the hit probabilities of the inverted index constructed using the original search log, the k-anonymous search log, and the ZEALOUS histogram (for a fixed m) with our greedy approximation algorithm. We observe that our ZEALOUS histogram achieves better utility than the k-query anonymous search log

for a range of parameters. We note that the utility suffers only marginally when increasing the privacy parameter or the anonymity parameter (at least in the range that we have considered). This can be explained by the fact that it requires only a few very frequent keywords to achieve a high hit probability. Keywords with a big positive impact on the hit probability are less likely to be filtered out by ZEALOUS than keywords with a small positive impact. This explains the marginal decrease in utility for increased privacy. As a last experiment we study the effect of varying m on the hit probability in Figure 3(b). We observe that the hit probability for the larger value of m is above 0.36, whereas the hit probability for the smaller value is less than 0.33. As discussed, a higher value for m increases the accuracy but reduces the coverage. Index caching mainly requires roughly the top 85 most frequent keywords, which are still covered for this setting of m. We also experimented with higher values of m and observed that the hit probability decreases at some point.
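As a rough sketch of this greedy selection (our own illustration; the input structures, a keyword-frequency histogram, a document-frequency map, and per-keyword posting-list sizes in bytes, are assumptions), the caching decision could be made as follows:

def greedy_index_cache(freq, doc_freq, list_bytes, memory_bytes):
    # freq: keyword -> frequency in the (sanitized) search log
    # doc_freq: keyword -> number of documents containing the keyword
    # list_bytes: keyword -> size of the keyword's posting list in bytes
    # memory_bytes: cache budget, e.g. 1 GB
    # Score each keyword by search-log frequency divided by document frequency,
    # then greedily add posting lists until the memory budget is filled.
    ranked = sorted(freq, key=lambda k: freq[k] / doc_freq[k], reverse=True)
    cached, used = set(), 0
    for kw in ranked:
        if used + list_bytes[kw] <= memory_bytes:
            cached.add(kw)
            used += list_bytes[kw]
    return cached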

Fig. 3. Distributions of counts in the histograms.


X. QUERY SUBSTITUTION


Algorithms for query substitution examine query pairs to learn how users re-phrase queries. We use an algorithm developed by Jones et al. in which related queries for a query are identified in two steps. First, the query is partitioned into subsets of keywords, called phrases, based on their mutual information. Next, for each phrase, candidate query substitutions are determined based on the distribution of queries. We run this algorithm to generate ranked substitutions on the sanitized search logs. We then compare these rankings with the rankings produced by the original search log, which serve as ground truth. To measure the quality of the query substitutions, we compute the precision, recall, MAP (mean average precision) and NDCG (normalized discounted cumulative gain) of the top-j suggestions for each query; let us define these metrics next.

Consider a query q and its list of top-j ranked substitutions q'_0, ..., q'_{j-1} computed based on a sanitized search log. We compare this ranking against the top-j ranked substitutions q_0, ..., q_{j-1} computed based on the original search log. Note that the number of items in the ranking for a query q can be less than j. The precision of a query q is the fraction of substitutions from the sanitized search log that are also contained in our ground-truth ranking:

precision(q) = |{q'_0, ..., q'_{j-1}} ∩ {q_0, ..., q_{j-1}}| / |{q'_0, ..., q'_{j-1}}|.

The recall of a query q is the fraction of substitutions in our ground truth that are contained in the substitutions from the sanitized search log:

recall(q) = |{q'_0, ..., q'_{j-1}} ∩ {q_0, ..., q_{j-1}}| / |{q_0, ..., q_{j-1}}|.

MAP measures the precision of the ranked items for a query as the ratio of true rank and assigned rank, where the rank of q'_i is zero in case it is not contained in the list q_0, ..., q_{j-1}, and otherwise it is the index i' such that q_{i'} = q'_i.

Our last metric, NDCG, measures how the relevant substitutions are placed in the ranking list. It does not only compare the ranks of a substitution in the two rankings, but also penalizes highly relevant substitutions according to q_0, ..., q_{j-1} that have a very low rank in q'_0, ..., q'_{j-1}. Moreover, it takes the length of the actual lists into consideration. We refer the reader to the paper by Chakrabarti et al. [7] for details on NDCG. The discussed metrics compare rankings for one query.

Fig. 4. Quality of the query substitutions of the privacy-preserving histograms and the anonymous search log.
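Assuming each ranking is simply a list of suggested queries, the precision and recall defined above can be computed with a few lines (a sketch, not the authors' code):

def precision_recall(sanitized_topj, ground_truth_topj):
    # Fraction of sanitized suggestions found in the ground truth (precision)
    # and fraction of ground-truth suggestions recovered (recall).
    s, g = set(sanitized_topj), set(ground_truth_topj)
    common = s & g
    precision = len(common) / len(s) if s else 0.0
    recall = len(common) / len(g) if g else 0.0
    return precision, recall

# Example: top-3 substitutions from the sanitized log vs. the original log.
p, r = precision_recall(["cheap flights", "airfare", "hotels"],
                        ["cheap flights", "hotels", "car rental"])
# p == 2/3 and r == 2/3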


Fig. 5. Coverage of the privacy-preserving histograms for m = 1 and m = 6.

To compare the utility of our algorithms, we average over all queries. For coverage we average over all queries for which the original search log produces substitutions. For all other metrics, which try to capture the precision of a ranking, we average only over the queries for which the sanitized search logs produce substitutions. We generated query substitutions only for the 100,000 most frequent queries of the original search log, since the substitution algorithm only works well given enough information about a query. In Figure 4 we vary k and ε for fixed m, and we draw the utility curves for the top-j suggestions (top-2 and top-5). We observe that varying ε and k has hardly any influence on performance. On all precision measures, ZEALOUS provides utility comparable to k-query anonymity. However, the coverage provided by ZEALOUS is not good. This is because the computation of query substitutions relies not only on the frequent query pairs but also on the counts of phrase pairs, which record, for two sets of keywords, how often a query containing the first set was followed by another query containing the second set. Thus a phrase pair can have a high frequency even though all query pairs it is contained in have very low frequency. ZEALOUS filters out these low-frequency query pairs and thus loses many frequent phrase pairs. As a last experiment, we study the effect of increasing m on query substitutions. Figure 5 plots the average coverage of the top-2 and top-5 substitutions produced by ZEALOUS for m = 1 and m = 6 for various values of ε. It is clear that across the board larger values of m lead to smaller coverage, thus confirming our intuition outlined in the previous section.
XI. CONCLUSION

This paper contains a comparative study about publishing frequent keywords, queries, and clicks in search logs. We compare the disclosure limitation guarantees and the theoretical and practical utility of various approaches. Our comparison includes earlier work on anonymity and (ε, δ)-indistinguishability and our proposed solution to achieve (ε, δ)-probabilistic differential privacy in search logs. In our comparison, we revealed interesting relationships between indistinguishability and probabilistic differential privacy which might be of independent interest. Our results (positive as well as negative) can be applied more generally to the problem of publishing frequent items or itemsets. A topic of future work is the development of algorithms that allow publishing useful information about infrequent keywords, queries, and clicks in a search log.

REFERENCES
[1] Eytan Adar. User 4xxxxx9: Anonymizing query logs. In WWW Workshop on Query Log Analysis, 2007.
[2] Roberto Baeza-Yates. Web usage mining in search engines. Web Mining: Applications and Techniques, 2004.
[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy and consistency too: A holistic solution to contingency table release. In PODS, 2007.
[4] Michael Barbaro and Tom Zeller. A face is exposed for AOL searcher no. 4417749. New York Times, http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000&en=f6f61949c6da4d38&ei=5090, 2006.
[5] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. In STOC, pages 609-618, 2008.


[6] Justin Brickell and Vitaly Shmatikov. The cost of privacy: destruction of data-mining utility in anonymized data publishing. In KDD, 2008.
[7] Soumen Chakrabarti, Rajiv Khanna, Uma Sawant, and Chiru Bhattacharyya. Structured learning for non-smooth ranking losses. In KDD, pages 88-96, 2008.
[8] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, 2006.
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[10] Michaela Götz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, and Johannes Gehrke. Privacy in search logs. CoRR, abs/0904.0682v2, 2009.


A TOOL FOR MEASURING THE SERVICE GRANULARITY IN WEB SERVICES


J. Geetha Kishorekumar1, T. Chitralakshmi, K. Chindhyaa, S. Revathi2
1 Assistant Professor, 2 Final Year B.Tech, CSE Dept, Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry.

Abstract
Service abstraction plays a major role in designing services in web services. Service abstraction can be called granularity; service granularity also includes the functionality of the service. Service granularity can be defined as the measurement that identifies how broad the interaction between a service consumer and a service provider must be to meet the need at hand. Granularity also helps in determining the optimality of a service. Improper service granularity may lead to service duplication, service maintenance problems, etc. Measuring service granularity therefore helps achieve better performance, reusability and efficiency of a service. This paper describes a tool for measuring service granularity and checking the optimality of services. The metrics used in this paper to determine service granularity are the composite level of a service, the functional richness of a service and the interface granularity. Since we apply the metrics to each individual service, granularity can be measured in an appropriate way.

KEYWORDS: Web Services, UML, Service Granularity.

1. Introduction
A service consists of a contract, an implementation, and an interface. The contract contains the informal specification of the service, i.e. its purpose, functionality, constraints and usage [1]. The description of the interface is specified in the service contract. The implementation of the service physically provides the required business logic and appropriate data. These are the elements which influence granularity. Internal and external structural software attributes are influenced by service granularity [2]. Granularity of Web Services is an important design consideration. The concept of granularity is a relative measure of how broad the interaction between a service consumer and provider must be in order to address the need at hand. The term service granularity refers to the size and the scope of functionality that a service exposes in web services. It can be quantified as a combination of the number of components/services composed through a given operation on a service interface as well as the number of atomic services. In a single interaction, to accomplish a business unit of work, the right granularity should be adopted. The aspects that influence the level of granularity are functionality, flexibility, reuse, complexity, context independence, performance, genericity and sourcing. Defining service granularity is a very challenging task which requires considering not only service characteristics but also provisional service types [3].

2. Service Granularity
Service granularity is determined by the quantity of functionality encapsulated by a service. It can be coarse grained or fine grained. A larger quantity of functionality leads to a coarse-grained level of service granularity. A coarse-grained level of service reduces the number of service interactions, because it encapsulates a lot of business and technology capability in an abstract interface. It interacts with large volumes of data and its flexibility is poor, for example, returning the whole catalog entries in a set of categories in an online shopping system. Furthermore, when granularity is coarser, the


composition cost is less because of the fewer number of components and interactions that are required [4]. A smaller quantity of functionality leads to a fine-grained level of service granularity. A fine-grained service interacts with a smaller volume of data and its flexibility is good, for example, an operation to browse a catalog by item number or item-wise. Building a Java program from scratch requires the creation of several fine-grained methods that are then composed into a coarse-grained service that is consumed by either a client or another service. Businesses and the interfaces that they expose should be coarse-grained. Coarse-grained components have high reuse efficiency but low reusability. The quality of the service is directly affected by the chosen service granularity; service quality includes many aspects such as flexibility and efficiency [5]. To solve the service granularity issue, we have to focus not on an individual service, but rather on overall business processes and how services might meet the needs of multiple processes in the business [6]. Erl et al. [7] suggest assigning coarse-grained interfaces to services designated as solution endpoints and allowing fine-grained interfaces for services confined to predefined boundaries, so that interoperability is promoted in coarse-grained services and reusability can be promoted in fine-grained services. The various types of design granularity include service granularity, capability granularity, constraint granularity and data granularity.

3. Related Work
A number of publications have addressed the significance of service granularity in different aspects and its impact on designing web services. Guidelines are available for choosing an appropriate granularity, but they give neither a concrete solution nor a method for deciding the right granularity. The service granularity metric proposed by [8] considers the number of operations within the system and the similarity between them. The author identifies service granularity as a design property and also considers parameter granularity. The author of [9] describes granularity metrics as a measure of reusability, business value, context independency and complexity. The design issues are that the value produced by the WGLA model is not equivalent to the granularity level; it just measures the appropriateness level of composite service granularity, and the value produced by this model is not a definite value. Moreover, this approach neither covers important granularity attributes nor recommends any criteria to evaluate the appropriateness of service granularity. Schmelzer [10] states that the granularity measurement applies in relation to the services available and the number of interactions necessary for satisfying a specific goal. Foody [11] suggests some guidelines such that combining small operations or breaking up larger operations results in the right granularity; this limits the size of the messages exchanged between the service provider and consumer. Feuerlicht [12] describes service granularity based on data normalization. The author applies Boyce-Codd Normal Form (BCNF) to determine functional dependencies, which in turn select an appropriate level of service granularity. It specifies that the excessive use of coarse-grained services results in poor reusability and a high level of data coupling. This is contrary to the view of most practitioners.
They believe that coarse-grained services minimize the number of SOAP messages and result in lower communication overheads and less possibility of failure.

4. Service Granularity Metrics
The various stages in our project are:
1. Data retrieval
2. Analysis of service from meta data
3. Evaluation of granularity
4. Report generation (granularity level of the given service)

STAGE 1: Data Retrieval
First, pseudo code is generated for the given service using the class diagram of the Unified Modeling Language (UML). UML is an object-oriented approach which identifies services from different perspectives, such as the application level. This provides a mechanism to align business and software aspects using the functionality provided and allows service metrics to play a key role in identifying the optimal service granularity [3]. Data


are retrieved from the pseudo code of the given service. The data retrieved from the pseudo code, which is generated from the UML diagram, are stored in a database. These stored values are used for calculating the various metrics and for generating the report.

STAGE 2: Analysis of Service from Meta Data
The services that are obtained from the meta data of the UML diagram for the given service are analyzed.

STAGE 3: Evaluation of Granularity
From the meta data the following data are grouped. Service granularity and service optimality can be measured from various parameters: number of atomic services in a service, composite level of a service, number of CRUD functions in a service, function count of a service, functional richness of a service, input parameter granularity, output parameter granularity, and interface granularity. Since the metrics are applied to each individual service, the granularity level of the service will be more appropriate. The evaluation of the correct granularity level of a service will lead to better performance, reusability and efficiency of the service.

Composite Level of Service: This metric concentrates on atomic services. Atomic services are the ones which cannot be further split into sub-services. In an atomic service, granularity is a matter of service operations and their signatures, while in a composite one it is a matter of both the service description and the involved steps that are executed in terms of a predefined control flow. A composite service structure aggregates smaller and fine-grained services. The hierarchical service formation is characteristically known as a coarse-grained entity that encompasses more business or technical processes. Composite services may aggregate atomic or other composite services [13].

Functional Richness of Services: While calculating functional richness, the main parameters to be calculated are the function count and the CRUD functions of the service. Functionality can be measured from the function points in the given services. A function point is a measure of software size that uses logical functional terms that business owners and users more readily understand [14]. The parameters used in the calculation of the function count are as follows: (1) Data Functions - Internal Logical Files (ILFs); (2) Data Functions - External Interface Files (EIFs); (3) Transaction Functions - External Inputs (EIs); (4) Transaction Functions - External Outputs (EOs); (5) Transaction Functions - External Inquiries (EQs).

Internal Logical Files: An ILF is a user-identifiable group of logically related data or control information maintained within the boundary of the application. The primary intent of an ILF is to hold data maintained through one or more elementary


processes of the application being counted. Examples of things that can be ILFs include: 1. Tables in a relational database. 2. Flat files. 3. Application control information, perhaps things like user preferences that are stored by the application. 4. LDAP data stores.

Table-1: Calculation of Internal Logical Files

External Interface File: An external interface file (EIF) is a user-identifiable group of logically related data or control information referenced by the application, but maintained within the boundary of another application. The primary intent of an EIF is to hold data referenced through one or more elementary processes within the boundary of the application counted. This means an EIF counted for an application must be an ILF in another application.

External Input: Allocating FPs to EIs is similar to the process we covered for ILFs and EIFs. However, in this case, instead of doing a lookup based on DETs and RETs to determine a Low/Average/High complexity, the lookup is performed using DET and FTR values. As you'll recall from the earlier definition, an FTR is a "file type referenced", so it can be either an ILF or an EIF.

Table-2: Calculation of External Input

External Output: An external output (EO) is an elementary process that sends data or control information outside the application boundary. The primary intent of an external output is to present information to a user through processing logic other than, or in addition to, the retrieval of data or control information. The processing logic must contain at least one mathematical formula or calculation, create derived data, maintain one or more ILFs, or alter the behavior of the system. EO examples include reports created by the application being counted, where the reports include derived information.

Table-3: Calculation of External Output

External Inquiries: An external inquiry (EQ) is an elementary process that sends data or control information outside the application boundary. The primary intent of an external inquiry is to present information to a user through the retrieval of data or control information from an ILF or EIF. The processing logic contains no mathematical formulas or calculations, and creates no derived data. No ILF is maintained, nor is the behavior of the system altered.


Examples of EQs include: 1. Reports created by the application being counted, where the report does not include any derived data. 2. Other things known as "implied inquiries", which unfortunately are a little out of scope for this paper.

CRUD Functions: The word CRUD denotes Create, Read, Update and Delete. These are the operations which relate to the database. This metric counts the total number of operations that relate to CRUD functions.

Functional Richness (Equation 2): here n is the total number of services, FCi is the function count of each service, CRUD stands for Create, Read, Update, Delete, On(Si) is the total number of operations in each service Si, and FR(S) is the functional richness, which lies between 0 and 1; the closer the value is to 1 (or above), the richer the service is functionally.

Interface Granularity: The granularity concept refers to the number of service operations or their signatures which are related to the service description, that is, interface granularity [9]. Interface granularity is calculated based entirely on the number of input parameters and the number of output parameters involved in a service.

Service Granularity: The average of the metrics composite level of service, functional richness of service and interface granularity gives the service granularity. It always lies between 0 and 1.

STAGE 4: Report Generation


Granularity values for the different services are calculated and generated as a report. A comparison of different services can be indicated using a chart diagram.

5. Evaluation Tool for Service Granularity
Step 1: Specify the Java folder of the UML diagram and the service name. The UML class diagram can be converted to Java code using any conversion tool. In our implementation, we have used a NetBeans module to convert the class diagram to Java code. The code is generated in a specific folder, and that folder name is given as input to our proposed tool. The generated output also contains temporary files with java1, java2 extensions. Our tool therefore separates out the .java files alone and sends them to the next stage for analysis.
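A minimal sketch of this filtering step (our own illustration; the folder layout and temporary-file extensions simply follow the description above) might be:

from pathlib import Path

def collect_java_sources(folder):
    # Keep only the generated .java files, skipping temporary artifacts
    # such as *.java1 or *.java2 left behind by the conversion.
    return [p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix == ".java"]

# Example: sources = collect_java_sources("generated/MyService")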

Step 2: To get the composite level of a service. The next stage computes the composite level of the service. The number of atomic services can be determined using the minimum number of operations in the class diagram. Some classes contain only member variables and thus cannot be further divided; those classes comprise atomic services. Since the class diagram is drawn for the various sub-services, they all count toward the total number of services. Using these two parameters, the composite level is determined.
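The exact formula for the composite level is not stated, so the sketch below only extracts the two parameters mentioned above and, purely as an assumed combination, reports the ratio of atomic services to the total number of services:

def composite_level(classes):
    # classes: dict class_name -> list of operation names parsed from the
    # generated Java code. A class with no operations (only member variables)
    # is treated as an atomic service; every class counts toward the total.
    total = len(classes)
    atomic = sum(1 for ops in classes.values() if not ops)
    # Assumed way of combining the two parameters; the paper leaves this open.
    return atomic / total if total else 0.0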

Step 3: To get the functional richness of a service. The function count can be determined from the following parameters. External inputs specify the number of input values given to each service; this can be calculated from the input parameters of the operations of each sub-service. External outputs specify the number of output values passed from the service to the user; this can be determined from the return type of each operation. External inquiries can be determined by matching the operation names against a set of predefined terms such as search, browse, etc. CRUD functions can be calculated from the service's database operations; each input, output and inquiry operation has an effect on the database. The total number of services is the same as calculated for the composite level.
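A sketch of this counting step, under our own assumptions about how the parsed operations are represented (name, parameter list, return type) and about the set of inquiry terms, could be:

INQUIRY_TERMS = ("search", "browse", "find", "list")   # assumed term set

def function_counts(operations):
    # operations: list of (name, param_types, return_type) tuples for one service.
    ei = sum(len(params) for _, params, _ in operations)       # external inputs
    eo = sum(1 for _, _, ret in operations if ret != "void")   # external outputs
    eq = sum(1 for name, _, _ in operations                    # external inquiries
             if any(term in name.lower() for term in INQUIRY_TERMS))
    return ei, eo, eq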

Step 4: To get Interface Granularity of Service


Interface granularity can be calculated from the input and output parameters of each operation in the service. It also includes a weightage value for the datatype of each parameter. For example, we have used the value 0 for void, 0.25 for primitive datatypes, and 0.5 for user-defined datatypes. The input and output parameter counts are the totals of input and output parameters over all the sub-services, respectively. Input granularity is calculated using the number of input arguments and the weightage value of each argument's datatype. Similarly, output granularity is determined using the return type and return value. Interface granularity is the sum of the input and output granularity.
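Applying the weightage scheme above can be sketched as follows (our own illustration; the text states that interface granularity is the sum of input and output granularity but does not fix the data structures, so the operation tuples and the primitive-type list are assumptions):

PRIMITIVES = {"int", "long", "short", "byte", "float", "double", "boolean", "char"}

def type_weight(type_name):
    # Weightage values from the text: 0 for void, 0.25 for primitive
    # datatypes, 0.5 for user-defined datatypes.
    if type_name == "void":
        return 0.0
    return 0.25 if type_name in PRIMITIVES else 0.5

def interface_granularity(operations):
    # operations: list of (name, param_types, return_type) tuples.
    input_g = sum(type_weight(t) for _, params, _ in operations for t in params)
    output_g = sum(type_weight(ret) for _, _, ret in operations)
    return input_g + output_g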

Step 5: To get the service granularity of a service. Service granularity is determined by taking the average of the composite level, functional richness and interface granularity of the service. Service granularity helps in determining the optimality of services. For a business service, a service granularity value of 0.75 to 1 indicates the optimal level.

Step 6: Report Generation

6. Future Work
In our proposed work, we have used class diagrams to measure service granularity. We could also use other UML diagrams, such as sequence diagrams, state diagrams, collaboration diagrams, etc., which are likewise available at the design stage of services. We have taken a few metrics to evaluate service granularity and determine the optimality of services; this can be further extended by including various other metric techniques such as coupling, autonomy and reusability. With respect to coupling, most large systems are divided into smaller, independent subsystems; the main logic behind this is to improve security, economy and performance. In tightly coupled systems, the services occur as larger systems, because of which larger interdependencies may occur. The main disadvantage of this is that the failure of an individual component will collapse the whole system. Any change in loosely coupled web services will not bring any change in their functionality. If the listening and requesting web services do not trust one another, then security and test standards must be added. The various coupling techniques available are dynamic coupling measures, evolutionary and logical coupling, coupling measures based on an information entropy approach, coupling metrics for specific types of software applications like knowledge-based systems, and, more recently, systems developed using an aspect-oriented approach. Autonomy


is the property by which a particular type of service provides a particular set of operations, for example storing, retrieving and modifying data. If any other service wants to access those data, the owning service provides the resource for access, but the resource remains owned by the original service. Service reusability plays an important role in reducing the cost of designing a web service; in IT management, service reusability is very important, and an analysis of the reusability of services is needed in service orientation.

7. Conclusion
While measuring the quality of a service, granularity plays an important role. In our proposed approach, service granularity is measured from important attributes such as the number of atomic services, composite level, function count, number of CRUD functions, functional richness, input granularity, output granularity and interface granularity. Different services are taken into account and finally a comparison is made to determine the reusability of the services. The result of this comparison is to find the more granular version of the services. From the granularity measurement, we can detect whether the services are reusable or not.

REFERENCES
[1] Dr. Maya Daneva, Prof. Dr. Jos van Hillegersberg, Ir. Lucas Osse, and Ir. Piet Adriaanse, "Service Granularity in SOA Projects: A Trade-off Analysis," Claudia Steghuis, MSc Business Information Technology, University of Twente, June 28, 2006.
[2] Saad Alahmari, Ed Zaluska, David C. De Roure, "A Metrics Framework for Evaluating SOA Service Granularity," 2011 IEEE International Conference on Services Computing.
[3] Saad Alahmari and Ed Zaluska, "Optimal Granularity for Service-Oriented Systems" (Extended Abstract).
[4] Raf Haesen, Monique Snoeck, Wilfried Lemahieu, and Stephen Poelmans, "On the Definition of Service Granularity and its Architectural Impact," funded by the KBC-Vlekho-K.U.Leuven research chair on Service and Component Based Development sponsored by KBC Bank & Insurance Group.
[5] Xie Zhengyu, Dong Baotian, and Wang Li, "Research of Service Granularity Based on SOA in Railway Information Sharing Platform," in Proceedings of the 2009 International Symposium on Information Processing (ISIP '09), Huangshan, P. R. China, August 21-23, 2009, pp. 391-395.
[6] Jason Bloomberg, "How to Define a Business Service: The Art and Science of Service Granularity," November 2007, ZapThink.
[7] Naveen Kulkarni, Vishal Dwivedi, "The Role of Service Granularity in a Successful SOA Realization: A Case Study," 2008 IEEE Congress on Services 2008 - Part I.
[8] B. Shim, S. Choue, S. Kim, and S. Park, "A Design Quality Model for Service-Oriented Architecture," 15th Asia-Pacific Software Engineering Conference, 2008.
[9] A. Khoshkbarforoushha, R. Tabein, P. Jamshidi, F. Shams, "Towards a Metrics Suite for Measuring Composite Service Granularity Level Appropriateness," 2010 IEEE 6th World Congress on Services.
[10] R. Schmelzer (2007, August 3), "The Service Granularity Matrix" [Online].
[11] D. Foody (2005, August 13), "Getting Web Service Granularity Right" [Online].
[12] Feuerlicht, G., 2006, "Service Granularity Considerations Based on Data Properties of Interface Parameters," International Journal of Computer Systems Science and Engineering, Special Issue: Engineering Design and Composition of Service-Oriented Applications, ISSN 0267-6192.
[13] Spyridon Antakis, "Security Service Granularity," October 19, 2008.
[14] Takuya Uemura, Shinji Kusumoto, and Katsuro Inoue, "Function Point Measurement Tool for UML Design Specification."


An Efficient K-Means Algorithm For Clustering Categorical Data


R. Tamilselvan1, Dr. C. Palanisamy2, K. Dinesh Kumar3
1,2,3 Vivekanandha Institute of Engineering and Technology for Women, Elayampalayam, Namakkal.

Abstract
This paper proposes a new k-means-style, link-based approach for clustering categorical and high-dimensional data. Categorical data clustering (CDC) and cluster ensembles (CE) have long been considered as separate research and application areas. The main focus of this paper is to investigate the commonalities between these two problems and the use of these commonalities for the creation of new clustering algorithms for categorical data, based on cross-fertilization between the two disjoint research fields. Although attempts have been made to solve the problem of categorical data clustering via cluster ensembles, with results that are competitive to conventional algorithms, it is observed that these techniques unfortunately generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries being left unknown. The process of grouping high-dimensional data into clusters is not accurate, and perhaps not up to the level of expectation, when the dimension of the data set is high. To address the performance issues of data clustering in high-dimensional data, it is necessary to study issues like dimensionality reduction, redundancy elimination, subspace clustering, co-clustering and cluster ensembles, so that clusters can be analyzed and improved. The paper presents an analysis which suggests that this problem degrades the quality of the clustering result, and it presents a new link-based approach, which improves the conventional matrix by discovering unknown entries through similarity between clusters in an ensemble. In particular, an efficient link-based algorithm is proposed for the underlying similarity assessment. Experimental results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble techniques for high-dimensional data.

I. INTRODUCTION
Clustering typically groups data into sets in such a way that the intra-cluster similarity is maximized while the inter-cluster similarity is minimized. Data clustering is one of the fundamental tools we have for understanding the structure of a data set. It plays a crucial, foundational role in machine learning, data mining, information retrieval and pattern recognition. Clustering aims to categorize data into groups or clusters such that the data in the same cluster are more similar to each other than to those in different clusters. Many well-established clustering algorithms, such as k-means [4], have been designed for numerical data, whose inherent properties can be naturally employed to measure a distance (e.g., Euclidean) between feature vectors [2], [4]. An example of a categorical attribute is sex = {male, female} or shape = {rectangle, square, ...}. As a result, many categorical data clustering algorithms have been introduced in recent years, with applications to interesting domains such as protein interaction data. The initial method was developed in [3] by making use of Gower's similarity coefficient. Following that, the k-modes algorithm in [1] extended the conventional k-means with a simple matching dissimilarity measure and a frequency-based method to update centroids (i.e., cluster representatives). As a single-pass algorithm, Squeezer [4] makes use of a pre-specified similarity threshold to determine to which of the existing clusters (or a new cluster) a data point under examination is assigned. Another method is a hierarchical clustering algorithm that uses the Information


Bottleneck (IB) framework to define a distance measure for categorical tuples. The concepts of evolutionary computing and genetic algorithms have also been adopted by a partitioning method for categorical data, i.e., GAClust [11]. Cobweb [12] is a model-based method primarily exploited for categorical data sets. High-dimensional data is a common phenomenon in real-world data mining applications. The total number of unique terms in a text data set represents the number of dimensions, which is usually in the thousands; high-dimensional data occurs in business as well. In subspace clustering, each cluster is a set of objects identified by a subset of dimensions, and different clusters are represented in different subsets of dimensions. The underlying ensemble-information matrix presents only cluster-data point relationships while completely ignoring those among clusters. As a result, the performance of existing cluster ensemble techniques may be degraded, as many matrix entries are left unknown. This paper introduces a link-based approach to refining the aforementioned matrix, giving substantially fewer unknown entries. A link-based similarity measure [24], [25], [26] is exploited to estimate unknown values from a link network of clusters. This research uniquely bridges the gap between the task of data clustering and that of link analysis. It also enhances the capability of the ensemble methodology for categorical data, which has not received much attention in the literature. In addition to the problem of clustering categorical data that is investigated herein, the proposed framework is generic such that it can also be effectively applied to other data types. The rest of this paper is organized as follows. Section II presents related work on clustering categorical data, cluster ensembles and consensus functions. Section III gives the link-based approach in the k-means algorithm. Experimental results are given in Section IV, and the paper concludes with Section V.

II. PROJECT RELATED WORK

A. Clustering Categorical Data
K-means-style algorithms have been proposed in recent years for clustering categorical data. The problem of clustering customer transactions in a market database has also been addressed. STIRR, an iterative algorithm based on non-linear dynamical systems, is presented in [6]. The approach used in [5] can be mapped to a certain type of non-linear system; if the dynamical system converges, the categorical databases can be clustered. Other recent research shows that the known dynamical systems cannot guarantee convergence, and proposes a revised dynamical system in which convergence can be guaranteed. While a large number of cluster ensemble techniques for numerical data have been put forward in the previous decade, there are only a few studies that apply such a methodology to categorical data clustering. The method introduced in [7] creates an ensemble by applying a conventional clustering algorithm (e.g., k-modes [3]) to different data partitions, each of which is constituted by a unique subset of data attributes. Unlike the conventional approach, the technique developed in [8] acquires a cluster ensemble without actually implementing any base clustering on the examined data set. In fact, each attribute is considered as a base clustering that provides a unique data partition.
In particular, a cluster in such an attribute-specific partition contains the data points that share a specific attribute value (i.e., categorical label). Thus, the ensemble size is determined by the number of categorical labels across all data attributes. The final clustering result is generated using the graph-based consensus techniques presented in [10]. Specific to this so-called direct ensemble generation method, a given categorical data set can be represented using a binary cluster-association matrix; such an information matrix is analogous to the market-basket numerical representation of categorical data, which has been the focus of traditional categorical data analysis.
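To make this direct ensemble generation concrete, the small sketch below (our own illustration; the list-of-dicts input format is an assumption) builds such a binary cluster-association matrix, with one column per (attribute, value) pair:

def cluster_association_matrix(records, attributes):
    # records: list of dicts mapping attribute name -> categorical value.
    # Each distinct (attribute, value) pair acts as one cluster of the ensemble;
    # entry [i][j] is 1 if data point i carries the j-th pair's value, else 0.
    clusters = sorted({(a, r[a]) for r in records for a in attributes})
    matrix = [[1 if r[a] == v else 0 for (a, v) in clusters] for r in records]
    return clusters, matrix

# Example with two records over two categorical attributes.
records = [{"Genre": "drama", "Year": "1999"},
           {"Genre": "comedy", "Year": "1999"}]
clusters, M = cluster_association_matrix(records, ["Genre", "Year"])
# clusters -> [("Genre", "comedy"), ("Genre", "drama"), ("Year", "1999")]
# M        -> [[0, 1, 1], [1, 0, 1]]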


B. Cluster Ensemble
The purpose of a cluster ensemble is to build a robust clustering portfolio that can perform as well as, if not better than, the single best clustering algorithm across a wide range of data sets. Different clustering algorithms take different approaches: for example, k-means groups the data set so that the total mean square error to the center of each cluster is minimized, while graph-based partitioning clustering partitions a graph into K parts based on minimum edge-weight cuts. A cluster ensemble therefore generates many clustering results using various clustering algorithms and then integrates them using a consensus function in order to yield stable results.

Fig. 1. Cluster Ensemble Architecture.

We present a two-phase clustering combination strategy. In the first step, various clustering algorithms are run against the same data set to generate clustering results. In the second step, these clustering results are combined by an auto-associative additive system based on the distance matrix of graph clustering. In our approach, a distance matrix is first constructed from the cluster results of each individual clustering algorithm; these distance matrices are then combined to form a master distance matrix [13] (a small illustrative sketch of this step is given at the end of this subsection). A weighted graph is constructed from the master distance matrix, and a graph-based partitioning algorithm is applied to this graph to obtain the final clustering results. Graph-based clustering uses various kinds of geometric structures or graphs for analyzing data; different graphs reflect different local structures or inherent visual characteristics of the data set. Clustering divides the graph into connected components by identifying and deleting inconsistent edges, and each subgraph of connected components constitutes a cluster [22].

The combination of multiple partitions can be viewed as a data partitioning task itself. Typically, each partition in the combination is represented as a set of labels assigned by a clustering algorithm. The combined partition is obtained as the result of yet another clustering algorithm whose inputs are the cluster labels of the contributing partitions. We assume that the labels are nominal values. In general, the clustering can be soft, i.e., described by real values indicating the degree of pattern membership in each cluster of a partition. We consider only hard partitions below, noting, however, that the combination of soft partitions can be solved by numerous clustering algorithms and does not appear to be more complex. Having obtained the cluster ensemble, a variety of consensus functions have been developed for deriving the ultimate data partition. Each consensus function utilizes a specific form of information matrix, which summarizes the base clustering results. From the cluster ensemble shown in Fig. 1, such an ensemble-information matrix can be constructed [14], [15].

In addition to the benefits outlined above, consensus clustering can be useful in a variety of domains. For example, clustering categorical data (where it is more difficult to define useful distance metrics) can be treated as a consensus clustering problem in which each discrete feature is viewed as a simple clustering of the data; the consensus clustering algorithm is then applied to the ensemble of all clusterings produced by the discrete features of the data set. An example of categorical data is a movie database whose discrete attributes include Director, Actor, Actress, Genre, Year, etc. As another example, consensus clustering can be employed in privacy-preserving scenarios where it is not possible to centrally collect all of the underlying features of all data points, but only how the data points are grouped together. Such a situation might arise when different companies or governmental agencies have information about individuals that they cannot share, but still need to find high-quality clusters of individuals. For such cases, consensus clustering offers a natural model for clustering the data maintained at separate sites in a privacy-preserving manner, that is, without the need for the different sites to fully reveal their data to each other, and without relying on a trusted authority [17].
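The sketch below illustrates the first phase of this strategy under a simplifying assumption: the distance between two points is taken as the fraction of base clusterings that separate them, and the per-clustering distances are averaged into the master matrix. The exact weighting used in [13] may differ; the labels here are made up.

import java.util.Arrays;

public class MasterDistanceMatrix {
    // labels[b][i] = cluster id of point i in base clustering b.
    // Distance between two points = fraction of base clusterings that put them
    // in different clusters (1 - co-association rate).
    static double[][] masterDistance(int[][] labels) {
        int m = labels.length;        // number of base clusterings
        int n = labels[0].length;     // number of data points
        double[][] d = new double[n][n];
        for (int[] base : labels) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (base[i] != base[j]) d[i][j] += 1.0 / m;
                }
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] labels = {
            {0, 0, 1, 1, 1},   // base clustering 1
            {0, 0, 0, 1, 1},   // base clustering 2
            {1, 1, 0, 0, 0}    // base clustering 3
        };
        for (double[] row : masterDistance(labels)) System.out.println(Arrays.toString(row));
    }
}

A weighted graph built from this master matrix could then be handed to any graph partitioning routine to produce the final clusters.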


III. A LINK-BASED APPROACH
Existing cluster ensemble methods for categorical data analysis rely on the typical pairwise-similarity and binary cluster-association matrices [18], [19], which summarize the underlying ensemble information at a rather coarse level. Many matrix entries are left unknown and simply recorded as 0. Regardless of the consensus function used, the quality of the final clustering result may be degraded. As a result, a link-based method has been established with the ability to discover unknown values and, hence, improve the accuracy of the ultimate data partition [23]. In spite of promising findings, this initial framework is based on the data point-to-data point pairwise-similarity matrix, which is highly expensive to obtain; the link-based SimRank measure [24] that it employs to estimate the similarity among data points is inapplicable to a large data set.

A. Creating a Cluster Ensemble
Type I (Direct ensemble): Following the study in [29], the first type of cluster ensemble transforms the problem of categorical data clustering into a cluster ensemble problem by considering each categorical attribute value (or label) as a cluster in an ensemble. Let X = {x1, ..., xN} be a set of N data points, A = {a1, ..., aM} be a set of M categorical attributes, and Π = {π1, ..., πM} be a set of M partitions. Each partition πi is generated for a specific categorical attribute ai ∈ A. With this formalism, the categorical data X can be directly transformed into a cluster ensemble, without actually implementing any base clustering. While single-attribute data partitions may not be as accurate as those obtained from the clustering of all data attributes, they can bring about great diversity within an ensemble. Besides its efficiency, this ensemble generation method has the potential to lead to a high-quality clustering result.

Type II (Full-space ensemble): Unlike the previous case, the following two ensemble types are created from base clustering results, each of which is obtained by applying a clustering algorithm to the categorical data set. For this study, the k-modes technique [20] is used to generate the base clusterings, each with a random initialization of cluster centers. In a full-space ensemble, each base clustering is created from the original data, i.e., with all data attributes. To introduce an artificial instability to k-modes, the following two schemes are employed to select the number of clusters in each base clustering: 1) Fixed-k, where k = ⌈√N⌉ (N being the number of data points), and 2) Random-k, where k ∈ {2, ..., ⌈√N⌉}.
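A minimal sketch of how such a full-space ensemble could be generated is given below. The kModes routine is only a placeholder (its body is not shown and is assumed to return one cluster label per data point); the sketch merely illustrates the Fixed-k and Random-k schemes for choosing the number of clusters in each base clustering.

import java.util.Random;

public class FullSpaceEnsemble {
    // Placeholder for a k-modes run; assumed to return one cluster label per point.
    static int[] kModes(String[][] data, int k, long seed) {
        // ... k-modes with random initialization of cluster centers would go here ...
        return new int[data.length];
    }

    // Generate M base clusterings over all attributes (Type-II ensemble).
    static int[][] generateEnsemble(String[][] data, int M, boolean fixedK, long seed) {
        int n = data.length;
        int kMax = (int) Math.ceil(Math.sqrt(n));    // Fixed-k value
        Random rnd = new Random(seed);
        int[][] ensemble = new int[M][];
        for (int b = 0; b < M; b++) {
            // Random-k draws k uniformly from {2, ..., kMax}.
            int k = fixedK ? kMax : 2 + rnd.nextInt(Math.max(1, kMax - 1));
            ensemble[b] = kModes(data, k, rnd.nextLong());
        }
        return ensemble;
    }
}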

Let X = {x1, ..., xN} be a set of N data points and Π = {π1, ..., πM} be a cluster ensemble with M base clusterings, each of which is referred to as an ensemble member. Each base clustering returns a set of clusters πi = {C1, C2, ..., Cki}, such that the union of C1, ..., Cki is X, where ki is the number of clusters in the ith clustering. Fig. 2 shows the general framework of cluster ensembles. Essentially, the solutions achieved by the different base clusterings are aggregated to form a final partition. This meta-level methodology involves two major tasks: 1) generating a cluster ensemble, and 2) producing the final partition, normally referred to as a consensus function.

Fig. 2. Basic process of cluster ensembles: first, the multiple base clusterings are applied to a data set X to obtain diverse clustering decisions (π1, ..., πM).

B. Generating a Refined Matrix
Several cluster ensemble methods, both for numerical and categorical data [27], are based on the binary cluster-association matrix (BM). Each entry in this matrix, BM(xi, cl) ∈ {0, 1}, represents a crisp association degree between data point xi ∈ X and cluster cl. As the example of a cluster ensemble and the corresponding BM in Fig. 2 shows, a large number of entries in the BM are unknown, each presented as 0. Such a condition occurs when relations between different clusters of a base clustering are originally assumed to be nil. In fact, each data point can possibly associate (to a certain degree within [0, 1]) with several clusters of any particular clustering. These hidden or unknown associations can be estimated from the similarity among clusters, discovered from a network of clusters. Based on this insight, the refined cluster-association matrix (RM) is put forward as an enhanced variation of the original BM. Its aim is to approximate the values of unknown associations (0) from known ones (1), whose association degrees are preserved within the BM. In the underlying cluster network, circle nodes represent clusters and edges exist only when the corresponding weights are nonzero.
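The sketch below shows one simplified way such a refinement could be computed; it is an assumption-laden toy rather than the paper's exact weighted measure. Known associations are kept at 1, and an unknown entry for a cluster c is filled with DC times an overlap-based similarity between c and the cluster to which the point actually belongs in the base clustering that owns c.

public class RefinedMatrix {
    // Jaccard-style similarity between two clusters given the binary matrix bm:
    // |points in both| / |points in either|.
    static double clusterSim(int[][] bm, int c1, int c2) {
        int inter = 0, union = 0;
        for (int[] row : bm) {
            if (row[c1] == 1 && row[c2] == 1) inter++;
            if (row[c1] == 1 || row[c2] == 1) union++;
        }
        return union == 0 ? 0.0 : (double) inter / union;
    }

    // clusterOfPoint[i][b] = column index of the cluster that point i belongs to in
    // base clustering b; clusterBase[c] = index of the base clustering owning cluster c.
    static double[][] refine(int[][] bm, int[][] clusterOfPoint, int[] clusterBase, double dc) {
        int n = bm.length, k = bm[0].length;
        double[][] rm = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int c = 0; c < k; c++) {
                if (bm[i][c] == 1) {
                    rm[i][c] = 1.0;                               // known association is preserved
                } else {
                    int own = clusterOfPoint[i][clusterBase[c]];  // the point's own cluster in that base clustering
                    rm[i][c] = dc * clusterSim(bm, c, own);       // estimated hidden association
                }
            }
        }
        return rm;
    }
}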


IV. EXPERIMENTAL RESULTS
This section presents the evaluation of the proposed link-based method, using a variety of validity indices and real data sets. The quality of the data partitions generated by this technique is assessed against those created by different categorical data clustering algorithms and cluster ensemble techniques.

The experimental evaluation is conducted over nine data sets. The 20Newsgroup data set is a subset of the well-known text data collection 20Newsgroups, while the others are obtained from the UCI Machine Learning Repository [18]. Their details are summarized in Table 1. Missing values in these data sets are simply treated as a new categorical value. The 20Newsgroup data set contains 1,000 documents from two newsgroups, each of which is described by the occurrences of 6,084 different terms; the frequency f ∈ {0, 1, ...} with which a keyword appears in a document is transformed into a nominal value, Yes if f > 0 and No otherwise. Moreover, the data set used in this evaluation is a randomly selected subset of the original data. In the KDDCup99 data set, each data point (or record) corresponds to a network connection and contains 42 attributes, some nominal and the rest continuous. Following the study in [27], the numerical attributes are transformed to categorical ones using a simple discretization process: for each attribute, any value less than the median is assigned the label 0, and the label 1 otherwise. Note that the selected set of data records covers 20 different connection classes. These two data sets are specifically included to assess the performance of the different clustering methods with respect to large numbers of dimensions and data points, respectively.
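A small sketch of this median-based discretization step is shown below (the input values are made up); every numerical value below its attribute's median becomes the label 0 and every other value becomes 1.

import java.util.Arrays;

public class MedianDiscretizer {
    // Turn one numerical attribute column into binary categorical labels:
    // 0 if the value is below the column median, 1 otherwise.
    static int[] discretize(double[] column) {
        double[] sorted = column.clone();
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2];
        int[] labels = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            labels[i] = column[i] < median ? 0 : 1;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] durations = {0.2, 5.0, 1.3, 0.0, 9.7};   // made-up connection durations
        System.out.println(Arrays.toString(discretize(durations)));   // prints [0, 1, 1, 0, 1]
    }
}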

A. Parameter Settings
In order to evaluate the quality of the cluster ensemble methods previously identified, they are empirically compared using the ensemble settings given below. Five types of cluster ensembles are investigated: Type-I, Type-II (Fixed-k), Type-II (Random-k), Type-III (Fixed-k), and Type-III (Random-k). The k-modes clustering algorithm is used to generate the base clusterings, and an ensemble size (M) of 10 is used. The quality of each method with respect to a specific ensemble setting is reported as the average over 50 runs. A constant decay factor (DC) of 0.9 is exploited with Weighted KM.

To fully evaluate the potential of the proposed method, it is compared to a baseline model (referred to as Base hereafter), which applies SPEC to the BM. This allows the quality of the BM and the RM to be directly compared. In addition, five clustering techniques for categorical data and five methods developed for cluster ensemble problems are included in this evaluation. Details of these techniques are given below.

Clustering algorithms for categorical data: based on their notable performance reported in the literature and on their availability, five different algorithms are selected to represent conventional techniques for clustering categorical data: Squeezer, GAClust, k-modes, CLOPE, and Cobweb. ROCK, in contrast, assumes that each data point belongs to one specific cluster and to no other cluster. While such a conjecture may hold true for some data sets, it is unnatural and unnecessary for the clustering process [25], and this rigid constraint is not implemented by the LCE method. Since the number of data points is normally greater than that of attribute values, ROCK is also less efficient than LCE and, as a result, unsuitable for large data sets [23]. Moreover, the selection of the smooth function used to estimate cluster quality is a delicate and difficult task for average users [27]. Squeezer [9] is a single-pass algorithm that considers one data point at a time: each data point is either placed in one of the existing clusters, if their distance is less than a given threshold, or used to form a new cluster.

GAClust [12] searches for a data partition (referred to as the median partition) that has the minimum dissimilarity to the partitions generated by the categorical attributes. Note that the similarity (or closeness) between two partitions is estimated using a generalization of the classical conditional entropy. A genetic algorithm is employed to make the underlying search process more efficient, with the partitions being represented as chromosomes. k-modes extends the conventional k-means technique with a simple matching dissimilarity measure: the distance between two data points is estimated by the number of categorical attributes on which they do not share the same value. It iteratively refines k cluster representatives, each being the attribute vector that has the minimal distance to all the points in a cluster (i.e., the cluster's most frequent attribute values). CLOPE [18] is a fast and scalable clustering technique, initially designed for transactional data analysis. Its underlying concept is to increase the height-to-width ratio of the cluster histogram. This is achieved through a repulsion parameter that controls the tightness of transactions in a cluster, and hence the resulting number of clusters. Cobweb [22] is a conceptual clustering method. It creates a classification tree in which each node corresponds to a concept. Observations are incrementally integrated into the classification tree along the path of best-matching nodes, guided by a heuristic evaluation measure called category utility. A given utility threshold determines the sibling nodes that are used to form the resulting data partition.

To consolidate the underlying evaluation, three well-known graph-based cluster ensemble algorithms are also examined: CSPA, HGPA, and MCLA [29], [30]. First, the Cluster-based Similarity Partitioning Algorithm (CSPA) creates a similarity graph, where vertices represent data points and edge weights represent the similarity scores obtained from the CO matrix. A graph partitioning algorithm called METIS [16] is then used to partition the similarity graph into K clusters. The Hypergraph Partitioning Algorithm (HGPA) constructs a hypergraph, where vertices represent data points and equally weighted hyperedges represent clusters in the ensemble. HMETIS is then applied to partition the underlying hypergraph into K parts of roughly the same size. Unlike the previous methods, the Meta-Clustering Algorithm (MCLA) generates a graph that represents the relationships among clusters in the ensemble. In this meta-level graph, each vertex corresponds to a cluster in the ensemble and the weight of the edge between any two cluster vertices is computed using the binary Jaccard measure. METIS is also employed to partition the meta-level graph into K meta-clusters. Each data point has a specific association degree to each meta-cluster, which can be estimated from the number of original clusters to which the data point belongs in the underlying meta-cluster.
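Since the meta-level graph weights its edges with the binary Jaccard measure between clusters, the short sketch below (made-up member sets, for illustration only) shows that computation.

import java.util.HashSet;
import java.util.Set;

public class BinaryJaccard {
    // Jaccard similarity between two clusters, each given as the set of its member point ids.
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<Integer> c1 = new HashSet<>(Set.of(1, 2, 3, 4));
        Set<Integer> c2 = new HashSet<>(Set.of(3, 4, 5));
        System.out.println(jaccard(c1, c2));   // 2 shared members / 5 total = 0.4
    }
}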

TABLE 1. DETAILS OF THE EXPERIMENTAL DATA SETS
For each data set (Breast Cancer, Mushroom, 20Newsgroup, Zoo, Primary Tumor, and Congressional Votes, among the others examined), the table lists the number of data points (N), the number of attributes (d), the number of attribute values (A), and the number of classes (K).

B. Experiment Designs
Based on classification accuracy (CA), Table 2 reports the performance of the different clustering techniques over the examined data sets. Note that the presented measures of the cluster ensemble methods that use the Type-II and Type-III ensembles are averages across 50 runs. In addition, a measure is marked N/A when the clustering result is not obtainable. For each data set, the highest five CA-based values are highlighted in boldface. The results shown in this table indicate that the LCE methods usually perform better than the investigated collection of cluster ensemble techniques and clustering algorithms for categorical data. In particular, with the Type-II and Type-III ensembles, LCE also enhances the performance of k-modes, which is used as the base clustering. According to the findings with the 20Newsgroup data set,

LCE is effective for such high-dimensional data, where Squeezer and Cobweb fail to generate clustering results. Likewise, LCE is also applicable to a large data set such as KDDCup99, for which several cluster ensemble techniques (CO+SL, CO+AL, and CSPA) are inapplicable. With the measures of the LCE models being mostly higher than those of the corresponding baseline counterparts (Base), the quality of the RM appears to be significantly better than that of the original, binary variation. Compared to the LCE models that use the Type-II and Type-III ensembles (both Fixed-k and Random-k), the LCE with the Type-I (or direct) ensemble is less effective. This is largely due to the quality of the base clusterings, which are single-attribute for Type-I and multi-attribute for the others, respectively. Despite its inefficiency, CSPA has the best performance among the assessed ensemble methods.

TABLE 2. CLASSIFICATION ACCURACY OF DIFFERENT CLUSTERING METHODS
The table reports, for each data set and each ensemble type (Type I, Type II Fixed-k, Type II Random-k, Type III Fixed-k, and Type III Random-k), the classification accuracy of LCE, Base, CO+SL, CO+AL, CSPA, HGPA, Squeezer, k-modes, CLOPE, and Cobweb. The five highest scores of each data set are highlighted in boldface; not applicable results are marked as N/A.
performance among assessed ensemble methods. In addition, Cobweb is the most effective among five categorical data clustering algorithms included in this evaluation similar. experimental results are also observed using NMI and AR. C. Parameter Analysis The parameter analysis aims to provide a practical means by which users can make the best use of the link-based framework. Essentially, the performance of the resulting technique is dependant on the decay factor (i.e., DC [0,1]), which is used in estimating the similarity among clusters and association degrees previously unknown in the original BM. We varied the value of this parameter from 0.1 through 0.9, in steps of 0.1, and obtained the results in Fig. 7. Note that the presented results are obtained with the ensemble size (M) of 10. The figure clearly shows that the results of LCE are robust across different ensemble types, and do not depend strongly on any particular value of DC. This makes it easy for users to obtain high-quality, reliable results, with the best evaluation indices. The corresponding details are given in Section II-A of the online supplementary. Single attribute and multi attribute for Type-I and the others outcomes being obtained with values of DC between 0.7 and 0.9. Although there is variation in response across the DC values, the performance of LCE is always better than any of the other clustering methods included in this assessment. Another important observation is that the effectiveness of the link-based measure decreases as DC becomes smaller. Intuitively, the significance of disclosed associations becomes trivial when DC is low. Hence, they may be overlooked by a consensus function and the quality of the resulting data partition is not improved [21], [22]. D.Discussin The difficulty of categorical data analysis is characterized by the fact that there is no inherent distance (or similarity) between attribute values. The RM matrix that is generated within the LCE approach allows such measure between values of the same attribute to be systematically quantified. The concept of link analysis [24], [25] is uniquely applied to discover

the similarity among attribute values, which are modeled as vertices in an undirected graph. In particular, two vertices are similar if the neighboring contexts in which they appear are similar; in other words, their similarity is justified by the values of the other attributes with which they co-occur. While the LCE methodology is novel for the cluster ensemble problem, the concept of defining similarity among attribute values (especially in the case of the direct ensemble, Type-I) has been analogously adopted by several categorical data clustering algorithms. Besides these approaches, traditional categorical data analysis also utilizes the market-basket numerical representation of the nominal data matrix [20], [21]. This transformed matrix is similar to the BM, which is refined to its RM counterpart by LCE. A similar attempt identifies the connection between the category utility of conceptual clustering (Cobweb) [12] and the classical objective function of k-means; as a result, the so-called market-basket matrix used by the former is transformed into a variation that can be efficiently utilized by the latter. The intuitions behind creating this rescaled matrix and the RM are fairly similar; however, the methods used to generate them are totally different. LCE discovers unknown entries (i.e., 0) in the original BM from known entries (1), which are preserved and left unchanged. The other method, on the other hand, maps the attribute-value-specific 1 and 0 entries to unique, standardized values. Unlike the RM, that matrix does not conserve the known facts (the 1 entries), whose values now differ from one attribute to another. Although many clustering algorithms and LCE are developed with the capability of comparing attribute values in mind, they achieve the desired metric differently, using specific information models. LCE uniquely and explicitly models the underlying problem as the evaluation of link-based similarity among graph vertices, which stand for specific attribute values (for the Type-I ensemble) or generated clusters (for Type-II and Type-III). The resulting system is more efficient and robust compared with the other clustering techniques emphasized thus far. In addition to SPEC, many other classical clustering techniques, k-means and PAM among others, can be directly used to generate the final

data partition from the proposed RM. The LCE framework is generic, so it can be adopted for analyzing other types of data [26].

V. CONCLUSION
In this paper, we presented a k-means-based, highly effective link-based cluster ensemble approach to categorical data clustering. It transforms the original categorical data matrix into an information-preserving numerical variation, to which an effective graph partitioning technique can be directly applied. The problem of constructing the RM is efficiently resolved by measuring the similarity among categorical labels (or clusters), using the k-means algorithm. The empirical study, with different cluster ensemble types, validity measures, and data sets, suggests that the proposed link-based method usually achieves superior clustering results compared with those of the traditional categorical data algorithms and benchmark cluster ensemble techniques. For the final clustering result, a graph partitioning technique is applied to a weighted graph that is formulated from the refined matrix. Prominent future work includes an extensive study of the behavior of other link-based similarity measures within this problem context. The new method will also be applied to specific domains, including tourism and medical data sets such as breast cancer, 20Newsgroup, primary tumor, and zoo. We also examined random projection for high-dimensional data clustering of categorical data and identified its instability problem. The proposed link-based method was evaluated using a variety of validity indices and real data sets, and the quality of the data partitions generated by this technique was assessed against those created by different categorical data clustering algorithms and cluster ensemble techniques.

ACKNOWLEDGMENT
The authors would like to thank the staff of the Department of Information Technology, Vivekanandha Institute of Engineering and Technology for Women, Tiruchengode, Namakkal District, India, and Dr. R. K. Gnanamurthy, Principal, Vivekanandha College of Engineering for Women, Tiruchengode, Namakkal, India, for providing the

valuable comments and suggestions on online data clustering using the k-means algorithm for the link-based cluster ensemble approach on high-dimensional data, on data partitioning, and on the experiments.

REFERENCES
[1] P. Zhang, X. Wang, and P.X. Song, Clustering Categorical Data Based on Distance Vectors, The J. Am. Statistical Assoc., vol. 101, no. 473, pp. 355-367, 2006.
[2] J. Grambeier and A. Rudolph, Techniques of Cluster Algorithms in Data Mining, Data Mining and Knowledge Discovery, vol. 6, pp. 303-360, 2002.
[3] Z. Huang, Extensions to the K-Means Algorithm for Clustering Large Data Sets with Categorical Values, Data Mining and Knowledge Discovery, vol. 2, pp. 283-304, 1998.
[4] Z. He, X. Xu, and S. Deng, Squeezer: An Efficient Algorithm for Clustering Categorical Data, J. Computer Science and Technology, vol. 17, no. 5, pp. 611-624, 2002.
[5] D.H. Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning, vol. 2, pp. 139-172, 1987.
[6] D. Gibson, J. Kleinberg, and P. Raghavan, Clustering Categorical Data: An Approach Based on Dynamical Systems, VLDB J., vol. 8, nos. 3-4, pp. 222-236, 2000.
[7] S. Guha, R. Rastogi, and K. Shim, ROCK: A Robust Clustering Algorithm for Categorical Attributes, Information Systems, vol. 25, no. 5, pp. 345-366, 2000.
[8] M.J. Zaki and M. Peters, Clicks: Mining Subspace Clusters in Categorical Data via K-partite Maximal Cliques, Proc. Int'l Conf. Data Eng. (ICDE), pp. 355-356, 2005.
[9] V. Ganti, J. Gehrke, and R. Ramakrishnan, CACTUS: Clustering Categorical Data Using Summaries, Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 73-83, 1999.
[10] D. Barbara, Y. Li, and J. Couto, COOLCAT: An Entropy-Based Algorithm for Categorical

Clustering, Proc. Int'l Conf. Information and Knowledge Management (CIKM), pp. 582-589, 2002.
[11] Y. Yang, S. Guan, and J. You, CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data, Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 682-687, 2002.
[12] C.H. Cheng, A.W. Fu, and Y. Zhang, Entropy-Based Subspace Clustering for Mining Numerical Data, Proc. Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 84-93, 1999.
[13] L.I. Kuncheva and S.T. Hadjitodorov, Using Diversity in Cluster Ensembles, Proc. IEEE Int'l Conf. Systems, Man and Cybernetics, pp. 1214-1219, 2004.
[14] N. Nguyen and R. Caruana, Consensus Clusterings, Proc. IEEE Int'l Conf. Data Mining (ICDM), pp. 607-612, 2007.
[15] A.P. Topchy, A.K. Jain, and W.F. Punch, Clustering Ensembles: Models of Consensus and Weak Partitions, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866-1881, Dec. 2005.
[16] B. Fischer and J.M. Buhmann, Bagging for Path-Based Clustering, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1411-1415, Nov. 2003.
[17] C. Domeniconi and M. Al-Razgan, Weighted Cluster Ensembles: Methods and Analysis, ACM Trans. Knowledge Discovery from

Data, vol. 2, no. 4, pp. 1-40, 2009.
[18] A.L.N. Fred and A.K. Jain, Combining Multiple Clusterings Using Evidence Accumulation, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 835-850, June 2005.
[19] S. Monti, P. Tamayo, J.P. Mesirov, and T.R. Golub, Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data, Machine Learning, vol. 52, nos. 1/2, pp. 91-118, 2003.

VANET Technologies, Security Threats and Research Area: A Study


R.Vivadha1, D.Anbarasy2, M.Sivapriya3, R.Vinodha4
1,2,3,4

Department of Networking, Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry.

Abstract - A network in which devices communicate over a wireless medium is referred to as a wireless network. Using the existing cellular network and forming an ad-hoc network are the two main wireless communication approaches. An ad-hoc network is formed for a temporary need by nodes with no pre-established infrastructure. A system that makes vehicles (nodes) communicate with each other is a Vehicular Ad-hoc Network (VANET). The aim of this paper is to present a study of VANET and its technologies, research areas in VANET, and possible security attacks on it.

I. INTRODUCTION
A VANET uses cars as nodes in order to create a mobile network. The nodes can communicate with each other over a short range, typically 100 to 300 metres, each acting as a router or node in wireless mode. The concept of VANET is implemented in traffic systems, police vehicles, and fire vehicles. VANET is a growing technology that has earned, in recent times, the attention of commercial and research institutions. Vehicular Communication (VC) drives research efforts to enhance the security and effectiveness of communication systems.

A. Applications of VANETs
The commercial applications of the system cover a wide range of innovative ideas aiding individuals and tourists, such as booking a parking place, downloading tourism information and maps for restaurants and gas stations, navigation and route guidance, payment at toll plazas, Internet access, and connection to home computers. Some applications of VANETs are: vehicle collision warning, security distance warning, driver assistance, cooperative driving, dissemination of road information, Internet access, map location, automatic parking, driverless vehicles, weather forecasting, and electronic toll collection.

B. Nodes in an Ad-hoc Network
The nodes in a VANET are often mobile nodes, so they are easy to deploy and need no infrastructure. Vehicles can communicate and form a network by using Vehicle-to-Vehicle Communication (VVC); the base stations (e.g., street lights) along streets help to form Infrastructure-to-Vehicle Communication (IVC); and the base stations can communicate with each other to form Infrastructure-to-Infrastructure Communication (IIC). In a VANET, vehicles transmit information to GPS devices or other vehicles without security. Vehicle movement is constrained by the road in a VANET, whereas in a MANET nodes move randomly; this is the major advantage of VANET over MANET.
Fig. 1 Nodes in Ad-hoc Network

Here, in the above diagram, the outermost nodes are not within transmitter range of each other. The middle node is used to forward packets between the outermost nodes: it acts as a router, and the three nodes together form an ad-hoc network. A simulation of car-to-car messaging is created and explained by Eichler et al. [1].

II. VANET TECHNOLOGIES
The dynamic mobility, safety measures in vehicles, streaming communication between vehicles, infotainment, telematics, automotive vehicle tracking, and good vehicular communication in a VANET are supported by several available technologies

such as Bluetooth, IRA, ZigBee, WiFi IEEE 802.11p, WAVE IEEE 1609, and WiMAX IEEE 802.16. VANETs can be considered a component of Intelligent Transport Systems (ITS) and are expected to incorporate technologies such as cellular, satellite, WiMAX, and WiFi-like Dedicated Short Range Communications (DSRC) [2]. The ACM has conducted workshops on VANET in recent years [3]. Other related areas are Mobile Ad-hoc Networks, Wireless Ad-hoc Networks, and Intelligent Vehicular Ad-hoc Networks. The IEEE 802.11e standard is integrated to provide vehicular communication with support for message differentiation [4].

A. DSRC/WAVE
The IEEE 802.11p protocol is basically used for generating very short-range messages of long duration. GPS-enabled vehicles are equipped with on-board units, which can communicate with each other to propagate information from vehicle to vehicle. DSRC/WAVE operates in the 5.9 GHz band (U.S.) and the 5.8 GHz band (Japan, Europe), has 75 MHz of bandwidth allocated for vehicle communication, and has a range of up to 1 km at vehicle speeds of up to 140 km/h.

B. WLAN (WiFi)
The Wireless Local Area Network is another possibility for vehicular networks. An IEEE 802.11 transmitter has a 250-metre directional coverage range, which maintains multihop connectivity in highway and urban regions.

C. WPAN
Wireless Personal Area Networks are used for short-range wireless communications (IEEE 802.15 / Bluetooth). The short transmission range of 10-20 metres restricts the applicability of this technology to dense-area vehicular networks.

III. SECURITY THREATS
Security should satisfy four goals: it should ensure that the information received is correct (information authenticity), that the source is who it claims to be (message integrity and source authentication), that the node sending the message cannot be identified and tracked (privacy), and that the system is robust. The possible security attacks are the DoS attack, fabrication attack, alteration attack, replay attack, and Sybil attack.

IV. THE RESEARCH AREAS SETUP FOR VANET
VANET is the integration of ad-hoc networks, WLAN, and cellular technology. It is similar to a MANET from the perspective of node movement, but dissimilar to a MANET in its high rates of node movement and highly varying channel distributions. Research setups for VANET exist in different countries. In the USA, there are research projects such as DSRC for travel safety management, traveller information, on-board entertainment, fuel efficiency, pollution control, etc.; the VSCC (Vehicular Safety Communication Consortium) was created by the NHTSA (National Highway Traffic Safety Administration) and automotive OEMs to promote V2V safety and networking, and other research projects include ETC, AHS, VICS, FleetNet, AutoNet, and PATH. In Europe, the C2CCC (Car-to-Car Communication Consortium) has been established, and its research projects include eSafety Support, the PReVENT project, the Networks on Wheels project, the COMeSafety project, etc. In Japan, the Internet ITS Consortium has been organized.

A. Main Research Topics
Efficient broadcasting algorithms are essential for the delivery of safety and routing messages. Routing protocols that rely on GPS have been introduced [5]; however, these protocols still require further investigation, and their throughput, stability, and capability to work both when few cars are on the road and in congested areas are of concern. Some of the suggested research areas in VANET are mobility modelling, scalability modelling, efficient channel utilization, and security and privacy issues [6].

V. CONCLUSIONS AND FUTURE WORK
With the advancement of wireless technology, VANET is becoming a reality. Vehicular ad-hoc networking is an emerging and promising technology, and it is also a fertile region for attackers, who will try to challenge the network with malicious attacks. In this paper, several ongoing research projects, VANET technologies, and security threats are discussed in order to resolve the issues and

constraints related to high mobility, the predictability of varying paths, and channel distribution. In future work, we will expand on the certificates for safety messages, including how they are created, discarded, and verified, and test them by simulation. VANET will play a crucial role in safety, communication, and entertainment in the future.

VI. REFERENCES
[1] Eichler, Stephan, Ostermaier, Benedikt, Schroth, Christoph, and Kosch, Timo, Simulation of Car-to-Car Messaging: Analyzing the Impact on Road Traffic, IEEE Computer Society, 2005.
[2] Yue Liu, Jun Bi, and Ju Yang, Research on Vehicular Ad Hoc Networks, Chinese Control and Decision Conference (CCDC), 2009.
[3] http://www.sigmobile.org/workshops/vanet2010
[4] http://en.wikipedia.org/wiki/Vehicular_adhoc_network
[5] Sun Xi and Li Xia-miao, The Study of the Feasibility of VANET and its Routing Protocol, IEEE, 2008.
[6] Jie Luo, Xinxing Gu, Tong Zhao, and Wei Yan, A Mobile Infrastructure Based VANET Routing Protocol in the Urban Environment, IEEE, 2010.

EFFECTIVE MODIFICATION OF EXACT CLONES USING CLONE MANAGER


E. Kodhai1, M. Subathra2, I. Bakia Ranjana3, M. Banupriya4
1 Associate Professor, 2,3,4 B.Tech (Final Year), Department of IT, Sri Manakula Vinayagar Engineering College, Pondicherry.

ABSTRACT
The reuse of existing code by copy and paste is called code cloning. The tendency to clone not only produces code that is difficult to maintain, but also introduces subtle errors. Existing systems mainly focus on identifying a cloned part of a program and finding the same fragment in other parts of the program; previous works do not address modifications carried out on the identified clones. We put forth an idea for modifying identified code clones of type 1 (exact clones). Clone Manager is the tool used here for retrieving the type 1 clones from the input file, and the retrieved clones are highlighted in the source code. A modification performed on a retrieved clone is automatically propagated to all the other similar code fragments, which helps reduce code complexity and programmer effort. This entire process is termed Bug Thirster.

1. INTRODUCTION
Several studies show that about 5% to 20% of a software system can consist of duplicated code, which is basically the result of copying existing code fragments and reusing them by pasting, with or without minor modifications. Copying code fragments and then reusing them by pasting, with or without minor modifications or adaptations, is a common activity in software development. This type of reuse of existing code is called code cloning, and the pasted code fragment (with or without modifications) is called a clone of the original. However, in a post-development phase it is difficult to say which fragment is the original and which one is copied; therefore, fragments of code that are exactly the same as, or similar to, each other are called code clones, i.e., instances of duplicated or similar code fragments are called code clones, or just clones. Several studies show that software systems with code clones are more difficult to maintain than ones without them. The tendency to clone not only produces code that is difficult to maintain, but may also introduce subtle errors. Code clones are considered one of the bad smells of a software system, and it is widely believed that cloned code has several adverse effects on the maintenance life cycles of software systems. Therefore, it is beneficial to remove clones and prevent their introduction by constantly monitoring the source code during its evolution. This practice is common especially in device drivers of operating systems, where the algorithms are similar. There are several other factors, such as performance enhancement and coding style, because of which large systems may contain a significant percentage of duplicated code. There is also accidental cloning, which is not the result of direct copy-and-paste activities but of using the same set of APIs to implement similar protocols. The literature on the topic has described many other situations that can lead to the duplication of code within a software system. Code cloning is found to be a more serious problem in industrial software systems. In the presence of clones, the normal functioning of the system may not be affected, but without countermeasures by the maintenance team, further development may become prohibitively expensive. Clones are believed to have a negative impact on evolution. Code clones may adversely affect a software system's quality, especially its maintainability and comprehensibility. For example, cloning increases the probability of update anomalies (inconsistencies in updating): if a bug is found in a code fragment, all of its similar cloned fragments should be detected so that the bug can be fixed in each of them. Moreover, too much cloning increases the system size and often indicates design problems such as missing inheritance or missing procedural abstraction. Although the cost of maintaining clones over a system's lifetime has not been estimated yet, it is at least agreed that the financial impact on maintenance is very high. The cost of changes carried out after delivery is estimated at 40% to 70% of the total costs during a system's lifetime.
Existing research shows that a significant amount of the code of a software system is cloned code, and this amount may vary depending on the domain and origin of the software system.

2. RELATED WORK
The related work on existing systems covers simple reuse by copy/paste of design, functionality, and logic. Work on clone detection in existing systems has focused mainly on identifying the different clone classes, the situations that lead to cloning, and techniques for detecting cloned code. There are several examples. "Extracting the Similarity in Detected Software Clones Using Metrics" [1] used metrics as a technique for finding similar code blocks and for quantifying their similarity. This technique can be used to find clone clusters: sets of code blocks that are all within a user-supplied similarity. It detects similar clones using metrics for type 1 and type 2 clones. The approach is implemented in a clone-detection tool for C programs, with which the user can find clones with varying degrees of similarity. The similarity scores vary with the user-supplied threshold value, and therefore the percentage of clones identified is not accurate. "Finding Similar Defects Using Synonymous Identifier Retrieval" [2] used the SC-Retriever tool, which takes a code fragment containing a defect as the query input and returns code fragments containing the same or synonymous identifiers appearing in the input fragment. The SC-Retriever tool is not efficient for small code fragments when compared to CCFinder, and its execution time is high because of the clustering process.

Clones can also be introduced by the way a system is developed. Sometimes two software systems with similar functionalities are merged to produce a new one. Although these systems may have been developed by different teams, clones may appear in the merged system because similar functionalities were implemented in both systems. Generating code with a tool, using generative programming, may also produce many code clones, because these tools often use the same template to generate the same or similar logic. We retrieve clones using the Clone Manager in the following formats: (i) character-for-character identical; (ii) character-for-character identical, with white-space characters and comments ignored; (iii) token-for-token identical; and (iv) token-for-token identical with occasional variation (i.e., insertion, deletion, or modification of tokens).
The following example of simple character-for-character identical clones is given below: an exact copy of fragment 1 is found in another part of the program as fragment 2.

Example code fragment 1:
if (a < b) { c = d + b; d = d + 1; } else c = d - a;

Example code fragment 2:
if (a < b) { c = d + b; d = d + 1; } else c = d - a;

Most systems employ various tools and techniques to identify clones. Our Clone Manager tool employs a more sophisticated approach that permits consistent changes across exact clones; it basically depends on the Rabin-Karp algorithm for the effective retrieval of exact clones.

3. BACKGROUND WORK

In our project we maintain a Microsoft SQL Server Management Studio 2008 database that contains all the input files for determining the exact code clones of a particular file. The database contains (as fields) the file name and the date of the last modification made to the respective input file. It also allows us to retrieve the exact clones, if they have already been detected in the program, by indicating the start and end line positions of the clones as they appear in the program. This helps in showing the exact clones immediately when the respective input file is viewed. If an input file does not yet have any clones recorded, the Clone Manager

takes responsibility for retrieving the exact clone segments and their positions of occurrence in the program. Our proposal is implemented with Microsoft Visual Studio 2010 Service Pack 1, which retrieves the exact clones through the Clone Manager and provides a flexible way to perform the two important phases, highlighting and modification. We also use the efficient Rabin-Karp algorithm to search strings in the program and retrieve exact clones based on their positions.

4. PROPOSED SYSTEM
Code cloning activities are very easy and can significantly reduce programming effort and time, since an existing fragment of code is reused rather than similar code being rewritten from scratch. The identification process alone, however, does not help in removing bugs; modification should also be carried out. Therefore we propose a novel system that accomplishes the modification activity. This approach includes creating an environment for performing modifications: the programmer must ensure that a modification made to one instance of a clone is automatically updated in all the other instances of that clone. This modification activity not only reduces the programmer's work but also saves time. Using the Clone Manager we retrieve all the type 1 clones, upon which the modification work is carried out. Several modules in our proposed system explain how the modification process is carried out. Our architecture diagram comprises the following modules: 1. CHECKING FILE USING DB, 2. CLONE MANAGER, 3. HIGHLIGHTING, 4. MODIFICATION.

Figure 1: Architecture diagram

4.1 Checking File Using DB
The first step is to check whether clones have already been detected in the input file. The main purpose of this step is to reduce the time used for clone detection, as files can be of varying size and detecting clones in them may take some time. The input is therefore checked against the database to find whether clones have already been detected for it. The database contains the hash values of the files used. If clones have already been detected for the input, the next step, namely clone detection, can be skipped and the detected clones are displayed.
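A minimal sketch of this check is given below; the table and column names (clone_results, file_hash) are hypothetical, since the paper does not give the actual Clone Manager schema, and the sketch is written in Java purely for illustration rather than in the project's own environment. The input file is hashed and the digest is looked up before deciding whether detection must be re-run.

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CloneCacheCheck {
    // Hash the input file so previously analysed files can be recognised.
    static String sha256(Path file) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Returns true when the database already holds clone results for this file content.
    // Table and column names (clone_results, file_hash) are illustrative only.
    static boolean alreadyDetected(Connection db, Path file) throws Exception {
        String sql = "SELECT COUNT(*) FROM clone_results WHERE file_hash = ?";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setString(1, sha256(file));
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1) > 0;   // skip re-detection and just display stored clones
            }
        }
    }
}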

4.2 Clone Manager
The next step uses the Clone Manager, which is the main part of the project. The detected clones are handled using the Clone Manager: the required clones can be selected and edited, and when a single clone is edited the changes are reflected in all the clones of the same type. It is easy to modify all clones, since changes to any one clone are automatically and simultaneously made to all similar clones, and it is equally easy to modify a single clone instead by changing the editing mode. This process reduces the user's processing time and can hence improve productivity. There are a number of reasons why duplicate code may be created, including:
- Copy-and-paste programming, in which a section of code is copied "because it works". In most cases this operation involves slight modifications to the cloned code, such as renaming variables or inserting/deleting code.
- Independently written similar functionality: functionality very similar to that in another part of a program is required, and a developer independently writes code that is very similar to what exists elsewhere.
- Plagiarism, where code is simply copied without permission or attribution.

Sequences of duplicate code are sometimes known as code clones or just clones. Code duplication is generally considered a mark of poor or lazy programming style, whereas good coding style is generally associated with code reuse. It may be slightly faster to develop by duplicating code, because the developer need not concern himself with how the code is already used or how it may be used in the future. The difficulty is that original development is only a small fraction of a product's life cycle, and with code duplication the maintenance costs are much higher. Some of the specific problems include:


- Code bulk affects comprehension: code duplication frequently creates long, repeated sections of code that differ in only a few lines or characters. The length of such routines can make it difficult to quickly understand them. This is in contrast to the best practice of code decomposition.
- Purpose masking: the repetition of largely identical code sections can conceal how they differ from one another, and therefore what the specific purpose of each code section is. Often, the only difference is in a parameter value; the best practice in such cases is a reusable subroutine.
- Update anomalies: duplicate code contradicts a fundamental principle of database theory that applies here as well: avoid redundancy. Non-observance incurs update anomalies, which increase maintenance costs, in that any modification to a redundant piece of code must be made for each duplicate separately. At best, coding and testing time are multiplied by the number of duplications; at worst, some locations may be missed, and, for example, bugs thought to be fixed may persist in duplicated locations for months or years. The best practice here is a code library.
- File size: unless external lossless compression is applied, the file will take up more space on the computer.

4.3 Highlighting
A number of different algorithms have been proposed to detect duplicate code. In our project we use the Rabin-Karp string search algorithm, created by Michael O. Rabin and Richard M. Karp in 1987, which uses hashing to find any one of a set of pattern strings in a text. The clones are detected and displayed highlighted, along with their line positions.
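As an illustration of the Rabin-Karp idea, the sketch below is a textbook single-pattern version with an assumed base and modulus, not the Clone Manager's own multi-pattern implementation: a rolling hash of the pattern is compared against the rolling hash of each text window, and only hash matches are verified character by character.

public class RabinKarpSearch {
    // Return the index of the first occurrence of pattern in text, or -1.
    static int search(String text, String pattern) {
        final long BASE = 256, MOD = 1_000_000_007L;
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return m == 0 ? 0 : -1;
        long patHash = 0, winHash = 0, high = 1;
        for (int i = 0; i < m - 1; i++) high = (high * BASE) % MOD;   // BASE^(m-1) mod MOD
        for (int i = 0; i < m; i++) {
            patHash = (patHash * BASE + pattern.charAt(i)) % MOD;
            winHash = (winHash * BASE + text.charAt(i)) % MOD;
        }
        for (int i = 0; i + m <= n; i++) {
            // Verify a hash match character by character to rule out collisions.
            if (patHash == winHash && text.regionMatches(i, pattern, 0, m)) return i;
            if (i + m < n) {   // slide the window by one character
                winHash = (winHash - text.charAt(i) * high % MOD + MOD) % MOD;
                winHash = (winHash * BASE + text.charAt(i + m)) % MOD;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("subList.add( option );", "add"));   // prints 8
    }
}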

4.4 Modification
The final stage of the project is carrying out modification of the clones that are highlighted. A modification made to a retrieved clone leads to automatic updating of the rest of the clone segments. The Clone Manager thus performs the modification automatically and reduces the programmer's effort: instead of modifying each and every line of a program, it is possible to get an overall idea of the other files containing similar copies of a fragment and to modify them more effectively using the Clone Manager.

5. EXPERIMENTAL WORK
We performed our experiments on an Intel Core 2 Duo processor at 3.0 GHz with 2 GB of RAM. Modification activities are very easy and can significantly reduce programming effort and time, as they reuse an existing fragment of code rather than rewriting similar code from scratch. We carry out our experimental procedure in the following manner.
1. We first maintain a Microsoft SQL Server Management Studio (2008) database for retrieving the exact clones of a particular input file that is being viewed. The database contains the file name and the clones that have been detected in the respective file, recorded by the start and end positions of the clones in the source code.
2. If clones have not been detected for a particular file found in the database, they are detected and retrieved using the Clone Manager.
3. The Clone Manager retrieves the exact clones of a program using the efficient Rabin-Karp algorithm; the clones are highlighted in the source program, so the necessary modification can be applied to the highlighted clones.
4. A modification carried out on the retrieved clones is automatically propagated to the clones in the source program, which significantly reduces the programmer's effort and time.

5.1 Qualitative Modification Analysis
The modification process is carried out on the clones retrieved by the Clone Manager. The Clone Manager detects the exact clones in the source file and then makes the required modification to the clones, thereby allowing automatic and simultaneous modification of the clones of the source file. The following is an example of how modification is carried out in the source file using the Clone Manager.

1  static void setStringOption( String value, String option, List dest ) {
2    if (value != null && !value.trim().equals("")) { // NOI18N
3      List subList = new ArrayList(); // Exact clones in the source code
4      subList.add( option ); // Exact clones in the source code
5      subList.add( value );
6    } } //System.out.println ( option + " " + value ); // NOI18N
7    List subList = new ArrayList(); // Exact clones in the source code
8    subList.add( option ); // Exact clones in the source code

(A). Original source code as input

1  static void setStringOption( String value, String option, List dest ) {
2    if (value != null && !value.trim().equals("")) { // NOI18N
3      List subList = new ArrayList();
4      subList.add( option );
5      dest.addAll(subList); // Modification to the exact clones
6      subList.add( value );
7    } } //System.out.println ( option + " " + value ); // NOI18N
8    List subList = new ArrayList();
9    subList.add( option );
10   dest.addAll(subList); // Modification to the exact clones

(B). Modification done to the source code

Figure 2: Clone Manager used for automatic modification of the input file

The original input file contains clones, which are later modified by adding an extra line to each clone. Wherever the clones appear in the program, the modification made through the Clone Manager is applied. The clones are first reported after running the Clone Manager, so that modification can be performed, by indicating where the clones appear in the program.

Table 1: Clone Detection Report
LINE COUNT   START LINE   END LINE
2            3            4
2            7            8

Therefore, using the Clone Manager we retrieve the clones, and modification is performed on the exact clones that are highlighted in the program; this creates a better environment for performing automatic modification and reduces the time required significantly.

6. CONCLUSION
Thus we have implemented our project by retrieving similar clones using the Clone manager, which performs two functions. In the highlighting phase, the exact clones are retrieved and highlighted in the source code, which separates the Type-1 clones from the rest of the code. In the modification phase, a change in one code fragment leads to automatic updating of all the highlighted areas. Automatic modification of the exact clones therefore reduces the programmer's effort and the complexity of the task.



A NEURO-FUZZY APPROACH TO PREDICT ONLINE AUCTION FINAL BIDS


A. Naveenraj1, Utham R, Sathish D, Bharathi Vengateshwaran K2
1 Asst. Professor, Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry
2 PG Student, Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry

Abstract: This paper focuses on predicting the final price, or final bid, in an online auction using a combination of artificial neural networks and fuzzy logic. The varying nature of the human mind makes the patterns in the bidding process non-linear, and hence linear prediction systems are not adequate to predict the final bid in online auctions. Neural networks, on the other hand, handle non-linear data well and can therefore be used in such circumstances. Fuzzy logic is used to bring in other variables of an auction that cannot otherwise be expressed in numerical terms. These factors have a profound effect on the auction bids, hence the use of fuzzy logic to represent them.

1. Introduction
1.1 Online Auctions
Auctions on the internet are of different types. The most common of them are the English auction, the Dutch auction, the first-price sealed auction and the Vickrey auction. They have varying strategies and are quite different from each other.
The English Auction: In the English auction the auctioneer sets a reserve price for an item, and from then on there is an open auction with bidders naming prices in ascending order. In other words, a bidder can only make a bid if it is greater than the last bid. The English auction is the most popular type of auction on the internet.
The Dutch Auction: The Dutch auction is a descending-price auction. The auctioneer begins with a high asking price and lowers it until one of the bidders is ready to accept the price or until it reaches a predetermined reserve price. In this method there is only one bid.
The First-Price Sealed Auction: In the first-price sealed auction the auctioneer sets a reserve price and the bids are received in such a way that they are not known to the other bidders. The bidder with the highest bid wins the auction.
The Vickrey Auction: The Vickrey auction is similar to the first-price sealed auction. The difference is that the highest bidder need only pay the value of the second-highest bid in the auction. This strategy induces the bidder to bid a value that is almost equal to the real value of the object being auctioned.
1.2 Neural Networks
An artificial neural network is inspired by the biological neural network. It consists of artificial neurons interconnected together.
Figure 1: Structure of a neural network
Artificial neural networks are usually adaptive systems that can change their structure based on internal or external information available to them during the learning phase. This aspect of neural networks makes them difficult to follow, as the process of restructuring that goes on inside the artificial neural network is complex and hidden. Learning in neural networks has attracted much


attention due to the fact that this ability makes it possible for Artificial Neural Networks to have artificial intelligence. Given a specific problem to solve with a set of inputs and outputs the Artificial Neural Networks can restructure itself in such a way that it can provide similar outputs to similar inputs. 1.3 Fuzzy Logic Fuzzy logic is a form of many valued logic. It began with the proposal of fuzzy set theory by Lofti Zadeh. Fuzzy sets contain elements which have different degrees of membership. Fuzzy logic differs from traditional two valued logic in the fact that Fuzzy logic variables have a truth value that ranges in degree between 0 and 1 in contrast to traditional logic which uses 0 or 1 and nothing else. Fuzzy logic is useful when linguistic terms are to be converted to numerical terms. Varying degrees of truthfulness can be given to a variable. 2. Related Works Forecasting Winning Bid Prices in an Online Auction Market - Data Mining Approaches[4] Suggests in building a forecasting model which makes use of the data mining techniques that provides better performance than statistical analysis solving the problem of asymmetry on online auction. Forecasting Financial Stocks using Data Mining[3] Presents an approach to forecast daily changes in seven financial stocks prices comparing the performance of ordinary least squares and neural network models to predict the changes in the stock prices and to increase the accuracy of forecasting using a financial data mining technique to assess the feasibility of financial forecasting. Predicting Australian Stock Market Index Using Neural Networks Exploiting Dynamical Swings and Intermarket Influences[5] An approach for predicting the Australian stock market index using multi-layer feed-forward neural networks discovered for the prediction purpose which develops inter-market influences from professional technical analysis and quantitative analysis is presented. A Final Price Prediction Model for English Auctions - A Neuro-Fuzzy Approach[1]

Three models, regression, neural networks and neuro-fuzzy, are constructed for the prediction of the final prices of English auctions using real-world online auction data in order to avoid overpricing; testing of theory building is possible using the knowledge base obtained from the neuro-fuzzy approach.
Neural Network Predictions of Stock Price Fluctuations[8]

Focuses on predictions of certain business day periods through the use of input patterns based on fundamental and technical indicators from the stock market and economy and the outcomes are statistically significant based on fundamental inputs of the business day periods. Predicting Commodity Prices Using Artificial Neural Networks [9] Artificial neural networks have been used to predict the prices of various commodities in the virtual economy of World of Warcraft. Predicting the End-Price of Online Auctions[10] Describes the usage of machine learning algorithms to predict end-prices of auction items with the help of features and several formulations of the price prediction problem illustrating the accuracy of the algorithms used in online marketplaces. An Empirical Evaluation on the Relationship between Final Auction Price and Shilling Activity in Online Auctions[11] Observing the lower-than-expected final auction price and the higher-than-expected final auction price of the auction data using a neural network approach to allow more focused evaluation of those shill-suspected auctions. Stock Prediction A Neural Network Approach[12] A neural network on error correction is defined and implemented with technical and fundamental data as input to the network with stocks of two exchanges are predicted using two separate network structures. Daily and weekly predictions are performed by benchmark comparisons proving the index prediction successful performing the naive approach.


3. Proposed work
The model that has been developed takes in bids from an ongoing auction as the input. The times at which the bids were made are also given as inputs; in other words, the bids with their corresponding times are given as input to the neural network. The neural network is in training mode as it learns the relationship between the time and the corresponding bids.

Figure 2: Architecture of the proposed system

As auctions are also affected by factors other than time, these factors should be taken into consideration as well. They are expressed linguistically and therefore cannot be input to neural networks as such. Hence it is necessary to make use of fuzzy logic to express these factors in numerical terms.

Figure 3: Screenshot of the system

The factors that are input to the fuzzy logic include the duration of the auction, the initial price or the reserve price, the market price of the object being auctioned, and the perceived value of the object in case it is an antique or an object of historical importance. These factors are perceived in linguistic terms and are converted to numerical terms with predetermined numbers, and the output is then fed into the knowledge base of the neural network while it is learning. A minimal sketch of such a conversion is given below.
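As an illustration only, the following sketch shows one way the linguistic auction factors could be mapped to numbers in [0, 1] before being fed to the network; the membership breakpoints and factor names are assumptions and are not taken from the paper.

// Illustrative fuzzification of auction factors into [0, 1] inputs for the network.
public class AuctionFuzzifier {

    // Triangular membership function with peak at b over the interval [a, c].
    static double triangular(double x, double a, double b, double c) {
        if (x <= a || x >= c) return 0.0;
        return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    // Degree to which the reserve price is "high" relative to the market price (assumed scale).
    static double reservePriceHigh(double reservePrice, double marketPrice) {
        double ratio = reservePrice / marketPrice;
        return Math.min(1.0, Math.max(0.0, (ratio - 0.5) / 0.5)); // 0 at 50% of market, 1 at 100%
    }

    // Degree to which the remaining auction duration is "short" (hours, assumed breakpoints).
    static double durationShort(double hoursLeft) {
        return triangular(hoursLeft, -1.0, 0.0, 24.0); // fully "short" at 0 h, no longer short beyond 24 h
    }

    public static void main(String[] args) {
        // Example: reserve 80, market price 100, 6 hours left.
        System.out.println(reservePriceHigh(80, 100));  // 0.6
        System.out.println(durationShort(6));           // 0.75
    }
}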

4. Results
The results in the table have been rounded off to the nearest multiple of five. Each sample auction is listed as its sequence of bid values with the corresponding bid times, followed by the final-bid predictions of the plain neural network and of the neuro-fuzzy system.

Sample auction 1, bid value (bid time): 29.75 (1.4036), 50 (1.44493), 100 (1.5117), 126 (1.73032), 115.75 (1.83111), 145.75 (1.83222), 152 (1.8815), 200 (2.02583), 172 (2.04309), 177 (2.04764), 197.5 (2.08123), 208.79 (2.08149), 205 (2.15163), 240 (2.22221), 215 (2.55299), 225 (2.5537), 230 (2.55424), 235 (2.55559), 240 (2.55612), 251.86 (2.5767), 245 (2.69587), 250 (2.69597), 256.86 (2.69617). Neural network prediction: 265; neuro-fuzzy prediction: 260.

Sample auction 2, bid value (bid time): 200 (0.06221), 240 (2.13655), 205 (2.20715), 215 (2.20748), 220 (2.20794), 240 (2.50443), 250 (2.50479), 250 (2.98997), 255 (2.99023). Neural network prediction: 270; neuro-fuzzy prediction: 260.

Sample auction 3, bid value (bid time): 150 (0.28953), 30 (0.50344), 50 (1.20469), 60 (1.20501), 69 (1.20525), 100 (1.81757), 110 (1.90979), 115 (1.90998), 150 (1.91111), 152.5 (1.91128), 200 (2.09784), 230.07 (2.25087), 205 (2.27106), 210 (2.2712), 215 (2.68098), 220 (2.69066), 248.5 (2.83444). Neural network prediction: 235; neuro-fuzzy prediction: 240.

Sample auction 4, bid value (bid time): 100 (0.65369), 175 (1.94803), 180 (1.94833), 200 (2.23096), 205 (2.23112), 230 (2.63727), 220 (2.66809), 240.07 (2.82103), 242.01 (2.89637), 245 (2.91896), 250.04 (2.98309), 250 (2.98461), 255.04 (2.98484), 260.04 (2.99863). Neural network prediction: 270; neuro-fuzzy prediction: 270.

5. Conclusion
This paper aims to predict the final bid of an online auction using the previous bids and other factors that affect an auction. The neuro-fuzzy approach used here is well suited to auctions, but it can also be used for similar environments such as the stock market. The test results show that the neural network works well in predicting the final bid, and the fuzzy-logic input makes the prediction closer to the actual bid, thus justifying its use here. Future work can be done on adjusting the system so that it works for other types of auctions, as it currently works only on the English auction. Some factors that affect auctions and are not considered here can also be incorporated in the future.

6. References
[1] Chin-Shien Lin, Shihyu Chou, Shih-Min Weng, Yu-Chen Hsieh, A Final Price Prediction Model for English Auctions: A Neuro-Fuzzy Approach, Springer Science Business Media, June 2011.
[2] Nandini S. Sidnal, Sunilkumar S. Manvi, BDI Agent Based Final Price Prediction for English Auctions in Mobile E-Commerce, World Review of Business Research, September 2011.
[3] Luna C. Tjung, Ojoung Kwon, K. C. Tseng and Jill Bradley-Geist, Forecasting Financial Stocks using Data Mining, Global Economy and Finance Journal, September 2010.
[4] Kim Hongil, Baek Seung, Forecasting Winning Bid Prices in an Online Auction Market: Data Mining Approaches, Journal of Electronic Science and Technology of China, September 2004.
[5] Heping Pan, Chandima Tilakaratne, John Yearwood, Predicting Australian Stock Market Index Using Neural Networks Exploiting Dynamical Swings and Intermarket Influences, Journal of Research and Practice in Information Technology, February 2005.


[6] Ramon Lawrence Using Neural Networks to Forecast Stock Market Prices [7] Buckley, J.J. , Hayashi, Y. Neural networks for fuzzy systems, Fuzzy Sets and Systems, 1995, [8] Wojciech Gryc Neural Network Predictions of Stock Price Fluctuations [9] Andy Korth Predicting Commodity Prices Using Artificial Neural Networks, 2005. [10] Rayid Ghani, Hillery Simmons Predicting the End-Price of Online Auctions

[11] Fei Dong, Sol M. Shatz, Haiping Xu An Empirical Evaluation on the Relationship between Final Auction Price and Shilling Activity in Online Auctions, Jan 2011. [12] Karl Nygren Stock Prediction A Neural Network Approach, March 2004. [13] Bajari, P. and A. Hortacsu, Winners Curse, Reserve Prices, and Endogenous Entry Empirical Insights from Ebay Auctions, The Rand Journal of Economics, 2002.


LICENSE PLATE IDENTIFICATION FOR INTELLIGENT TRANSPORTATION


Dinesh R.1, Shanthi A. P.2
1,2 Department of Computer Science & Engineering, Anna University, Chennai 600025

Abstract: The License Plate Recognition System (LPRS) is one of the most important parts of an Intelligent Transportation System (ITS). The main objective of this work is to improve overall system accuracy. To achieve this, the Field of Experts concept is combined with an improved Bernsen algorithm to pre-process the licence plate efficiently. The plate is then segmented horizontally or vertically, and character recognition is done by selecting salient features. The system is implemented using MATLAB and is mainly applicable to Indian licence plates.

I. INTRODUCTION
Licence Plate Recognition (LPR) is an image-processing technology used to identify vehicles by their licence plates. LPR is a challenging task because of the diversity in plate formats and outdoor illumination. The background, the fonts used on number plates, the speed of the vehicle and the distance between the camera and the vehicle also play a vital role in an LPR system. Despite its complexity, it is needed in different defence and civil applications such as unattended parking lots, traffic safety enforcement and security in restricted areas. Previous approaches impose certain constraints, such as a stationary background and constant illumination, to obtain better results, but in practical situations these conditions cannot be relied upon to help the system identify the number plates. An LPR system mainly consists of four major tasks: image acquisition, licence plate localization and segmentation, character segmentation and standardization, and character recognition. Image acquisition is done through a camera or infra-red sensing near the check post. There are two major issues in locating the number plate in an image: the first is the colour of the licence plate and the second is the colour of the car. Whenever there is a problem with the histogram distribution or illumination, a fixed threshold is not usable. These problems sometimes cause the licence plate to go undetected, and the process may turn out to be a failure. Apart from that, character segmentation for localization is another point to be highlighted: some characters may appear to be attached to one another, and this may cause the segmentation process to be disqualified. Whenever the threshold selection process encounters a problem, the character-labelling process, which is a sub-process of segmentation, is also affected. The performance of the locating operation is crucial for the entire system, because it directly influences the accuracy and efficiency of the subsequent steps. However, it is also a difficult obstacle to overcome because of different illumination conditions and various complex backgrounds.

II. RELATED WORK


There are numerous methods have been proposed for each task of licence plate recognition. Researchers have proposed many methods of locating the license plates, such as the edge detection method line sensitive filters to extract the plate areas, the window method and the mathematics morphology method. These algorithms can locate the license plate but they possess formidable disadvantages such as sensitivity to brightness, longer processing time, and lack of versatility in adapting to the varying environment. License-plate candidates are determined based on the features of license plates. Features that are commonly employed have been derived from the license-plate formats and the characters of the license plate. The features of license plates include shape, symmetry, height-to-width ratio, colour, texture, and spatial frequency. Character features include lines, blobs, aspect ratio of characters, distribution of intervals between characters, and alignment of characters. For the past few years Projection, morphology, relaxation labelling, and connected components these are the techniques which had been used for character


segmentation. There have been a large number of character recognition techniques including Bayes classifier, genetic algorithms, artificial neural networks, fuzzy C-means, support vector machines, Markov process and K nearest neighbour classification. These methods mainly depend on iterative approaches. Iterative methods achieve good accuracy, but it is at the cost of increased computation complexity. To reduce the computational cost while maintaining a certain degree of recognition accuracy a new character recognition technique have been proposed, which is based on the feature-salience theory. Target recognition needs to extract the features of the target. However, if all the features are used, many questions will appear, such as difficult operation, slow speed, and imprecise result. The task of feature selecting is how to find a set of salient features that are more effective in recognizing targets. The salient feature means what are advantageous to classifying objects. In mathematical expression, the salience feature represents the minimum probability of error in distinguishing a falsely object from the background. Therefore, the most salient feature corresponds to the largest probability in the extracting procedure. According to the prior information trained, in the actual recognition procedure, the system only extracts several salient features. Most of the approaches concentrate only on recognizing two kinds of characters such as English and numeric characters, and they process single line character segmentation only. More complex plate construction methods and more types of character recognition were not discussed. In this paper, our work focuses on a solution for image disturbance resulting from uneven illumination and various outdoor conditions such as shadow and exposure, which are generally difficult for obtaining successful processed results using traditional binary methods. A novel contributions of this paper are given as follows: 1) a novel binary method, i.e., the shadow removal method, which is based on an improved Bernsen algorithm combined with Field of Experts (FoE) concept; and 2) A character recognition algorithm, which uses salient feature selection to reduce the computational cost. Here the character features are extracted from the elastic mesh and the entire address character string is taken as the object of study, as opposed to a single character

The rest of this paper is organized as follows. In Section III the proposed framework and algorithm are presented. Experimental results are discussed in Section IV, and finally the work is concluded in Section V.

III. PROPOSED WORK
In an attempt to improve the accuracy of the license plate recognition system, the Bernsen algorithm combined with a Gaussian filter is used as the shadow removal technique, and the Field of Experts model is used for image denoising. Salient features are extracted and used for character recognition. Fig 3.1 gives a pictorial representation of the proposed framework for license plate recognition. The typical steps in an LPR system are image acquisition, licence plate localization and segmentation, character segmentation and standardization, and finally character recognition.

Fig 3.1 Proposed Framework for License Plate Recognition System a. Image Acquisition The preliminary step in LPR system is to capture the vehicle image using standard camera. The images are captured in RGB format so it can be further process for the number plate extraction. License plate pre processing is a necessary step in LPR, which includes plate detection, correction, and segmentation. The goal of detection is to locate regions of interest that are similar to the license plate.


Due to the angle of orientation, the image may have slant and distortion; thus, transformation or correction of the image is an important step before character segmentation.

b. Licence Plate Localization
Image denoising is performed using the Field of Experts concept. The core contribution is to extend Markov random fields beyond a fixed frame by modelling the local field potentials with learned filters. In contrast to example-based approaches, we develop a parametric representation that uses examples for training but does not rely on the examples as part of the representation. Such a parametric model has advantages over example-based methods in that it generalizes better beyond the training data and allows the use of more elegant optimization methods. After removing the noise from the captured image, the improved Bernsen algorithm is applied to remove the effect of uneven outdoor illumination, and the image is tilted to obtain a better angle for detecting the characters.

Step 1: Compute the threshold T1(x, y) of f(x, y) based on the following formula.
Step 2: Create the Gaussian filter for the window s = (2w + 1) x (2w + 1) of f(x, y), i.e. (2)
Step 3: Compute the threshold T2(x, y) of f(x, y) as (3)
Step 4: Obtain a binary image by
b(x, y) = 0 if f(x, y) < (1 - α)T1(x, y) + αT2(x, y), and b(x, y) = 255 otherwise, with α in (0, 1), (4)
where α ∈ [0, 1] is a parameter that adjusts the balance between the Bernsen algorithm with the Gaussian filter and the traditional Bernsen algorithm. When α is equal to 0, the proposed algorithm is the Bernsen algorithm; when α is equal to 1, it is the Bernsen algorithm with a Gaussian filter. By adopting an appropriate α, the shadow can be effectively removed and the characters can be successfully identified.
Step 5: Apply a median filter to remove the noise.
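The shadow-removal binarization of Steps 1 to 5 can be sketched as follows. This is only an illustrative reading of the method: T1 is taken as the classical Bernsen threshold (mean of the local maximum and minimum) and T2 as a Gaussian-weighted average of T1 over the window; both are assumptions, since the original formulas (1) to (3) are not reproduced here.

// Illustrative sketch of the improved Bernsen binarization with a Gaussian-smoothed
// threshold. gray is an 8-bit image stored as gray[y][x]; w is the window half-size.
public class BernsenSketch {

    static int[][] binarize(int[][] gray, int w, double sigma, double alpha) {
        int h = gray.length, wd = gray[0].length;
        double[][] t1 = new double[h][wd];

        // T1: classical Bernsen threshold, (local max + local min) / 2 (assumed form).
        for (int y = 0; y < h; y++)
            for (int x = 0; x < wd; x++) {
                int lo = 255, hi = 0;
                for (int dy = -w; dy <= w; dy++)
                    for (int dx = -w; dx <= w; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(wd - 1, Math.max(0, x + dx));
                        lo = Math.min(lo, gray[yy][xx]);
                        hi = Math.max(hi, gray[yy][xx]);
                    }
                t1[y][x] = (lo + hi) / 2.0;
            }

        int[][] bin = new int[h][wd];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < wd; x++) {
                // T2: Gaussian-weighted average of T1 over the (2w+1)x(2w+1) window (assumed form).
                double num = 0, den = 0;
                for (int dy = -w; dy <= w; dy++)
                    for (int dx = -w; dx <= w; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(wd - 1, Math.max(0, x + dx));
                        double g = Math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
                        num += g * t1[yy][xx];
                        den += g;
                    }
                double t2 = num / den;
                double t = (1.0 - alpha) * t1[y][x] + alpha * t2; // Step 4 threshold
                bin[y][x] = gray[y][x] < t ? 0 : 255;
            }
        return bin; // a median filter (Step 5) would follow to remove residual noise
    }
}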

c. Character Segmentation
Real licence plate images are prone to slant and distortion due to different angles of orientation; therefore, horizontal and vertical correction and image enhancement are required prior to character segmentation. Once a plate is located, it is already known whether it has white characters on a black background or black characters on a white background. Before character segmentation, the plate is transformed to black characters on a white background, and all plates are resized to 100 x 200. Afterwards, tilt correction and image enhancement are applied. Next, the projective technique is used to segment the licence plate into two blocks, and the characters are extracted from each block. Finally, the characters are resized to a uniform size.

d. Character Recognition
The positions of the English characters are the same on every licence plate. English characters are normalized to a 32 x 32 size, and the same feature extraction methods are used to recognize them. For numeral feature extraction, the image density feature is employed: all numeric characters are first normalized to a 32 x 32 size and then divided into 4 x 4 blocks. The density feature is calculated in each block, so the dimension of the feature vector is 16. We also record the number of numeric pixels along the dividing lines.
IV. EXPERIMENTAL RESULTS
After removing the noise from the image, binary conversion is carried out using equation (4); Fig 4.1 shows the binarization of the image. Edge detection of the licence plate and recognition of the characters are illustrated in Fig 4.2 and Fig 4.3.

Fig 4.1 Image binarization
Fig 4.2 Edge detection and localisation
Fig 4.3 Character recognition

Performance Analysis
The performance evaluation of a system is the measurement of its efficiency. The accuracy of the system is measured by analyzing the Location (L), Segmentation (S) and Recognition (R) values for different sample sets; the images are captured under different conditions. Fig 4.4 plots the LSR values for the different samples.

Table 4.1 LSR values for different samples
Sample   L       S       R
I        98.87   97.43   98.54
II       94.93   98.84   100
III      98.82   97.67   95.89
IV       97.14   97.06   90.41
V        96.55   98.21   96.36
VI       91.38   90.41   96.55
VII      98.21   96.36   97.58

Fig 4.4 Performance of L, S and R with respect to different sample sets

System Accuracy: The total system accuracy is calculated using the formula A = [L x S x R] %, where L, S and R are the percentages of successful plate locations, segmentations and recognitions (successful recognition of all characters on the plate), respectively. Fig 4.5 represents the total system accuracy of the LPR system.
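As a worked illustration of the accuracy formula, using the Table 4.1 values for Sample I and treating the percentages as fractions (an assumption about how the product is evaluated):

A = 0.9887 x 0.9743 x 0.9854 ≈ 0.949, i.e. roughly 94.9% for that sample set.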



Fig 4.5 Accuracy of the LPR system

V. CONCLUSION
In this paper a novel framework for a licence plate recognition system is proposed. The captured image is denoised using the Field of Experts concept, and the image is then pre-processed using the improved Bernsen shadow removal algorithm. In character segmentation, appropriate angle correction is applied, and finally the characters on the number plate are recognized using salient feature selection. The overall system accuracy is 98.25%.

Organized by: Department of Computer Science and Engineering, Anand Institute of Higher Technology, Chennai. www.iconic12.in E-mail: iconic.aiht@gmail.com Page 515

DRDO Sponsored International Conference on Intelligence Computing (ICONIC12)

Intensified reputation based security algorithm using fuzzy-logic (IRSF)


R. Sudha, Asst. Professor1, Dr. D. Sivakumar, Dean (P&E)2
1 Dr. Pauls Engineering College, Villupuram
2 Arunai College of Engineering, Thiruvannamalai

Abstract: A Mobile Ad Hoc Network (MANET) is a collection of multi-hop wireless mobile nodes that communicate with each other without centralized control or an established infrastructure. MANETs, being composed of mobile nodes, pose a constant challenge in providing reliable, high-quality routing among these devices and have many critical issues, such as the security of data transmission. In this paper, we propose an Intensified Reputation-based Security algorithm using Fuzzy logic (IRSF) for finding a reliable path in mobile ad hoc networks. The misbehaviour of nodes, whether selfish or malicious, may lead to packet loss, rejection of services, and so on. A new scheme based on trust value, bandwidth and hop count is proposed to solve these problems; the scheme employs fuzzy logic to make the routing-path decision in order to choose the best feasible routing path. Finally, a simulation tool is utilized to estimate the efficiency of the proposed scheme. The simulation results show that IRSF gives a significant reliability improvement in comparison with ARAN (Authenticated Routing protocol for Ad hoc Networks).

I. INTRODUCTION A mobile ad hoc network is an independent group of mobile users which communicate over unstable wireless links. Because of mobility of nodes ,the network topology may change rapidly and unpredictably over time. All network activity , including delivering messages and discovering the topology must be executed by the nodes themselves. Therefore routing functionality, the act of moving information from source to a destination, will have to be incorporated into the mobile nodes .Hence routing is one of the most important issue in MANET. Routing protocols in MANETs are generally classified as proactive and reactive [1]. Reactive routing protocols [2,3,4,5,6,7],which also called on demand routing protocols, start to establish routes when required. These kind of protocols are based on broadcasting RREQ and RREP messages. The duty of RREQ message is to discover a route from source to destination node .When the destination node gets a RREQ message, it sends RREP message along the established path. On demand protocols minimize the whole number of hops of the selected path and also they are usually very good on single rate networks. There are many reactive routing protocols, such as ad hoc on-demand distance vector (AODV) [6],dynamic source routing (DSR) [4], temporally order routing algorithm (TORA)[5], associativity-based routing (ABR) [7], signal stability-based adaptive (SSA) [3],

and relative distance microdiscovery ad hoc routing (RDMAR) [2]. In contrast , in table-driven or proactive routing protocols [8,9,10,11,12], each node maintains one or more routing information table of all the participating nodes and updates their routing information frequently to maintain latest view of the network. In proactive routing protocols when there is no actual routing request, control messages transmit to all the nodes to update their routing information. Hence proactive routing protocols bandwidth become deficient. The major disadvantage of pro-active protocols is the heavy load caused from the need to broadcast control messages in the network [3]. There are many proactive routing protocols, such as destination sequenced distance vector (DSDV) [12], wireless routing protocol (WRP) [9], clusterhead gateway switch routing (CGSR) [10], fisheye state routing (FSR) [11], and optimized link state routing (OLSR) [8]. Many of the work reported on routing protocols have focused only on shortest path, power aware and minimum cost.However much less attention has been paid in making the routing protocol to choose a more reliable route. In critical environment like military operation, data packets are forwarded to destination through reliable intermediate nodes[13].In this paper, we propose a reliable routing algorithm based on fuzzy logic. In this scheme for each node we determine two parameters , trust value and energy value, to calculate the lifetime of routes . During route


discovery, every node inserts its trust value ,energy value, path reputation, hop count in RREQ packet .In the destination , based on a new single parameter which is called reliability value , is decided which route is selected. The route with higher reliability value is candidated to route data packets from source to destination. The rest of the paper is organized as follows: In Section 2, we briefly describe the related work. Section 3 describes our proposed routing algorithm and its performance is evaluated in Section 4.Finally,Section 5 concludes the paper. II. THE ROUTING PROTOCOLS The most common Routing protocol AODV handles the dynamic and rapidly changing Adhoc Network very efficiently but not securely .From the view point of security every protocol must satisfy the following criteria[3] Certain Discovery, Isolation, Light weight Computation, Byzantine Robustness. There are certain exploits which are allowed by the existing protocols like AODV & DSR are Attacks using Modification[1],which includes Redirection by modified route sequence numbers, Redirection with modified hop counts, Denial of Service with modified source routes & Tunneling .Attacks using Fabrication includes falsifying Route Errors, Route Cache poisoning. There are two types of Adhoc Network nodes: a. Malacious Nodes-These are the nodes that suppress the correct function of routing protocol by modifying routing information, fabrication false information. It is the node that aims at damaging other nodes by causing network outage by partitioning while saving battery life is not a priority[4].Malicious network nodes that participate in routing protocols but refuse to forward protocols but refuse to forward messages may corrupt a MANET. B.Selfish Nodes-These nodes severely degrade network performance and eventually partition the network by simply not participating in the network operation. [14]. ARAN Security Analysis:-ARAN[1] makes use of cryptographic certificates to offer routing security and to accomplish its task with authencity .its main feature is to find & protect from misbehaving nodes from third party .For using Aran one has to pay less performance cost to achieve high security . Solution to vulnerabilities by ARAN: 1. Unauthorized Participation: Without authorization from trusted certificate server, no node can work, so there is no chance of unauthorized participation. 2. Attacks against Fabrication: ARAN ensures No Repudiation & prevents spoofing & Un

authorization participation in routing. 3. Attacks against Impersonation: Route Discovery Packets (RDP) [3] contains the difference of source node and is signed with sources private key. Similarly, Reply Packets (RREP) includes Destination Nodes Certificate& Signature which ensures that destination can respond to Route Discovery. This prevents Impersonation Attacks where either the source or destination node is Spoofed [2]. 4. No Alternation of Routing Messages: As we know that all fields of RDP & RREP packets are specified in ARAN & they remain unchanged between Source and Destination [2].Hence Modification Attacks can be prevented. 5. Attacks Using Modification of Protocol Message: The initiating node signs both packet types, any alternation would be detected & the altered packet would be thrown out. 6. Denail of Service Attacks: can be conducted by nodes with or without valid ARAN Certificates: A. in Certificate less case: All possible Attacks are limited to the attackers immediate neighbors because unsigned route requests are dropped. B.In Certificate case: Nodes with valid certificated can conduct effective Denial of Service attacks by sending unnecessary route requests & they will go undetected as the current existing RAN protocol cannot differentiate between legitimate & malicious RREQs coming from authenticated nodes [3]. III. RELATED WORKS We can classify all the works that have been done in reliable routing, in three categories: GPS-aided protocols ,energy aware routing ,and trust evaluation methods .In this section, we will overview some proposed protocols that have been given to designing reliable routing protocols. A reliable path has more stability than a command path. Some of reliable routing protocols propose a GPS-aided process and use route expiration time to select a reliable path. In [14] Nen-chung Wang et al, propose a stable weightbased on-demand routing protocol (SWORP) for MANETs. The proposed scheme uses the weightbased route strategy to select a stable route in order to enhance system performance .The weigth of a route is decided by three factors:the route expiration time, the error count , and the hop count . Route discovery usually first finds multiple routes from the source node to the destination node. Then the path with the largest weigth value for routing is selected . In [15], NenChung Wang and Shou-Wen Chang also propose a


reliable on-demand routing protocol (RORP) with mobility prediction. In this scheme, the duration of time between two connected mobile nodes is determined by using the global positioning system (GPS) and a request region between the source node and the destination node is discovered for reducing routing overhead. the routing path with the longest duration of time for transmission is selected to increase route reliability. In [16], Neng-Chung Wang etal, propose a reliable multi-path QoS routing (RMQR) protocol for MANETs by constructing multiple QoS paths from a source node to a destination node. The proposed protocol is an ondemand QoS aware routing scheme. They examine the QoS routing problem associated with searching for a reliable multi- path (or uni-path) QoS route from a source node to a destination node in a MANET. This route must also satisfy certain bandwidth requirements. They determine the route expiration time (RET) between two connected mobile nodes by using global positioning system (GPS). Then use two parameters, the route expiration time and the number of hops, to select a routing path with low latency and high stability. some other proposed protocols are considering energy and trust evaluation as a factor of reliability . In [17], an approach has been proposed in which the intermediate nodes calculate cost based on battery capacity. The intermediate node take into consideration whether they can forward RREQ packet or not . This protocol improves packet delivery ratio and throughput and reduces nodes energy consumption[13].In [18], Gupta Nishant and Das Samir had proposed a method to make the protocols energy aware .They were using a new function of the remaining battery level in each node on a route and number of neighbours of the node. This protocol gives significant benefits at high traffic but at low mobility scenarios[13].In [19], a novel method has been discussed for maximizing the life span of MANET by integrating load balancing and transmission power control approach. The simulation results of this mechanism showed that the average required transmission energy per packet was reduced in comparison with the standard AODV. In [20] Pushpalatha & Revathy have proposed a trust model in DSR protocol that categorize trust value as friend, acquaintance and stranger based on the number of packets transferred successfully by each node[13].The most trusted path was determined from source to destination.Results indicated that the proposal had a

minimum packet loss when compared to the conventional DSR.Huafeng Wu & Chaojian Shi1 [21] has proposed the trust management model to get the trust rating in peer to peer systems, and aggregation mechanism is used to indirectly combine and obtain other nodes trust rating[13]. The result shows that the trust management model can quickly detect the misbehaviour nodes and limit the impacts of them in a peer to peer file sharing system[13].all above papers used the separate parameters such as battery power ,trust of a node or route expiration time individually as a factor for measuring reliability of route. In this paper, we consider both energy capacity and trust of nodes for route discovery . IV. SELFISH NODE WEAKNESS OF ARAN An individual mobile node may attempt to benefit from other nodes but denies to share its own resources .These are known as Selfish Nodes and this behavior is termed as Selfishness. This un cooperative behavior can lead to breakdown of whole communication network. ARAN is capable of defending itself against Spoofing, Fabrication, Modification, and DOS Attacks. The currently existing ARAN secure protocol does not account for attacks conducted by Authenticated Selfish Nodes as these nodes trust each other to co operate in providing network functionalities. So ARAN is not capable to detect & defend against selfish node. If un authenticated Selfish node does not forward or intentionally drop control or data packets ,the current specification of ARAN cannot detect Selfish nodes .This weakness of ARAN can cause disturbance in MANETS & leads to wastage of network bandwidth. Techniques to detect Selfish Nodes: Various techniques have been proposed to detect and prevent Selfish Nodes in Manets. Nodes may exhibit non, cooperation by refusing to route packets due to several reasons such as power and other resource constraints or intent to deliberately disrupt the system. There are various approaches for stimulating co-operation .These approaches are mostly a. Incentive based /Credit based /Virtual Currency Based Schemes B.Punishment based /Reputation Based Schemes . Incentive based schemes are normally implemented using credits that are given to nodes that co-operate & forward packets. The basic problem with these schemes is they either depend on use of temper proof hardware to monitor the increase or decrease of virtual currency or require a central


server to determine the change and credit to each node involved in the transmission of a message. However these approaches suffer from location privilege problem [8]. Punishment based schemes identify & punish nodes that exhibit non co-operative behavior. These schemes define a metric called Reputation, in which is the goodness of a Node, as perceived by the neighbors & the reputation is decreased on evidence of non co-operation, so these are called Reputation based Schema. These Schemes are based on observation & tests .Nodes which is detected doing misbehavior is informed to other nodes in order to exclude the suspicious node from the Network. The main function of Reputation Based Schemes is Monitoring, Reputation and Response. Based on these functions the reputation based scheme aims at detecting selfish behavior on packet forwarding when it appears in the network. In, Marti et al, proposed a scheme that contains two major modules, termed as Watchdog and Path rater to detect & mitigate respectively. Due to its reliance on overhearing, however the Watchdog technique may fail to detect misbehavior or raise false alarms in the presence of ambiguous collisions & limited transmission power. The CONFIDANT protocol proposed by Buchegger & le Boudec on is based on selective altruism & utilitarianism, this making misbehavior unattractive. It has four componentsMonitor, The Reputation System, The Path Manager & the Trust Manager. The monitor component of CONFIDANT Scheme observes the next hop neighbors behavior using the over hearing technique .This scheme causes same problems as the Watchdog Scheme. S.Bansal et al, proposed an observation based Co-Operation enforcement in Adhoc Networks (OCEAN).In contrast to CONFIDANT, OCEAN avoids in direct (second hand) reputation information & uses only direct first hand observation of other nodes behavior .A Node makes routing decisions only on the basis of direct observation. In this scheme, Rating is given to each node; initially each node is given value Null (0) - Neutral. With every positive action its value is incremented by 1 & with every negative action its value is decremented by 2.If the rating of node falls below a certain faulty threshold (-40).it is added to the list of faulty nodes. V. PROPOSED MODEL In this section we propose our IRSFwhich is improved version of AODVand ARAN RRAF Mechanism :

Trust value and battery capacity are the two main parameters in this method that make the routing algorithm more reliable. Before explaining the algorithm, trust estimation and power consumption mechanism are described below. Trust Evaluation: Trust value of each node is measured based on the various parameters like length of the association, ratio of number of packets forwarded successfully by the neighbors to the total number of packets sent to that neighbor and average time taken to respond to a route request [13,20]. Based on the above parameters trust level of a node i to its neighbor node j can be any of the following types: a)Node i is a stranger to neighbor node j Node i have never sent/received message to/from node j .Their trust levels between each other will be low. Every new node which is entering an ad hoc network will be a stranger to all its neighbors. b) Node i is an acquaintance to neighbor node j Node i have sent/received few messages from node j. Their trust levels are neither too low nor too high to be reliable. c) Node i is a friend to neighbor node j Node i have sent/received a lot of messages to/ from node j. The trust levels between them are reasonably high . The above relationships are represented in Fig.1 as a membership function. Energy Evaluation: We defined that every node is in high level which means it has full capacity (100%).The node will not be a good router to forward the packets If the energy of it falls below 50%. FuzzyLogic Controller: A useful tool for solving hard optimization problems with potentially conflicting objectives is fuzzy logic. In fuzzy logic, values of different criteria are mapped into linguistic values that characterize the level of satisfaction with the numerical value of the objectives. The numerical values are chosen typically to operate in the interval [0, 1] according to the membership function of each objective .Fig.1 represents the trust value membership function. According to three types of trust value: friend , acquaintance and stranger, we define three fuzzy sets :high, medium and low, respectively. we also determined three fuzzy sets for node's energy. For energy capacity between 50 % to 100% of total capacity ,we define high set, for 0% to 100% we define medium set and for 50% to 100% we define low set. The above relationships are represented in Fig.2 as energy value membership function and Fig.3 shows the membership function of reliability value.


Fig. 1 Membership function for the trust value
Fig. 2 Membership function for the energy value
Fig. 3 Membership function for the reliability value

Reliability Evaluation: The reliability factor takes different values based on six rules that depend upon the input metric values, i.e. the energy and trust values. For every pair of input values, a fuzzy system decides which value appears at the output. The fuzzy system with a product inference engine, singleton fuzzifier and center-average defuzzifier is of the standard center-average form:

R = [ Σ_{l=1..6} y^l · µ1^l(x1) · µ2^l(x2) ] / [ Σ_{l=1..6} µ1^l(x1) · µ2^l(x2) ]   (1)

In Eq. (1), x_i represents the ith crisp input (the energy or trust value), µ_i^l represents the fuzzy membership function of input i in rule l, and y^l is the center of the lth output fuzzy set. The rules are as follows:
Rule 1: if the trust value is high and the energy value is high, then the reliability value is very very high.
Rule 2: if the trust value is medium and the energy value is high, then the reliability value is very high.
Rule 3: if the trust value is high and the energy value is medium, then the reliability value is high.
Rule 4: if the trust value is medium and the energy value is medium, then the reliability value is medium.
Rule 5: if the trust value is low and the energy value is medium, then the reliability value is low.
Rule 6: if the trust value is anything and the energy value is low, then the reliability value is very low.

Route discovery procedure
Step 1: A source node starts to flood RREQ packets to its neighbouring nodes in the MANET until they arrive at the destination node. Each RREQ carries the source id, the destination id, and the energy value and trust value of the nodes along the path.
Step 2: If an intermediate node N receives a RREQ packet and it is not the destination, the information of node N is appended to the RREQ packet fields. After that, node N re-forwards the packet to all of its neighbouring nodes.
Step 3: If node N receives a RREQ packet and node N is the destination, it waits for a period of time; therefore, the destination node may receive many different RREQ packets from the source. It then calculates the reliability value for each path from the source to the destination using the information in each RREQ packet. Finally, the destination node sends a route reply (RREP) packet along the path which has the maximum reliability value.
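A compact sketch of the reliability evaluation of Eq. (1) together with Rules 1 to 6 is given below; the membership breakpoints and the output centers assigned to the linguistic levels (very low through very very high) are illustrative assumptions, as the paper does not list their numeric values.

// Illustrative fuzzy reliability evaluation: product inference,
// singleton fuzzifier, center-average defuzzifier over the six IRSF rules.
public class ReliabilitySketch {

    // Triangular membership helper.
    static double tri(double x, double a, double b, double c) {
        if (x <= a || x >= c) return 0.0;
        return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    // Trust memberships (trust normalized to [0, 1]); breakpoints are assumptions.
    static double trustLow(double t)    { return tri(t, -0.5, 0.0, 0.5); }
    static double trustMedium(double t) { return tri(t, 0.0, 0.5, 1.0); }
    static double trustHigh(double t)   { return tri(t, 0.5, 1.0, 1.5); }

    // Energy memberships (remaining capacity as a fraction of full charge); assumptions.
    static double energyLow(double e)    { return tri(e, -0.5, 0.0, 0.5); }
    static double energyMedium(double e) { return tri(e, 0.25, 0.5, 0.75); }
    static double energyHigh(double e)   { return tri(e, 0.5, 1.0, 1.5); }

    static double reliability(double trust, double energy) {
        // Rule firing strengths (product inference) and assumed output centers.
        double[] w = {
            trustHigh(trust)   * energyHigh(energy),    // Rule 1 -> very very high
            trustMedium(trust) * energyHigh(energy),    // Rule 2 -> very high
            trustHigh(trust)   * energyMedium(energy),  // Rule 3 -> high
            trustMedium(trust) * energyMedium(energy),  // Rule 4 -> medium
            trustLow(trust)    * energyMedium(energy),  // Rule 5 -> low
            1.0                * energyLow(energy)      // Rule 6 -> very low (any trust)
        };
        double[] center = {1.0, 0.85, 0.7, 0.5, 0.3, 0.1}; // assumed centers of the output sets

        double num = 0, den = 0;
        for (int l = 0; l < w.length; l++) { num += w[l] * center[l]; den += w[l]; }
        return den == 0 ? 0.0 : num / den; // Eq. (1): center-average defuzzification
    }

    public static void main(String[] args) {
        System.out.println(reliability(0.9, 0.8)); // a friendly node with high remaining energy
        System.out.println(reliability(0.4, 0.2)); // an acquaintance with low remaining energy
    }
}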


VI. SIMULATION AND RESULTS
The simulation environment is a 1500 m x 300 m rectangular area with 50 nodes distributed over it. The initial battery energy of each node is 4 Watts, which is mapped to 100%. The simulation results have been compared with AODV, and the study has been performed for packet delivery ratio, throughput and end-to-end delay.
Packet delivery ratio: the fraction of successfully received packets that survive while finding their destination. This performance measure also indicates the completeness and correctness of the routing protocols [23].
End-to-end delay: the average end-to-end delay is the delay experienced by the successfully delivered packets in reaching their destinations. This is a good metric for comparing protocols and denotes how efficient the underlying routing algorithm is, because the delay primarily depends on the optimality of the path chosen [23].
Throughput: the rate of successfully transmitted data per second in the network during the simulation. Throughput is calculated as the sum of the successfully delivered payload sizes of data packets within the period that starts when a source opens a communication port to a remote destination port and ends when the simulation stops. The average throughput can be calculated by dividing the total number of bytes received by the total end-to-end delay [23]. A small sketch of these metric computations is given below.
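The following minimal sketch shows how the three metrics could be computed from simulation counters; the field names are illustrative assumptions.

// Illustrative computation of the three evaluation metrics from simulation counters.
public class MetricsSketch {
    long packetsSent;        // data packets originated by the sources
    long packetsReceived;    // data packets delivered to the destinations
    long bytesReceived;      // total payload bytes delivered
    double sumDelaySeconds;  // sum of end-to-end delays of the delivered packets

    double packetDeliveryRatio() {            // delivered fraction of the sent packets
        return packetsSent == 0 ? 0.0 : (double) packetsReceived / packetsSent;
    }

    double averageEndToEndDelay() {           // seconds per delivered packet
        return packetsReceived == 0 ? 0.0 : sumDelaySeconds / packetsReceived;
    }

    double averageThroughput() {              // bytes per second, as defined in the text
        return sumDelaySeconds == 0 ? 0.0 : bytesReceived / sumDelaySeconds;
    }
}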

Fig.4 shows the packet delivery ratio with different mobility speeds.When mobile nodes moved at higher mobility speeds, both protocols decreased the packet delivery ratio. The reason is that the routing path was easy to break when the mobility speed increased , but we can see that RRAF transmits and receives more data packets than AODV. This is because RRAF always chooses the most stable route for transmission packets along the path instead of choosing the shortest path. In Fig.5 the simulation result shows that throughput of both methods reduces when the speeds increase. When the speed of the mobile node increased, the routing path was more unreliable. The reason is that there were more chances for routes to break when the speed of the mobile node was faster. Thus, the number of rebroadcasts increased.Since RRAF has chosen more reliable route than AODV, we can see that it has performed better at all speeds . Fig.6 shows average end to end delay with speed as a function .Here it is clear that AODV has less delays than RRAF. Higher delay in the proposed method is


because of the time it has wasted for discovering the route with longer life, so the packets would in the meanwhile stay in the buffer until a valid route is found . This takes some time and will, therefore , increase the average delay while AODV chooses the shortest path as a valid path. Fig.7 shows the performance of packet delivery ratio under various pause times. The results in Fig. 7 illustrate that packet delivery ratio in RRAF is better compared to AODV,and The results in Fig. 8 show that RRAF experiences a high end to end delay because route selection is based on trust and energy level not on the minimum number of hops .

fuzzy logic approach. In this scheme, we determine three parameters: trust value , energy value and reliability value that are used for finding a stable route from source to destination . During route discovery, every node records its trust value and energy capacity in RREQ packet .In the destination ,based on reliability value , is decided which route is selected .The path with more reliability value is candidated to route data packets from source to destination. The simulation results show that the proposed method has significant reliability improvement in comparison with AODV and ARAN. References [1] D. Remondo, Tutorial on wireless ad hoc networks, Second International Conference in Performance Modeling and Evaluation of heterogeneous networks, July 2004. [2] G. Aggelou, R. Tafazolli, RDMAR: a bandwidth-efficient routing protocol for mobile ad hoc networks, Proceedings of the Second ACM International Workshop on Wireless Mobile Multimedia (WoWMoM), August, 1999 pp. 2633. [3] R. Dube, C.D. Rais, K.Y. Wang, S.K. Tripathi, Signal stability-based adaptive routing (SSA) for ad hoc mobile networks, IEEE Personal Communications 4 (1997) 3645. [4] D.B. Johnson, D.A. Maltz, Dynamic Source Routing in Ad Hoc Wireless Networks, Kluwer, 1996. [5] V. Park, M.S. Corson, A highly adaptive distributed routing algorithm for mobile wireless networks, Proceedings of the 1997 IEEE INFOCOM, Kobe, Japan, April, 1997 pp. 14051413. [6] C.E. Perkins, E. Royer, Ad-hoc on-demand distance vector routing, Proceedings of the Second IEEE Workshop on Mobile Computing Systems and Applications, New Orleans, LA, USA, February, 1999 pp. 90100. [7] C.K. Toh, A novel distributed routing protocol to support ad-hoc mobile computing, Proceedings of the fifteenth IEEE Annual International Phoenix Conference on Computers and Communications, March, 1996 pp. 480486. [8] P. Jacquet, P. Muhlethaler, T. Clausen, A. Laouiti, A. Qayyum, L. Viennot, Optimized link state routing protocol for ad hoc networks,

References
[1] D. Remondo, Tutorial on wireless ad hoc networks, Second International Conference in Performance Modeling and Evaluation of Heterogeneous Networks, July 2004. [2] G. Aggelou, R. Tafazolli, RDMAR: a bandwidth-efficient routing protocol for mobile ad hoc networks, Proceedings of the Second ACM International Workshop on Wireless Mobile Multimedia (WoWMoM), August 1999, pp. 26-33. [3] R. Dube, C.D. Rais, K.Y. Wang, S.K. Tripathi, Signal stability-based adaptive routing (SSA) for ad hoc mobile networks, IEEE Personal Communications 4 (1997), pp. 36-45. [4] D.B. Johnson, D.A. Maltz, Dynamic Source Routing in Ad Hoc Wireless Networks, Kluwer, 1996. [5] V. Park, M.S. Corson, A highly adaptive distributed routing algorithm for mobile wireless networks, Proceedings of the 1997 IEEE INFOCOM, Kobe, Japan, April 1997, pp. 1405-1413. [6] C.E. Perkins, E. Royer, Ad-hoc on-demand distance vector routing, Proceedings of the Second IEEE Workshop on Mobile Computing Systems and Applications, New Orleans, LA, USA, February 1999, pp. 90-100. [7] C.K. Toh, A novel distributed routing protocol to support ad-hoc mobile computing, Proceedings of the Fifteenth IEEE Annual International Phoenix Conference on Computers and Communications, March 1996, pp. 480-486. [8] P. Jacquet, P. Muhlethaler, T. Clausen, A. Laouiti, A. Qayyum, L. Viennot, Optimized link state routing protocol for ad hoc networks, Proceedings of the 2001 IEEE INMIC, December 2001, pp. 62-68.


[9] S. Murthy, J.J. Garcia-Luna-Aceves, A routing protocol for packet-radio networks, Proceedings of ACM First International Conference on Mobile Computing and Networking, Berkeley, CA, USA, November 1995, pp. 86-95. [10] S. Murthy, J.J. Garcia-Luna-Aceves, An efficient routing protocol for wireless networks, ACM Mobile Networks and Applications, Special Issue on Routing in Mobile Communication Networks 1 (2) (1996), pp. 183-197. [11] G. Pei, M. Gerla, T.W. Chen, Fisheye state routing: a routing scheme for ad hoc wireless networks, Proceedings of the 2000 IEEE International Conference on Communications (ICC), New Orleans, LA, June 2000, pp. 70-74. [12] C.E. Perkins, P. Bhagwat, Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers, Proceedings of the ACM Special Interest Group on Data Communication, London, UK, September 1994, pp. 234-244. [13] M. Pushpalatha, R. Venkataraman, and T. Ramarao, Trust based energy aware reliable reactive protocol in mobile ad hoc networks, World Academy of Science, Engineering and Technology 56, 2009. [14] N.-C. Wang, Y.-F. Huang, J.-C. Chen, A stable weight-based on-demand routing protocol for mobile ad hoc networks, Information Sciences (2007), pp. 5522-5537. [15] N.-C. Wang, S.-W. Chang, A reliable on-demand routing protocol, Computer Communications (2005), pp. 123-135. [16] N.-C. Wang, C.-Y. Lee, A reliable QoS aware routing protocol with slot assignment for mobile ad hoc network, Journal of Network and Computer Applications, Vol. 32, Issue 6, November 2009, pp. 1153-1166.

[17] R. Patil and A. Damodaram, Cost Based Power Aware Cross Layer Routing Protocol for MANET, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 12, December 2008. [18] G. Nishant and D. Samir, Energy-aware on-demand routing for mobile ad hoc networks, Lecture Notes in Computer Science, ISSN 0302-9743, Springer, International Workshop in Distributed Computing, 2002. [19] M. Tamilarasi, T.G. Palani Velu, Integrated Energy-Aware Mechanism for MANETs using On-demand Routing, International Journal of Computer, Information, and Systems Science, and Engineering 2;3, www.waset.org, Summer 2008. [20] M. Pushpalatha, Revathi Venkatraman, Security in Ad Hoc Networks: An Extension of Dynamic Source Routing, 10th IEEE Singapore International Conference on Communication Systems, Oct. 2006, pp. 1-5. [21] H. Wu and C. Shi, A Trust Management Model for P2P File Sharing System, International Conference on Multimedia and Ubiquitous Engineering, 2008. [22] G. Ghalavand, A. Dana, A. Ghalavand, and M. Rezahoseini, Reliable routing algorithm based on fuzzy logic for mobile ad hoc networks, International Conference on Advanced Computer Theory and Engineering (ICACTE), 2010. [23] V. Rishiwal, A. Kush and S. Verma, Backbone nodes based stable routing for mobile ad hoc network, UBICC Journal, Vol. 2, No. 3, 2007, pp. 34-39.


Group Key Transfer Protocol for Group Communication



Velumadhava Rao R, Department of Computer Science, Anna University, Chennai

Abstract
Securing group communication is an important issue in most network applications. Our proposed approach uses an authenticated group key transfer protocol, based on an encryption technique, that relies on a trusted key generation center (KGC). The KGC computes the group key and transports it to all communicating parties in a secure manner. User management activities such as join and leave operations are also carried out by the KGC. In addition, the proposed approach provides efficient key generation and key distribution techniques such that only authorized group members are able to retrieve the secret key, while unauthorized members cannot. We also provide authentication for the registration process and for transporting the group key.

1. INTRODUCTION
With the exponential growth in modern communication, secure group communication is becoming an extremely important research area because of the need to provide authentication in communications. Many applications that require secure group communication are now widely used, such as tele-medicine and real-time information services. Group communications are implemented using broadcast or multicast mechanisms, and their security is enforced by cryptographic techniques such as encryption, authentication, signatures and confidentiality. Providing authentication and confidentiality for the messages exchanged between group members is an important issue in secure group communication. Message authentication assures the receiver that the message was sent by the specified sender and was not altered anywhere along the transmission path. Message confidentiality ensures that the sender's confidential data can be read only by an authorized and intended receiver, so that the data cannot be tampered with by unauthorized users. To provide authentication and confidentiality, one-time session keys need to be shared among the communication entities to encrypt and authenticate messages. Group communication is achieved by encrypting the group messages using a shared secret called the group key. The group key is accessible only to group members, and thus only group members are able to decrypt the messages. Therefore, before exchanging communication messages, a key establishment protocol needs to distribute one-time secret session keys to all participating entities.
According to [1], there are two types of key establishment protocols, namely key agreement protocols and key transfer protocols. The most commonly used key agreement protocol is the Diffie-Hellman (DH) key agreement protocol [2]. However, the Diffie-Hellman key distribution algorithm can provide a secret key only for two entities; it cannot provide secret keys for a group with more than two members. For two-party communications, after the establishment of a communication session, if either party leaves or stops the conversation, the session terminates. When a group has a large number of members, setting up the group key takes considerably longer. Hence, it is necessary to propose a new technique that avoids these constraints in group communication. In this proposed work, group communication applications make use of a key transfer protocol to transmit data to all group members with the minimum resources needed for the group communication. During a group communication session, members can join or leave the system at any time while the communication is in progress. Whenever a member joins or leaves the group, the group key needs to be updated in order to maintain forward and backward secrecy. Many protocols have been proposed to solve the problem of group key distribution. In this paper, we propose a new protocol that aims at authenticating the users who join the communication group. The proposed protocol works with two levels of control: the first is the KGC server, which is responsible for group key generation and key distribution, while the second level handles member join and leave operations.


The proposed key management protocol relies on a centralized key server that coordinates protocol runs to distribute the group key to group members securely. Whenever members join or leave the group, the membership change is reported to the KGC server, which then updates the group key. A secure group key management protocol has to ensure that users not belonging to the group cannot obtain the group keys. Whenever there is a change in membership, a re-key operation must occur and the new group key must be distributed to all group members [5, 6, 7]. The data to be transmitted is encrypted with the session key and broadcast to all authorized group members; knowing the session key, the group members are able to recover the transmitted data by decryption. The remainder of the paper is organized as follows. Section 2 surveys existing work in this area. Section 3 describes the proposed work and the implementation details. Section 4 analyzes and discusses the results obtained. Section 5 concludes the work and suggests some possible enhancements.
2. RELATED WORK
There are many works on secure group communication; some of the important ones are cited here. Mike Burmester and Yvo Desmedt (2005) presented a group key exchange protocol which extends the Diffie-Hellman protocol; their protocol is scalable and secure against passive attacks, whereas the basic Diffie-Hellman key distribution algorithm can provide a key for only two entities. Bohli (2006) developed a framework for robust group key agreement that provides security against malicious insiders and active adversaries in an unauthenticated point-to-point network. Bresson et al. (2007) constructed a generic authenticated group Diffie-Hellman key exchange algorithm which is more secure. Katz and Yung (2007) proposed the first constant-round and fully scalable group Diffie-Hellman protocol which is provably secure. There are also many works on group key management protocols based on non-DH key agreement approaches. Among them, Tzeng (2002) presented a conference key agreement protocol that relies on the discrete logarithm assumption and provides fault tolerance.

The protocol establishes a conference key even if there are several malicious participants in the conference; however, it is not well suited to group communication. Cheng and Laih (2008) modified Tzeng's conference key agreement protocol using bilinear pairing. Moreover, in centralized group key management there is only one trusted entity responsible for managing the entire group, so the group controller need not depend on any auxiliary entity to perform key distribution. Harney et al. (1997) proposed a group key management protocol that requires O(n) operations, where n is the size of the group, to encrypt and update the group key when a user is evicted or added, in order to preserve backward and forward secrecy. Eltoweissy et al. (2004) developed a protocol based on Exclusion Basis Systems (EBS), a combinatorial formulation of the group key management problem. A set of scalable, hierarchical structure-based group key protocols [8], [9], [10] has also been proposed. Lein Harn and Changlu Lin (2010) introduced a group key transfer protocol in which the members of the group fully rely on a Key Generation Center (KGC); their authenticated key transfer protocol is based on a secret sharing scheme, so that the KGC can broadcast group key information to all group members at once. Chin-Yin Lee et al. (2011) addressed the security issues and drawbacks associated with existing group key establishment protocols. They also used a secret sharing scheme to propose a secure key transfer protocol that excludes impersonators from the group communication; their protocol can resist potential attacks and also reduces the overhead of system implementation.
3. PROPOSED WORK
Based on the above survey of secure group communication, it is necessary to propose a new model that solves the identified issues. In this section, we first describe the model of our group key transfer protocol and then the different components of our proposed system.
3.1 Model
The KGC generates the session key and transports it to each group member participating in the group communication. The KGC acts as a trusted entity that generates and distributes the group key. Each user is required to register at the KGC to subscribe to the key distribution service. The KGC keeps track of all registered users and removes any unsubscribed users.


Once the KGC receives a membership change message, it encrypts the randomly selected group key together with a hash value generated over the participating group members, and sends the ciphertext to each group member separately. An authenticated message checksum is attached to the ciphertext to provide group key authenticity. Our protocol uses an encryption/decryption algorithm: in the proposed approach, the confidentiality of the group key is ensured using any encryption algorithm that is computationally secure. A broadcast message is sent to all group members at once, and the authentication of the broadcast message can be provided as a group authentication. This makes the proposed protocol efficient; efficiency is also achieved because each data packet is transmitted only once over any link between two nodes, saving bandwidth.
3.2 Process
The proposed model consists of four processes, namely User Registration, the Join/Leave Component, Key Generation and Key Distribution. The four main processes are explained below.
3.2.1 User Registration
Each user has to register their identity at the KGC in order to subscribe to the key distribution service.
3.2.2 Join/Leave Component
This component handles member join and leave operations in the group communication. Once a user has registered, control of the user's activities is transferred to the KGC, which maintains the member join and leave operations. An individual member of the group selects the other group members to whom the secret message has to be transmitted, and the KGC then broadcasts the list of authorized group members participating in the group communication. Members can leave the group at any time, and whenever a member joins or leaves the group, forward and backward secrecy is maintained.
3.2.3 Key Generation
Whenever a group of users participates in a group communication, the Key Generation Center (KGC) generates the group key and broadcasts it to all authorized group members; unauthorized members are not able to retrieve the group key.

3.2.4 Key Distribution
The generated key is distributed to all group members in a secure manner. Whenever a member joins or leaves the system, a new group key is generated to achieve forward and backward secrecy. Forward secrecy ensures that when a user leaves the group, they do not have access to any future keys; backward secrecy ensures that when a new user joins the group, they do not get access to any previous keys.
Consider a group of t members {U1, U2, ..., Ut}. The key generation and key distribution process proceeds as follows.
Step 1: The initiator sends a key generation request to the KGC with a list of group members {U1, U2, ..., Ut}. Suppose U1 is the initiator and selects the group members U2, U3 and U4.
Step 2: The KGC broadcasts the list of all participating members, {U1, U2, U3, U4}, as a response.
Step 3: The KGC randomly selects a group key k and distributes it to the participating group members. The KGC generates a hash value over the group key and the participating members, Auth = h(k, U1, U2, U3, U4), and sends it along with the encrypted session key. Auth is broadcast to all participating group members in a secure way.
Step 4: Each group member Ui, i = 1, 2, 3, 4, knowing the group key and the participating members of the group, computes h(k, U1, U2, U3, U4) and checks whether the computed hash value is identical to Auth. If the two hash values are identical, then Ui is assured that the group key came from the authentic KGC.
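A minimal sketch of Steps 3 and 4 is given below, assuming SHA-256 as the hash function h (the protocol description above does not fix a particular hash) and omitting the encryption that would protect k in transit; all identifiers are illustrative.

```python
# Sketch of Steps 3-4: KGC computes Auth = h(k, U1, ..., Ut); each member
# recomputes the hash and compares it with the received Auth value.
# SHA-256 is an assumed choice of h; the encryption of k is omitted here.
import hashlib
import os

def compute_auth(group_key, members):
    h = hashlib.sha256()
    h.update(group_key)
    for member in members:
        h.update(member.encode())
    return h.hexdigest()

members = ["U1", "U2", "U3", "U4"]
k = os.urandom(32)                    # Step 3: randomly selected group key
auth = compute_auth(k, members)       # sent to members along with (encrypted) k

# Step 4: a member verifies that the group key came from the KGC.
assert compute_auth(k, members) == auth
```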


4. Security Analysis
The main security goals of our group key transfer protocol are key freshness, key confidentiality and key authentication. Key freshness formalizes the requirement that the session key is not easily known by the adversary and ensures that a group key has never been used before; thus, a compromised group key cannot cause any further damage to the group communication. Key confidentiality protects the group key so that only authorized group members are able to retrieve it, while unauthorized users are not. Key authentication assures the authorized group members that the group key was distributed by the KGC and not by an attacker. An attacker can impersonate a user to request a group key service, and can also try to modify information transmitted from users to the KGC without being detected. In our protocol, attackers can neither obtain the group key nor share a group key with authorized group members.
5. Conclusion and Future Work
We have proposed an authenticated group key transfer protocol based on encryption/decryption techniques. We also provide group key authentication by introducing an efficient hashing function for message transfer. The session key is encrypted along with the group member information and broadcast to the participating group members, and it is then used to encrypt and decrypt the data transferred between the group members. In future work, we wish to extend our design to communication in dynamic and hierarchical groups.
6. REFERENCES
[1] Lein Harn, Changlu Lin, Detection and Identification of Cheaters in (t, n) Secret Sharing Scheme, Designs, Codes and Cryptography, Vol. 52, Issue 1, pp. 15-24, July 2009. [2] Lein Harn, Changlu Lin, Strong (n, t, n) Verifiable Secret Sharing Scheme, Information Sciences: An International Journal, Vol. 180, Issue 16, pp. 3059-3064, August 2010.

[3] Lein Harn, Changlu Lin, Dingfeng Ye, Ideal Perfect Multilevel Threshold Secret Sharing Scheme, IAS '09: Proceedings of the 2009 Fifth International Conference on Information Assurance and Security, Vol. 02, pp. 181-121, 2009. [4] Eskicioglu, A.M., Delp, E.J., A Key Transport Protocol Based on Secret Sharing: Applications to Information Security, IEEE Transactions on Consumer Electronics, Vol. 58, Issue 4, pp. 816-824, April 2003. [5] Kevin Atighehchi, Traian Muntean, Sylvain Parlanti, Robert Rolland, Laurent Vallet, A Cryptographic Keys Transfer Protocol for Secure Communicating Systems, 12th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 339-343, 2010. [6] Jun Yang, Xianze Yang, Nan Zhang, Authenticated Key Transport Protocol Based on a Locking-Signing Technique, 2009 International Forum on Computer Science-Technology and Applications, 2009. [7] Lein Harn and Changlu Lin, Authenticated Group Key Transfer Protocol, IEEE Transactions on Computers, Vol. 59, pp. 842-846, 2010.


HYBRID FRAMEWORK FOR FEATURE REDUCTION OF MAC LAYER IN WIRELESS NETWORK


Krishna Kumar S1, Anbuchelian S2
1 Department of Computer Science, 2 Ramanujam Computing Center, Anna University, Chennai
Abstract
Intrusion Detection Systems (IDS) are mainly used for protecting network resources from illegal user penetration. Mobile nodes are subject to a particular form of Denial of Service (DoS) attack called a battery exhaustion attack, in which the attacker attempts to rapidly drain the battery of the nodes; such attacks target the battery life of mobile devices and render them inoperable. The attacker targets the energy of the node, which is reduced drastically and continuously. In misuse detection, classifiers are used as detectors, and selecting the best features is central to the performance, learning speed, accuracy and reliability of these detectors; it also removes noise from the set of features used to construct the classifiers. A hybrid model is designed that efficiently selects the optimal set of features in order to detect 802.11-specific intrusions. A power factor bit is also added to the optimal feature set, without affecting efficiency, in order to identify power based attacks. The feature selection model uses the information gain ratio to compute the relevance of each feature, and the k-means classifier is used to select the optimal set of MAC layer features, which improves the accuracy of the IDS while reducing the learning time of the learning algorithm after the power resource bit is included.
1. INTRODUCTION
Wireless technology has opened a new and exciting world for many users, and its popularity and deployment are increasing rapidly. Even with improved encryption techniques, security remains a major concern for every user of a wireless network; one solution developed to help combat this problem is the Wireless Intrusion Detection System (WIDS).
1.1 INTRUSION DETECTION SYSTEM
An Intrusion Detection System (IDS) is a software or hardware tool used to detect unauthorized access to a mobile node or network. A wireless IDS performs this task specifically for the wireless network. These systems monitor traffic on the network, looking for and logging threats and alerting personnel to respond. Detection can be performed in two ways, namely signature-based or anomaly-based detection; correspondingly, the two kinds of approach used to detect network intrusion are misuse detection and anomaly detection.
1.1.1 MISUSE DETECTION
A misuse detection system works much like a high-end computer anti-virus application. Misuse detection IDS models analyse the system or network environment and compare the activity against signatures (or patterns) of known intrusive computer and network behaviour. These signatures must be updated over time to include the latest attack patterns, much like anti-virus signatures. If the target deployment covers only a few computer systems, then a misuse-based IDS is easy to implement, update and deploy; however, if the scope of deployment is large, implementation, updating and deployment can be quite complex. The proposed model produces fewer false alarms than other IDS methods.
1.1.2 ANOMALY BASED DETECTION
An anomaly-based intrusion detection system identifies computer intrusions and misuse by monitoring system activity and classifying it as either normal or anomalous.
The classification is based on heuristics or rules rather than patterns or signatures, so it can detect any type of misuse that falls outside normal system operation. In order to determine the type of traffic attack, the system must first be taught to recognize normal system activity, which can be accomplished with Artificial Intelligence (AI) techniques. Another approach to characterizing normal usage is to adhere strictly to a mathematical model and flag any deviation from the model as abnormal. These techniques, however, still suffer from high false positive rates.


1.2 POWER ATTACK

A new type of Denial of Service attack has been created by attackers. These DoS attacks [3] target the battery of mobile devices, rendering them inoperable. Battery power is a critical resource in mobile computing, and battery exhaustion attacks make the device unusable by draining the battery more quickly than under normal usage. In a typical mobile node, the battery is expected to give a certain battery life under usage conditions where the user actively uses the device for only a small fraction of the time and the device is idle the rest of the time. If an attacker can prevent the device from entering low power modes by keeping it active, the battery life can be drastically shortened.
1.3 OPTIMAL FEATURE SELECTION
Traditional IDSs can protect TCP/IP software applications in the network from intrusion attempts, but the physical and data link layers are more vulnerable to intrusion, and these vulnerabilities may cause communication failures. In the absence of physical boundaries in wireless networks, it is difficult to monitor intruders, as an attack can be perpetrated from anywhere. Classifiers are constructed to identify traffic as anomalous or normal by selecting features from each layer. It has been shown that selecting only an optimized feature set from the layers can increase the detection rate while reducing the learning time, and that feature selection has a major impact on the performance, accuracy and reliability of the IDS.
2. RELATED WORKS
The convenience of 802.11-based wireless access networks has led to widespread deployment in the consumer, industrial and military sectors. However, this use is predicated on an implicit assumption of confidentiality and availability. While the security flaws in 802.11's basic confidentiality mechanisms have been widely publicized, the threats to network availability are far less widely appreciated. In fact, it has been suggested that 802.11 is highly susceptible to malicious denial-of-service [3] attacks targeting its management and media access protocols. Experimental analyses of the practicality and efficacy of such 802.11-specific attacks, and of potential low-overhead implementation changes to mitigate the underlying vulnerabilities, have been reported.

Adebayo O. et al. (2008) proposed two machine learning techniques, Rough Set (the LEM2 algorithm) and k-Nearest Neighbour (kNN) [1], for intrusion detection. Rough set theory is a classic mathematical tool for feature extraction in a dataset which also generates explainable rules for intrusion detection. The two algorithms performed poorly on the U2R and R2L classes due to their few representatives in the training dataset; moreover, the attribute values in the training set differ substantially from those in the test set for these two attack types, which leads to misclassification because such instances are never learned in the training phase. Boukerche A. et al. (2007) present a novel intrusion detection model based on artificial immune [2] and mobile agent paradigms for network intrusion detection. The model is based on registry signature analysis using the Syslog-ng and Logcheck Unix tools. The tasks of monitoring, distributing the intrusion detection workload, storing relevant information, and ensuring data persistence and reactivity are carried out by the mobile agents, which represent the leukocytes of an artificial immune system. A pioneering Battery-Sensing Intrusion Protection System (B-SIPS) [5] was proposed by Timothy K. Buennemeyer et al. (2007); it alerts on power changes on small wireless devices using an innovative dynamic threshold calculation algorithm. The B-SIPS design offers a hybrid intrusion detection method that can protect small mobile computers from anomalous activity. A service requesting power attack repeatedly connects to the mobile device with genuine service requests with the intent of draining power from the device's battery. A benign power attack attempts to start a power demanding process or component operation on the host to rapidly drain its battery. A malignant power attack actually succeeds at infiltrating the host and changes programs to devour much more power than is typically required. Attacks of this nature consume additional power and thus demonstrate the need for an integrated battery sensing IDS.
3. PROPOSED ARCHITECTURE
Based on the survey of this area, we propose a new technique to address these issues. Our proposed hybrid approach uses a wrapper model and a filter model to identify intruders. Most existing work is based on attacks at the network layer and data link layer.


Power based attacks on the MAC layer are not considered in existing intrusion detection models. In the proposed system architecture, MAC layer header frames are used to identify intruders. The frame features are ranked using the information gain ratio measure, and the optimal feature list is then obtained from the original MAC layer frame by measuring the accuracy of identifying intruders in the network.

Fig. 3.1: WIDS framework to detect attacks performed on the MAC layer

The proposed system architecture consists of four major components, namely Data Collection, Pre-Processing, Optimal Set Selection and Classifier.
Data Collection: This module collects all the data from the network, consisting of normal data and attack data. Since our attack data is mainly concentrated on the MAC layer, the MAC layer features are collected from the network traffic.
Pre-Processing: The collected MAC layer features are pre-processed in two steps: the information gain ratio is computed for every feature, and the features are then ranked according to their IGR values.
INFORMATION GAIN RATIO
We use the Information Gain Ratio (IGR) as the measure of the relevance of each feature. The IGR is used rather than plain information gain because the latter is biased towards features with many values. IGR is defined as

IGR(Ex, f) = Gain(Ex, f) / SplitInfo(Ex, f)    (1)

where Ex is the set of vectors that contain the header information, f is a MAC layer feature, and

Gain(Ex, f) = Entropy(Ex) - Σ_{v ∈ values(f)} (|Ex_v| / |Ex|) Entropy(Ex_v)    (2)

The entropy function is Shannon's entropy, defined as

Entropy(Ex) = - Σ_i p_i log2(p_i)    (3)

where p_i is the probability of class i. SplitInfo(Ex, f) is defined as

SplitInfo(Ex, f) = - Σ_{v ∈ values(f)} (|Ex_v| / |Ex|) log2(|Ex_v| / |Ex|)    (4)

where Ex_v is the subset of Ex in which feature f takes the value v.
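The following sketch applies Equations (1)-(4) to a toy set of labelled MAC frames; the feature names and data are illustrative only, not taken from the paper's dataset.

```python
# Minimal sketch of Equations (1)-(4): information gain ratio of a categorical
# MAC-layer feature f over a labelled example set Ex.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples, feature, label="class"):
    labels = [ex[label] for ex in examples]
    n = len(examples)
    gain, split_info = entropy(labels), 0.0
    for value in set(ex[feature] for ex in examples):
        subset = [ex[label] for ex in examples if ex[feature] == value]
        p = len(subset) / n
        gain -= p * entropy(subset)          # Equation (2)
        split_info -= p * log2(p)            # Equation (4)
    return gain / split_info if split_info else 0.0   # Equation (1)

frames = [
    {"frame_type": "mgmt", "retry": 0, "class": "attack"},
    {"frame_type": "data", "retry": 0, "class": "normal"},
    {"frame_type": "mgmt", "retry": 1, "class": "attack"},
    {"frame_type": "data", "retry": 1, "class": "normal"},
]
print(gain_ratio(frames, "frame_type"), gain_ratio(frames, "retry"))
```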

OPTIMAL SET SELECTION
The Optimal Set Selection algorithm extracts only the optimized feature set from the total feature list. It starts with an empty set S and adds features to S one by one, in the order in which the features are ranked by Information Gain Ratio. After each feature is added to S, the accuracy of the IDS is measured using the K-Means classifier. The iteration continues until the accuracy reaches its maximum, that is, until the gain in accuracy falls below the threshold value or the accuracy starts to decrease. The optimal selection algorithm for finding the optimal list of MAC layer features is presented below.


Input: F - full set of ordered features, C - K-Means classifier, T - gained accuracy threshold
Output: optimal subset of features
Step 1: Initialize S = {}, ac = 0
Step 2: Repeat
Step 2.1: ap = ac
Step 2.2: f = getNext(F)
Step 2.3: S = S U {f}
Step 2.4: F = F - {f}
Step 2.5: ac = accuracy(C, S)
Step 3: Until (ac - ap) < T or ac < ap

Classifier: The K-Means classifier is used to cluster the data set into normal data and specific attack data. If a known attack is performed, the data is classified and grouped with the known attack pattern.

4. EXPERIMENTAL STUDIES
We have implemented our system to identify the energy loss in each scenario by simulating the network in Network Simulator 2 (NS2). Twenty nodes are simulated in a wireless model configured as shown in Table I.

TABLE I: Node Configuration in NS-2
Parameter          Value
Channel            Channel/WirelessChannel
Propagation        Propagation/TwoRayGround
Antenna            Antenna/OmniAntenna
Initial Energy     1000 Joules
Rx Power           10
Tx Power           20
Node Movement      Random Movement
Topology           500 x 500
Routing Protocol   AODV
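A compact rendering of the optimal set selection loop is sketched below; accuracy() is a placeholder for training and evaluating the K-Means based detector on the current feature subset, and dropping the last feature on termination is one possible reading of the stopping rule.

```python
# Sketch of the greedy optimal set selection loop (Steps 1-3 above).
# accuracy(subset) is assumed to train/evaluate the K-Means detector
# on the given features and return a score in [0, 1].

def select_optimal_features(ordered_features, accuracy, T=0.005):
    S, ac = [], 0.0                 # Step 1: S = {}, ac = 0
    for f in ordered_features:      # features already ranked by IGR
        ap = ac                     # Step 2.1
        S.append(f)                 # Steps 2.2-2.4: move next feature into S
        ac = accuracy(S)            # Step 2.5
        if ac < ap or (ac - ap) < T:   # Step 3: stop when the gain is too small
            S.pop()                 # the last feature did not help; drop it
            break
    return S
```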

The nodes are simulated in three scenarios: normal packet transfer, a frame loss scenario, and a power attack scenario. The scenarios describe packet transfer from three different sources to a single destination: with normal delivery of all transferred packets, with 50% packet loss due to congestion at the nodes, and in power attack mode with the Attack-Model energy model, which doubles the sending and receiving times. Table II shows the energy remaining at the destination node over time in each scenario, and Figure 2 plots the energy loss in each scenario. Figure 2 shows that normal packet transfer loses about 23% of the initial energy, while the power attack loses nearly 43.7% of the initial energy.
TABLE II: Attack Analysis for the Destination Node

Time (s)   Normal Data (J)   50% Packet Loss (J)   Power Attack (J)
0          1000              1000                  1000
5          959.22596         959.56063             919.61616
10         909.58287         910.214565            820.53562
15         858.46454         857.407009            718.45791
20         816.98721         804.467117            636.69017
25         796.79580         753.580248            599.72867
30         777.074895        710.080862            563.710159
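The percentages quoted in the text can be checked directly against the last row of Table II (30 s, initial energy 1000 J):

```python
# Energy loss after 30 s, relative to the 1000 J initial energy in Table II.
initial = 1000.0
final = {"normal": 777.074895, "50% packet loss": 710.080862, "power attack": 563.710159}

for scenario, joules in final.items():
    print(f"{scenario}: {100.0 * (initial - joules) / initial:.1f}% energy lost")
# normal ~22.3%, 50% packet loss ~29.0%, power attack ~43.6%
```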


Figure 1: Energy comparison of normal data with attack data

5. CONCLUSION AND FUTURE WORK
In this work, a power based attack has been developed and the energy parameter of each node is measured during the attack. The major contribution is a comparison between normal data and attack data. A power attack increases the transmitting and receiving time of packets, which reduces the energy of the node drastically and repeatedly. From the simulation results it has been identified that normal data transfer has an energy loss of 23%, but during the power attack this increases to up to 47%. Energy loss has also been measured for different packet loss ratios; among the three levels of packet loss, 50% packet loss suffered the largest loss, 29% of the total energy. The vulnerability of the wireless network is thereby demonstrated. Further work in this direction could identify new types of attacks that cause power loss and build an effective WIDS for power attacks.

References
[1] Adebayo O. Adetunmbi, Samuel O. Falaki, Olumide S. Adewale and Boniface K. Alese, Network Intrusion Detection based on Rough Set and k-Nearest Neighbour, International Journal of Computing and ICT Research, Vol. 2, No. 1, 2008, pp. 60-66.
[2] A. Boukerche, R.B. Machado, K.R.L. Juca, J.B.M. Sobral, and M.S.M.A. Notare, An Agent Based and Biological Inspired Real-Time Intrusion Detection and Security Model for Computer Network Operations, Computer Communications, Vol. 30, No. 13, pp. 2649-2660, Sept. 2007.
[3] Konstantinos Pelechrinis, Marios Illiofotou and Srikanth V. Krishnamurthy, Denial of Service Attacks in Wireless Networks, IEEE Communications Surveys & Tutorials, Vol. 13, No. 2, Second Quarter 2011, pp. 245-257.
[4] Khalil El-Khatib, Impact of Feature Reduction on the Efficiency of Wireless IDS, IEEE Transactions on Parallel and Distributed Systems, Vol. 21, No. 8, 2010.
[5] Timothy K. Buennemeyer, Faiz Munshi, Randy C. Marchany, and Joseph G. Tront, Battery-Sensing Intrusion Protection for Wireless Handheld Computers using a Dynamic Threshold Calculation Algorithm for Attack Detection, Proceedings of the 40th Hawaii International Conference on System Sciences, 2007.


Black Market Botnets and Securing Information


S. JEYAKUMAR SAMRAJ1, J. ANTON BOSE2, C.P. RICHARD NICKSON3, J. PUGAZHENDI
1, 2, 3 PG Student, Anand Institute of Higher Technology, Chennai.
Abstract
Botnets have yet to be exploited to their full potential, because they have yet to take advantage of all the information available to them. A botmaster who controls a botnet can use technology that exists now to create an infrastructure for selling information to third parties in a new way, exploiting the so-called Long Tail. This results in not one, not two, but three new markets and untapped revenue streams for botmasters. We outline motivations for such a business model, as well as the mechanics of a possible implementation. We then present a variety of defenses against this scenario.
1 Introduction
Many details about people are best retrieved at the source: the computer which a person uses to store their information digitally. Users enter information about almost every aspect of their lives. This is particularly true in a business environment, where email is a key form of communication, and internal documentation and reports are created constantly. This leads to a question. Can users' private documents be selectively stolen by an adversary?
Botnets play a critical part in answering this question. The zombie computers that comprise a botnet have access to the private documents of the people that use the zombie computers, to documents that would otherwise be inaccessible to adversaries. However, not all adversaries have the desire or capability to establish a botnet. Enter the botmaster. The botmaster, the person who creates and operates a botnet, is not the adversary (at least not directly) in this paper. The botmaster is a businessperson, who has access to private documents through their botnet. They do not necessarily want these documents themselves, but there are many adversaries who would; the motivation for the botmaster is to sell the documents for a profit. More importantly, the key insight underlying our black market botnets is this: the botmaster does not know what documents are valuable to the adversary.
Notice the important distinction between this scenario and what happens in current botnets, where botmasters harvest information they know has value, like credit card and PIN numbers. In contrast, the botmaster in our black market botnets will not know what has value, and will compensate by allowing their customers, the adversaries, to search for what they want.
In business terms, the botmaster of a black market botnet is taking advantage of the Long Tail [2]. This is the name given to the observation that the total sales volume of specialized items, targeting niche markets, can compete with sales of mass-market items. The trick is that sellers must be able to provide specialized items efficiently, and customers must be able to find them easily. In botnet terms, the sales curve might look like Figure 1.
Figure 1: The Long Tail of a black market botnet
Besides being able to target new niche markets, there is another advantage to the botmaster. Because of the scarcity, or maybe even uniqueness, of the specialized documents being stolen and sold, adversaries would pay much higher prices than they would for run-of-the-mill information. The botmaster of a black market botnet may find the Long Tail good for both total sales volume and profit. There are many possible scenarios in which a botmaster could sell specific documents to adversaries. For the sake of discussion, we ignore the fact that few of these scenarios would be legal:


- A company looking for information about the current research projects of its competitors could search for internal documents from competing companies.
- A company could perform market research on customers.
- A private investigator could use private documents as another source of information in their investigations.
- Paparazzi could search for information on celebrities.
- Terrorists could search for security weaknesses by looking for classified documentation on a target facility.
- Counter-terrorism agencies would be able to search for intelligence to thwart terrorists' plans.
- An adversary that is planning a targeted electronic attack against an organization (e.g., using social engineering) could start by searching for insider information about the organization, making their attack more convincing.
- Police agencies could use private documents in an attempt to catch people who commit serious crimes, like people who produce child pornography. While the legal ramifications of this would require some consideration, evidence acquired by illicitly conducted computer searches has been admitted previously [27-29].
It is important to stress that this threat is not just another doom-and-gloom scenario: the Gozi Trojan has already been discovered in the wild. This Trojan steals posted web form data and transmits it to the botmaster's web site; adversaries can search through the captured data and purchase the results [15]. While currently restricted to a fixed type of data, Gozi demonstrates that the general threat we describe is beginning to evolve. While we would like to put a dollar amount on the markets we present, it is not yet possible to do so in any meaningful way. Unlike fledgling studies of current underground economies [12], there is no extant data to be gathered and analyzed, nor is it clear what value an adversary would place on an invaluable document.

Figure 2: Basic architecture of a black market botnet
2 Basic Black Market Botnet Architecture
Conceptually, a black market botnet consists of two parts. First, there is the botnet that has access to salable documents. Second, there is a mechanism through which adversaries can search for documents of interest. In practice, the search mechanism may be hosted by machines in the botnet. Because the total volume of documents on all of a botnet's computers would be large, we assume that it is not feasible to transfer all documents to a central location in their entirety. We therefore assume that adversaries' search queries would be injected into the botnet, and the botnet would return the search results. (We revisit this latter assumption in the next section.) This leads to the architecture shown in Figure 2. We note that the botnet does not even need to be excessively large for this to work, just well-targeted: for example, a botnet affecting only Fortune 500 companies would contain documents of interest to a number of adversaries.
To search documents, one option is for the botmaster to use existing peer-to-peer software and searching algorithms in their botnet (e.g., those used by the Gnutella or FastTrack networks [13]). Indeed, botnets have been known to employ P2P structures already [9]. Adversaries would use peer-to-peer clients to submit their search requests to the black market botnet. The search results would only contain a small, fixed excerpt of the document, rather than the entire file; this would prevent adversaries from piecing together a document based on repeated search results. Recent peer-to-peer file sharing applications typically include search features to allow users to find files based on attributes like their name, type, and size. While these attributes might prove useful, more complex indexing of documents on zombie machines will almost certainly be required.
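As a rough illustration of the excerpt-only results mentioned above, a zombie might answer a query with a small fixed window of text around the first hit rather than the whole file; the window size and names below are arbitrary assumptions.

```python
# Sketch: return only a small, fixed excerpt around the first query hit,
# never the whole document, so repeated queries cannot reassemble the file.

def excerpt(document, query, window=40):
    pos = document.lower().find(query.lower())
    if pos == -1:
        return None                       # no match on this machine
    start = max(0, pos - window)
    end = min(len(document), pos + len(query) + window)
    return document[start:end]

doc = "Quarterly report: the merger with Example Corp is expected to close in Q3."
print(excerpt(doc, "merger"))
```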


Figure 3: Advanced architecture with search term identification
Another search option would be for the botmaster to construct their own web search portal, allowing adversaries to use regular web browsers to search. Obviously the botmaster takes a risk here, because the search portal can be shut down. Spammers and phishers have dealt with this problem for years, safeguarding their websites using methods like bullet-proof hosting [23], fast flux [22], and levels of redirection [30]. Regardless of the mechanism used to perform searches, the question of how an adversary would search for documents of interest still remains. One possibility is that adversaries could use a traditional keyword-based search. However, this approach would be limited. In particular, the onus is on the adversaries to enter the magical incantation of keywords to find what they are looking for. Moreover, due to the vast numbers of documents that would be available to a large botnet, a search query that includes common words would return far too many documents to easily sift through. Web site search engines mitigate this problem by ranking results based on the popularity of a web site, but private documents of interest to only a few adversaries cannot be ranked the same way in terms of popularity. A ranking could be constructed of the popularity of various search terms rather than of the documents themselves, aiding the adversary in constructing useful searches, but revealing information to defenders who infiltrate the search interface. A search term ranking also assumes some degree of commonality in search terms, an assumption that may not be valid depending on where an adversary sits on the Long Tail. How can a botnet be effectively searched for interesting documents? The simplest approach would be to search recently edited documents, assuming the adversary will want current information.

However, only looking for recently edited documents would miss other potential targets, such as the previous year's tax returns. Therefore, more sophisticated indexing methods must be used. If a botmaster chooses to use existing software, Google Desktop already provides indexing and document searching capabilities. A zombie could download and install Google Desktop, let it index all of the documents on the computer, and then direct search queries to Google's software. However, while convenient for the botmaster, this approach does not solve the problem of excessive query results. This issue is the subject of the next section. For completeness, although it is probably obvious, an adversary who finds an interesting document excerpt would be able to purchase the entire document from the botmaster. Payment could be managed through established means, like WebMoney, e-gold, or PayPal.
3 e(vil)Bay: Advanced Black Market Botnet Architecture
In the last section, we assumed that the botmaster would not know what documents were interesting to an adversary; this assumption helped shape the basic black market botnet's architecture. This setup was limited in effectiveness, however. In this section, we explore how the botmaster can have zombies automatically identify potentially interesting search terms. Dorre et al. discuss a system for text mining [10], the first part of which is to algorithmically extract what they call features, or significant vocabulary items, such as "credit line". Amazon has followed along similar lines with their Statistically Improbable Phrases (SIPs) [1]. SIPs are phrases gathered from books or other literary works that are mostly unique to each book. For example, if one particular book mentions "fuzzy bunnies on the beach at sunset" multiple times, but very few other books use this phrase, then this would be a SIP for that book. Closely related is the concept of inverse document frequency (IDF) [24], a measure which can be used to retrieve individual documents containing a term that occurs in only a few of all the available documents. The ability to automatically identify potential search terms suggests that a different architecture for the black market botnet is possible, shown in Figure 3. Upon identifying terms, a zombie could take excerpts of those terms with a small amount of their context in a document, and transmit the excerpts to a search interface.
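In the spirit of the SIP/IDF discussion above, a zombie could score terms by inverse document frequency and forward only the rarest ones; the sketch below is a generic illustration, not the specific systems of [10] or [24].

```python
# Sketch of inverse document frequency (IDF) for flagging candidate terms
# that occur in few documents; toy data and names are illustrative.
from collections import Counter
from math import log

def idf_scores(documents):
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))
    return {term: log(n / count) for term, count in df.items()}

docs = ["the quarterly budget draft", "the meeting agenda", "the budget review"]
scores = idf_scores(docs)
# Rare terms ("draft", "agenda", "review") score higher than common ones ("the").
print(sorted(scores, key=scores.get, reverse=True)[:3])
```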


The search interface would perform adversaries' searches locally on the excerpts, rather than send queries into the botnet. For a botmaster, this architecture has the advantage of less telltale traffic to zombie computers, at the cost of a centralized target for defenders. There are other advantages, too. A straightforward search interface would meet the needs of an adversary regardless of where their interests were on the Long Tail. Automatically collecting interesting excerpts from the black market botnet would allow the excerpts to be not just searched but advertised; they may be of interest to more than one adversary, sitting at the thicker part of the Long Tail. The search interface could then be an auction site, where adversaries can bid on the document excerpts. The adversary who wins the auction pays the botmaster, who then instructs the zombie computer to send the document to the buyer. When document excerpts are posted for auction, a botmaster has two options as to how and where the listings are posted. First, the botmaster could create their own auction website, though as with the search portal described in the previous section, there is a risk that the site will be shut down, depriving the botmaster of income. This is the approach taken by Gozi [15]. Second, a botmaster could use an existing auction website. As the largest and most recognized auction site, we use eBay as an example. The black market botnet wouldn't be able to post fragments of documents directly on eBay, as it would be trivial for the site's operators to take down such listings. Instead, the listings would need to be obfuscated, so that an average viewer would not know the illicit nature of the auction. Steganography, or information hiding, has obvious application here (for more, see [19]). Many auction sites such as eBay allow sellers to post images of an item being sold, and this presents a good opportunity to apply steganography to hide information in the images. From a human's perspective, the auctioned items will look perfectly normal: a battered old teapot being sold, complete with a picture. However, adversaries who know what to look for will be able to uncover information about the real document being sold. This is similar to reported criminal activity, where drugs are sold on eBay using cover items [32]. If a black market botnet were to use images containing hidden information in eBay listings, two main issues must be addressed: how does an adversary find the illicit listings, and how does an adversary extract the hidden information? To answer the first question, a botmaster can provide adversaries with a list of accounts with which the items are posted. This list could be distributed through different kinds of channels, such as direct communication over IRC, which is already known to be used by fraudsters selling credit cards and other personal information [25]. eBay has a search utility where one can limit search results to a specific seller, and adversaries can use this utility to quickly locate listings which are about private documents. Of course, if the botmaster only has one or two accounts, eBay would eventually find the accounts and shut them down. However, phishers have also dealt with this problem for years; their solution is to simply move on, and provide the new information to their targets (in this case, the buyers). This is certainly not an ideal solution, but as long as it is good enough, a botmaster will be able to maintain operations. Decoding the steganographically hidden information could be accomplished if the botmaster provided adversaries with a program able to decode an image. Such programs are freely available, mitigating trust issues between botmasters and adversaries. A more elaborate program could even search eBay using its API [11] and automatically decode images in the search results. The hidden bits could be checksummed, to allow the program to distinguish between actual hidden information and random garbage. Once a listing is found, adversaries would use eBay as in any other auction. The successful bidder pays the botmaster using a payment system such as e-gold or PayPal, as before. Upon receipt of the payment, the botmaster would then provide the full document to the buyer. If the document is small enough, another option would be to encrypt the document and include it in the auction posting, so that only the decryption key needs to be provided to the buyer.
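The checksum idea mentioned above can be illustrated by framing a hidden payload with its length and a CRC32 value, so that a decoder can tell a genuine message from random bits; the actual embedding into image pixels is omitted, and the framing format is an assumption made for illustration.

```python
# Sketch: frame a hidden payload with its length and CRC32 so a decoder
# can distinguish real hidden data from random garbage; LSB embedding into
# images is not shown here.
import struct
import zlib

def frame_payload(payload):
    return struct.pack(">I", len(payload)) + struct.pack(">I", zlib.crc32(payload)) + payload

def try_decode(bits):
    if len(bits) < 8:
        return None
    length = struct.unpack(">I", bits[:4])[0]
    checksum = struct.unpack(">I", bits[4:8])[0]
    payload = bits[8:8 + length]
    if len(payload) == length and zlib.crc32(payload) == checksum:
        return payload                     # genuine hidden message
    return None                            # random garbage, no message here

framed = frame_payload(b"lot #4711: excerpt of internal memo")
assert try_decode(framed) == b"lot #4711: excerpt of internal memo"
assert try_decode(b"\xff" * 32) is None
```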

4 Additional Revenue Streams
The auction or outright sale of documents to adversaries is one revenue stream for botmasters, but black market botnets present two more.
4.1 Bidding Wars
The botmaster could start a bidding war between the original owner of the document and an adversary. The document's owner would be extorted into paying the botmaster to keep the document private. This would work particularly well against large corporations or celebrities. However, it is likely to erode any trust the botmaster enjoys from the adversaries.


Also, the nature of this activity is overt. It is conceivable that a botmaster could sell many documents from one victim under normal circumstances, but once a bidding war started, the victim would be alerted to the zombie's presence and take counter-measures.
4.2 Query Persistence
As presented, adversaries' search queries in a black market botnet are ephemeral; if a document of interest appears only seconds after an adversary searches, the adversary will not see it. An outgrowth of the black market botnet scenario is for the botmaster to allow adversaries' queries to persist, to watch for new documents matching the search criteria. There are definitely tradeoffs. For the adversary, persistence raises the risk of information exposure should the black market botnet be compromised by defenders. Such information could reveal the identity of a targeted person or company, for example, which the adversary may wish to keep secret. For the botmaster, query persistence provides another revenue stream, as a value-added service that adversaries can be charged for. Extending this idea, the botmaster could allow adversaries to inject agents into the botnet or the search interface. Adversaries could then not only have persistence, but could perform much richer searches, not limited by the botmaster's provided search interface. Allowing agents might seem like the botmaster is placing a high degree of trust in adversaries, but this is not the case. A savvy botmaster would not allow arbitrary agents to run, but only safe, code-signed agents that the botmaster provides to adversaries, for a fee. It is thus not only safe, it is a business opportunity for the botmaster. The botmaster protects their investment in the black market botnet by maintaining control of the agents and also of the number of concurrent agents, to avoid the botnet being overloaded. The botmaster also opens up a new market with the black market botnet, selling agents to adversaries.
5. Defenses
Several methods can be used to defend against black market botnets; some proactive, others reactive. Most of the defenses are not mutually exclusive, and therefore multiple defenses can (and should) be employed.
Preventing Infection. Preventing computers from becoming part of a botnet, i.e., avoiding infection by Internet worms and other malicious software, is the most effective defense.

The usual suspects are helpful here: installing the latest patches, using firewalls, and running up-to-date antivirus software.
Limiting Document Exposure. A cautious user might attempt to hinder a black market botnet by limiting access to private documents. This could be accomplished by moving infrequently needed documents to offline storage; documents that are saved on a DVD and stored on a bookshelf simply won't be within reach of an adversary. However, if the user does require a document and inserts the DVD into their computer, then it instantly becomes accessible and vulnerable again. While certainly not a complete defense, limiting document exposure through offline storage would act as part of a defense in depth. The implication is that a black market botnet would not know when interesting documents may become available, nor how long such documents will remain available. This further motivates the use of agents that perform persistent queries in the black market botnet, as discussed in the last section. That said, this defense takes extra planning and effort to implement, and is not likely to be practical for everyday use. A related issue is the current trend of retaining documents and other personal information for longer periods of time [4]. Government legislation mandating data retention for auditing purposes, such as the Sarbanes-Oxley Act, only provides more opportunities for a black market botnet to gain access to private documents [8]. Archived documents must therefore be handled with great care.
Digital Rights Management. Another way to limit access to private documents is to use digital rights management (DRM). The idea is that, if a document is only readable on a particular zombie computer, then it is not usable even if it is stolen. A simple DRM technique, for example, would be to protect documents with a password. When combined with strong encryption, where the password is the encryption key, this scheme limits a black market botnet's access to private documents, much like saving the documents on removable media. However, a similar problem exists in that documents are immediately accessible to a botnet once decrypted. Various DRM schemes exist, and in 2003 the World Intellectual Property Organization published a comprehensive study on various DRM technologies [31]. In particular, implementations of DRM systems for documents already exist [17].
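As one concrete, purely illustrative reading of the password-as-encryption-key idea, a document could be protected with a key derived from a passphrase; the sketch below assumes the third-party Python cryptography package and is not a complete DRM scheme.

```python
# Illustrative password-based document encryption using a derived key.
# Assumes the third-party 'cryptography' package; not a full DRM system.
import base64
import os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def key_from_password(password, salt):
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480000)
    return base64.urlsafe_b64encode(kdf.derive(password))

salt = os.urandom(16)
key = key_from_password(b"correct horse battery staple", salt)
token = Fernet(key).encrypt(b"confidential quarterly figures")
assert Fernet(key).decrypt(token) == b"confidential quarterly figures"
```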


Use Steganography. Another possible defense against black market botnets would be to use steganography to hide very sensitive documents. Users could hide sensitive financial information on their computer inside a seemingly harmless image of their puppy. In this sense, all of the steganographic techniques that can be used by black market botnets to hide postings on auction sites can be applied to defend against them. The drawbacks to steganography are the limited storage capacity it offers, and the extra steps required of the user to hide and retrieve their documents. And, once a document is extracted from its steganographic cocoon, it is again vulnerable to persistent black market botnet queries.

Document Fingerprinting. If one assumes that it will be impossible to fully protect every private document on a computer system, then the next best defense is to find out how documents are leaking, and plug those holes after the fact. Reactive defenses may not be an ideal approach, but it is unlikely that every single black market botnet scenario can be predicted and proactively defended against. Fingerprinting is a technique where each copy of a document contains some unique modification (fingerprint) so that the document can be examined later to determine who this copy belonged to [21]. Fingerprinting research is typically aimed towards multimedia content, where content distributors attempt to prevent piracy by linking copies of the multimedia content to specific owners. A corporation could take a similar approach when releasing documents under a nondisclosure agreement. In the context of this paper, however, a corporation would need to not only fingerprint documents which are distributed to external organizations, but also documents which are distributed within the company. This way, if a document is harvested from an infected computer on the inside and later appears in some public forum, it would be possible to trace the document back to the computer that it leaked from.
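To illustrate the fingerprinting idea, a minimal Python sketch follows that embeds a per-recipient identifier into a text document using zero-width characters, one simple (and easily stripped) marking technique. The function names and encoding are hypothetical examples for intuition only, not a reference to any specific fingerprinting scheme from [21].

ZW0, ZW1 = "\u200b", "\u200c"   # zero-width space / zero-width non-joiner encode 0 and 1

def embed_fingerprint(text, recipient_id, width=16):
    bits = format(recipient_id, f"0{width}b")
    mark = "".join(ZW1 if b == "1" else ZW0 for b in bits)
    return text + mark            # append an invisible per-copy mark

def extract_fingerprint(text, width=16):
    tail = [c for c in text if c in (ZW0, ZW1)][-width:]
    if len(tail) < width:
        return None               # no fingerprint found
    return int("".join("1" if c == ZW1 else "0" for c in tail), 2)

copy_for_desk_42 = embed_fingerprint("Quarterly financials ...", 42)
print(extract_fingerprint(copy_for_desk_42))   # 42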

Follow the Money Trail. The key motivation presented in this paper for a botmaster to gather private documents is the monetary incentive. Thus, money will be trading hands frequently in exchange for documents. If law enforcement agencies stumble across even a handful of buyers, they may be able to trace payments to their destination and catch the botmaster. More proactive law enforcement agencies may even conduct sting operations by purchasing documents themselves for the purpose of following deposits to the botmaster's bank account. Unfortunately, laundering money is a well-developed process, and it is hard to weigh the possible success of tracing money to the botmaster against the deployment of proactive defenses.

Active Countermeasures. Similar to existing honeypots used to track spammers and Internet worms [14], systems can be set up to provide fake documents for a black market botnet to mine. If combined with fingerprinting, the owner of a document honeypot could gain extra insight into how black market botnets work, such as the characteristics that make a document interesting to adversaries. Given this extra knowledge, a large document honeypot with many fake documents could be established in order to decrease the signal-to-noise ratio in the auction. The good guys could even bid on the fake documents to throw off black market botnets that learn from previous sales. That said, this defense would be yet another arms race which does not address any of the underlying issues involved, and is perhaps best left to people with too much time on their hands.

6 Conclusion
In this paper we have presented a scenario where a botmaster can use existing technology to create novel markets that exploit the Long Tail, where adversaries purchase private documents stolen from victims computers. There is a clear motivation for creating these markets, both for adversaries who would like access to the documents, and for botmasters who would be able to profit from providing the access that adversaries desire. In addition to the primary revenue stream of document sales, there are additional revenue streams in document bidding wars and supplying agents for persistent queries. But this is only the beginning. Researchers studying the Long Tail phenomenon have conjectured that there are second-order effects of the Long Tail on producers and consumers [6]. The unfortunate implication is that, as the market for private documents becomes popular, new niche black markets will emerge that are currently unfathomable.

References
[1] Amazon.com. What are statistically improbable phrases? http://www.amazon.com/gp/searchinside/



sipshelp.html. [2] C. Anderson. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, 2006. [3] J. Bates. Trojan horse: AIDS information introductory diskette version 2.0. Virus Bulletin, pages 36, Jan. 1990. [4] J.-F. Blanchette and D. G. Johnson. Data retention and the panoptic society: The social benefits of forgetfulness. Information Society, 18(1), 2002. [5] M. Bond and G. Danezis. A pact with the Devil. Technical Report UCAM-CL-TR-666, University of Cambridge Computer Laboratory, 2006. [6] E. Brynjolfsson, Y. Hu, and M. D. Smith. From niches to riches: Anatomy of the Long Tail. MIT Sloan Management Review, 47(4), 2006. [7] D. H. Chau, S. Pandit, and C. Faloutsos. Detecting fraudulent personalities in networks of online auctioneers. In Principles and Practice of Knowledge Discovery in Databases, pages 103114, 2006. [8] C. Crump. Data retention: privacy, anonymity, and accountability online. Stanford Law Review, 56(1):191229, Oct 2003. [9] D. Dagon, G. Gu, C. Zou, J. Grizzard, S. Dwivedi, W. Lee, and R. Lipton. A taxonomy of botnets. Unpublished, available at http://www.math.tulane.edu/tcsem/ botnets/ndss_botax.pdf, 2005. [10] J. Dorre, P. Gerstl, and R. Seiffert. Text mining: Finding nuggets in mountains of textual data. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 398 401, 1999. [11] eBay. What is the eBay API? http://developer.ebay.com/common/api, 2007. [12] J. Franklin, V. Paxson, A. Perrig, and S. Savage.An inquiry into the nature and causes of the wealth of Internet miscreants. In 14th ACM Conference on Computer and Communications Security, pages 375388, 2007. [13] O. D. Gnawali. A Keyword-Set Search System for Peer-to-Peer Networks. MIT, 2002. M.Sc. thesis. [14] The Honeynet Project. http://www.honeynet. org/. [15] D. Jackson. Gozi Trojan. SecureWorks, 2007. [16] LURHQ. Cryzip ransomware Trojan analysis, 2006.

[17] Microsoft. Microsoft Windows Rights Management Services for Windows Server 2003, 2005. [18] Panda Software. PGPCoder.A. Virus Encyclopedia,2005. [19] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn. Information hiding a survey. Proceedings of the IEEE, 87(7):10621078, July 1999. [20] S. Schecter and M. Smith. Access for sale: a new class of worm. In 2003 ACM Workshop on Rapid malcode, pages 1923, 2003. [21] D. Schonberg and D. Kirovski. Fingerprinting and forensic analysis of multimedia. In 12th Annual ACM International Conference on Multimedia, pages 788795, 2004. [22] Spamhaus. What is fast flux hosting? Frequently Asked Questions (FAQ). [23] Spammer-X. Inside the SPAM Cartel. Syngress, 2004. [24] K. Sparck Jones. Index term weighting. Information Storage and Retrieval, 9(11):619633, 1973. [25] The Honeynet Project. Know your enemy: Profile - automated credit card fraud. http://honeynet. org/papers/profiles/cc-fraud.pdf, 2003. [26] Trend Micro. TROJ ARHIVEUS.A. Virus Encyclopedia,2006. [27] United States v Bradley Joseph Steiger. 318 F.3d 1039. Eleventh Circuit, United States Court of Appeals,2003. [28] United States v Ronald C. Kline. 112 Fed.Appx.562. Ninth Circuit, United States Court of Appeals, 2004. [29] United States v William Adderson Jarrett. 338 F.3d 339. Fourth Circuit, United States Court of Appeals,2003. [30] D. Watson, T. Holz, and S. Mueller. Know your enemy: Phishing. http://www.honeynet.org/ papers/phishing/, 2005.



SECURE QUANTUM KEY DISTRIBUTION USING CENTRAL AUTHORITY


DEERAJ T.(1), BENNY JOVER GIFT J.(2), Anand Institute of Higher Technology, Chennai

ABSTRACT: Quantum Key Distribution (QKD) uses quantum mechanics to guarantee secure communication. It enables two parties to produce a shared random bit string known only to them, which can be used as a key to encrypt and decrypt messages. An important and unique property of quantum cryptography is the ability of the two communicating users to detect the presence of any third party trying to gain knowledge of the key. This results from a fundamental aspect of quantum mechanics: the process of measuring a quantum system in general disturbs the system. A third party trying to eavesdrop on the key must in some way measure it, thus introducing detectable anomalies. Our paper presents an even more secure key distribution scheme using a central authority in Quantum Key Distribution.

INTRODUCTION: Cryptography is, traditionally, the study of ways to convert information from its normal, comprehensible form into an obscured guise, unreadable without special knowledge; this is the practice of encryption. Cryptography includes encryption algorithms as well as key distribution algorithms. A very secure and highly efficient key distribution algorithm can balance out a less secure encryption algorithm. In the past, cryptography helped ensure secrecy in important communications, such as those of spies, military leaders, and diplomats. In recent decades, the field of cryptography has expanded its remit and has become a part of everyday life; it helps us to conduct secure e-transactions, and even ordinary people want to keep their communications secret and secure. With the rise of cryptography, an equal rise in the field of cryptanalysis is inevitable. Recently developed quantum computers have shown the world that they are capable of breaking cryptographic algorithms and key distribution algorithms which until now have been considered unconditionally secure. Thus there is a rising need for more secure cryptographic algorithms, or at least more secure key distribution algorithms. The answer to that need is the rise of Quantum Cryptography, or Quantum Key Distribution, which is capable of thwarting the efforts of quantum computers.

Quantum Key Distribution (QKD) was invented in 1984 by Charles Bennett and Gilles Brassard. QKD security relies on the laws of quantum mechanics, and more specifically on the fact that it is impossible to gain information about non-orthogonal quantum states without perturbing these states. This property can be used to establish a random key between two users and guarantee that the key is perfectly secret to any third party eavesdropping on the line. Essentially, quantum cryptography is based on the use of individual particles/waves of light (photons) and their intrinsic quantum properties to develop an unbreakable cryptosystem, essentially because it is impossible to measure the quantum state of any system without disturbing that system. This is known as Heisenberg's Uncertainty Principle.

The existing model of Quantum Key Distribution is known as the BB84 protocol. It uses two channels, namely a quantum channel and a classical channel. The sender who wants to distribute his key will send his calculated qubits over the quantum channel. The receiver will interpret the qubits using his own basis and send the result to be compared with the original in order to find the common qubits, which become the secret key. However, this model faces the problem of the man-in-the-middle attack.



Our proposed solution is designed to address the man-in-the-middle attack. The method uses a third party, or central authority, in order to exchange keys securely, with a software implementation in Java.

QUANTUM KEY DISTRIBUTION: Quantum Key Distribution is an alternative solution to the key establishment problem. In contrast to public-key cryptography, it has been proven to be unconditionally secure, i.e., secure against any attack, even in the future, irrespective of the computing power or any other resources that may be used. QKD security relies on the laws of quantum mechanics.

QUANTUM MECHANICS: Quantum mechanics is a mathematical theory that can describe the behavior of objects that are roughly 10,000,000,000 times smaller than a typical human being. Quantum particles move from one point to another as if they are waves; however, at a detector they always appear as discrete lumps of matter. There is no counterpart to this behavior in the world that we perceive with our own senses, so one cannot rely on everyday experience to form some kind of "intuition" of how these objects move. The intuition or "understanding" formed by the study of the basic elements of quantum mechanics is essential to grasp the behavior of more complicated quantum systems. Predictions of quantum mechanics have been verified experimentally to a very high degree of accuracy. The correspondence principle between classical and quantum mechanics is that all objects obey the laws of quantum mechanics, and classical mechanics is just the quantum mechanics of large systems. In contrast to classical physics, the act of measurement is an integral part of quantum mechanics. In general, measuring an unknown quantum state will change that state in some way. This is known as quantum indeterminacy, and it underlies results such as the Heisenberg uncertainty principle, the information-disturbance theorem and the no-cloning theorem. This can be exploited in order to detect any eavesdropping on communication and, more importantly, to calculate the amount of information that has been intercepted.

QUBITS: A quantum bit, or qubit, is a unit of quantum information. It is the quantum analogue of the classical bit. It is described by a state vector in a two-level quantum-mechanical system, which is formally equivalent to a two-dimensional vector space over the complex numbers. A qubit can have two possible values, normally a 0 or a 1. The difference is that whereas a bit must be either 0 or 1, a qubit can be 0, 1, or a superposition of both. The states a qubit may be measured in are known as basis states (or vectors). As is the tradition with any sort of quantum states, Dirac (bra-ket) notation is used to represent them. This means that the two computational basis states are conventionally written as |0> and |1>.



BASIS: In linear algebra, a basis is a set of vectors that, in a linear combination, can represent every vector in a given vector space or free module, and such that no element of the set can be represented as a linear combination of the others. In other words, a basis is a linearly independent spanning set. Pairs of orthogonal states are referred to as a basis. The usual polarization state pairs used are either the rectilinear basis of vertical (0 degrees) and horizontal (90 degrees), or the diagonal basis of 45 degrees and 135 degrees.

EXISTING PROTOCOL: The existing protocol was invented by Charles H. Bennett and Gilles Brassard (1984). Any two pairs of conjugate states can be used for the protocol. The sender (traditionally referred to as Alice) and the receiver (Bob) are connected by a quantum communication channel which allows quantum states to be transmitted. The protocol is designed with the assumption that an eavesdropper (referred to as Eve) can interfere in any way. The rectilinear and diagonal bases are used.

The first step in BB84 is quantum transmission. Alice creates a random bit (0 or 1) and then randomly selects one of her two bases (rectilinear or diagonal) to transmit it in. She then prepares a photon polarization state depending both on the bit value and the basis, as shown in the table, and transmits a single photon in the specified state to Bob over the quantum channel. As Bob does not know the basis the photons were encoded in, all he can do is select a basis at random to measure in, either rectilinear or diagonal. He does this for each photon he receives, recording the time, the measurement basis used and the measurement result. After Bob has measured all the photons, he communicates with Alice over the public classical channel. Alice broadcasts the basis each photon was sent in, and Bob the basis each was measured in. They both discard photon measurements (bits) where Bob used a different basis, which will be half on average, leaving half the bits as a shared key.

[Table: BB84 example showing Alice's random bits, Alice's random sending bases, the photon polarizations Alice sends, Bob's random measuring bases, the photon polarizations Bob measures, the public discussion of basis, and the resulting shared secret key.]

To check for the presence of eavesdropping, Alice and Bob now compare a certain subset of their remaining bit strings. If a third party (usually referred to as Eve, for 'eavesdropper') has gained any information about the photons' polarization, this will have introduced errors in Bob's measurements. If more than p bits differ, they abort the key and try again, possibly with a different quantum channel, as the security of the key cannot be guaranteed. p is chosen so that if the number of bits known to Eve is less than this, privacy amplification can be used to reduce Eve's knowledge of the key to an arbitrarily small amount, by reducing the length of the key.
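As a concrete illustration of the sifting and error-check steps just described, the following Python sketch simulates BB84 over an ideal channel, with an optional intercept-and-resend eavesdropper. It is a simplified simulation for intuition only (all names and parameters are our own), not an implementation of the Java system proposed in this paper.

import random

def bb84(n_bits=1000, eavesdrop=False, sample_fraction=0.2):
    alice_bits  = [random.randint(0, 1) for _ in range(n_bits)]
    alice_bases = [random.choice("RD") for _ in range(n_bits)]   # R = rectilinear, D = diagonal

    # Optional Eve: measures each photon in a random basis and resends it.
    sent_bits, sent_bases = alice_bits, alice_bases
    if eavesdrop:
        eve_bases = [random.choice("RD") for _ in range(n_bits)]
        sent_bits = [b if eb == ab else random.randint(0, 1)
                     for b, ab, eb in zip(alice_bits, alice_bases, eve_bases)]
        sent_bases = eve_bases

    bob_bases = [random.choice("RD") for _ in range(n_bits)]
    bob_bits  = [b if bb == sb else random.randint(0, 1)
                 for b, sb, bb in zip(sent_bits, sent_bases, bob_bases)]

    # Sifting: keep only positions where Alice's and Bob's bases match.
    kept = [i for i in range(n_bits) if alice_bases[i] == bob_bases[i]]
    alice_key = [alice_bits[i] for i in kept]
    bob_key   = [bob_bits[i] for i in kept]

    # Error check: compare a random subset; a high error rate signals eavesdropping.
    sample = random.sample(range(len(kept)), int(len(kept) * sample_fraction))
    errors = sum(alice_key[i] != bob_key[i] for i in sample)
    return errors / max(len(sample), 1)

print("error rate, no Eve  :", bb84())                 # close to 0.0
print("error rate, with Eve:", bb84(eavesdrop=True))   # close to 0.25

The roughly 25% error rate with an intercept-and-resend eavesdropper is what lets Alice and Bob abort and retry, as described above.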

ATTACKS: Even though the Quantum Key Distribution protocol BB84 is quite secure compared to the other existing key distribution protocols, it has its own disadvantages:

1. MAN-IN-THE-MIDDLE ATTACK: The man-in-the-middle attack, or bucket-brigade attack, is a form of active eavesdropping in which the attacker makes independent connections with the victims and relays messages between them, making them believe that they are talking directly to each other over a private connection when in fact the entire conversation is controlled by the attacker. The attacker must be able to intercept all messages going between the two victims and inject new ones, which is straightforward in many circumstances. A man-in-the-middle attack can only be successful when the attacker can impersonate each endpoint to the satisfaction of the other. Most cryptographic protocols include some form of endpoint authentication specifically to prevent MITM attacks.

2. PHOTON SPLITTING ATTACK: If the source occasionally emits pulses containing more than one photon, Eve can split off the extra photons and transmit the remaining single photon to Bob. Eve stores these extra photons in a quantum memory until Bob detects the remaining single photon and Alice reveals the encoding basis. Eve can then measure her photons in the correct basis and obtain information on the key without introducing detectable errors.

3. HACKING ATTACKS: Hacking attacks target imperfections in the implementation of the protocol rather than the protocol itself. If the equipment used in quantum cryptography can be tampered with, it could be made to generate keys that are not secure, for example by using a random-number-generator attack.

4. DENIAL OF SERVICE: A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a computer resource unavailable to its intended users. One common method of attack involves saturating the target (victim) machine with external communication requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable. Because a dedicated fibre-optic line (or line of sight in free space) is currently required between the two points linked by quantum cryptography, a denial-of-service attack can be mounted by simply cutting or blocking the line or, perhaps more surreptitiously, by attempting to tap it.


PROPOSED SOLUTION:

A man-in-the-middle attack can only be successful when the attacker can impersonate each endpoint to the satisfaction of the other. Hence it is required that the users authenticate themselves to each other. Our proposed solution is to provide this authentication using a central authority.

All the users of Quantum Key Distribution must register with the central authority. Consider two users, Alice and Bob. Alice will obtain an identification number IDA, along with the bit positions and basis, from the Central Authority (CA) when she registers. When Alice and Bob wish to communicate, they will inform the CA. The CA will generate the random bits and insert each bit of IDA or IDB one-by-one at the specified bit positions. Then it will apply the pre-selected basis to the ID bits while randomly applying bases to the other bits. The generated result is sent via the quantum channel to both Alice and Bob. They will randomly apply bases to the received result in order to get the original bits as well as to verify their IDs. They will send the order of bases applied by them to the CA. The CA will then compare the two sets of bases in order to find the common ones and transmit the same to both users. This forms the secret key for communication between Alice and Bob.

The proposed solution has its own advantages. The main advantage is that it prevents the man-in-the-middle attack. It provides authentication and makes sure the communication is more secure and between the intended persons only.

CONCLUSION: Quantum cryptography, and especially Quantum Key Distribution (QKD), has triggered intense and prolific research work during the past twenty years and is now progressing to maturity. QKD enables secret key establishment between two users, using a combination of a classical channel and a quantum channel. The essential interest of QKD, which is intrinsically linked to the quantumness of the signals exchanged on the quantum channel, is that any eavesdropping on the line can be detected. This property leads to cryptographic properties that cannot be obtained by classical techniques; it allows key establishment to be performed with an extremely high security standard, known as unconditional or information-theoretic security. Our paper performs a software implementation of Quantum Key Distribution in a more secure manner, using a third party or central authority in order to provide authentication and solve the man-in-the-middle attack.
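To make the registration and key-generation flow described in the Proposed Solution above more concrete, here is a rough Python sketch of the central-authority step. The helper names (register, generate_stream), the stream length, and the treatment of bit positions and bases are our own simplified reading of the scheme, not the authors' Java implementation.

import random

def register(user_id_bits, stream_len=32):
    # CA assigns each user fixed positions (and pre-selected bases) for its ID bits.
    positions = sorted(random.sample(range(stream_len), len(user_id_bits)))
    bases = [random.choice("RD") for _ in user_id_bits]
    return {"id": user_id_bits, "positions": positions, "bases": bases}

def generate_stream(profile, stream_len=32):
    # CA builds the qubit stream: random bits everywhere, ID bits at the agreed positions.
    bits  = [random.randint(0, 1) for _ in range(stream_len)]
    bases = [random.choice("RD") for _ in range(stream_len)]
    for id_bit, pos, basis in zip(profile["id"], profile["positions"], profile["bases"]):
        bits[pos]  = id_bit     # embed one ID bit at its assigned position
        bases[pos] = basis      # apply the pre-selected basis to that ID bit
    return bits, bases

alice = register([1, 0, 1, 1])
stream_bits, stream_bases = generate_stream(alice)
print(stream_bits)
print(stream_bases)

In this sketch the receiver would, as described above, measure the stream in randomly chosen bases, check the ID bits at the agreed positions, and report its basis choices back to the CA for sifting.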



Logic Macroprogramming for Wireless Sensor Networks


N.K Senthil Kumar1, S. Kavitha2 Computer science department, Vel Tech Dr. RR & Dr.SR Technical University, Chennai 3 Electrical and Electronics department, Vel Tech Dr. RR & Dr.SR Technical University, Chennai
1

ABSTRACT: It is notoriously difficult and tedious to program wireless sensor networks (WSNs). To simplify WSN programming, we propose Sense2P, a logic macroprogramming system for abstracting, programming, and using WSNs as globally deductive databases. Unlike macroprograms in previous works, our logic macroprograms can be described declaratively and imperatively. In Sense2P, logic macroprogrammers can easily express a recursive program or query that is unsupported in existing database abstractions for WSNs. We have evaluated Sense2P analytically and experimentally. Our evaluation result indicates that Sense2P successfully realizes the logic macroprogramming concept while consuming minimal energy as well as maintaining completeness and soundness of the answers.

1. Introduction

Wireless sensor networks (WSNs) have been widely used for collecting data from environments [1-4]. However, sensor nodes are resource constrained and distributed all over the monitored area. Programming WSNs to acquire such data is notoriously difficult and tedious. Traditional WSN programming requires system programming in low-level details (e.g., wiring nesC [5] components, coordinating the program flow among nodes in a distributed manner, routing, discovering resources, accessing, and managing remote data) while maintaining low energy consumption and memory usage [6].

Several programming abstractions have been proposed to simplify WSN programming with high-level languages and to hide the low-level details from programmers [6-12]. The WSN programming abstractions have been divided into two classes: the local-behavior class and the global-behavior class (also called the macroprogramming class). The abstraction in the former class simplifies the programming task of specifying the local behavior of each node for distributed computation. Local-behavior abstractions include abstract regions [11, 12] and DSN [6]. These local-behavior abstractions can efficiently hide some of the above low-level programming details, but the programmers still need to write distributed code for routing, coordinating the program flow among nodes, and accessing and managing remote data.

Conversely, the abstraction of the macroprogramming class enables expressing the global behavior of the distributed computation by programming the WSN in the large [7]. These macroprogramming abstractions can hide even more low-level programming details than the local-behavior abstractions do. In a sense, macroprogrammers take a centralized view of programming a distributed system rather than a distributed view. The macro compiler is responsible for translating the macroprogram into a distributed version for execution. There are two subclasses of macroprogramming abstractions: node dependent and node independent. In the node-dependent subclass, a WSN is abstracted as a collection of nodes that can be simultaneously tasked within a single program. Examples of the node-dependent subclass include Kairos [7], Regiment [10], Split-C [13], SP [14], and DRN [8]. By contrast, in the node-independent subclass, a WSN is abstracted and programmed as a whole or a unit instead of several interacting nodes. Low-level programming details are completely abstracted out in this subclass as there are no longer networks or nodes in the programmer's view. Examples of this subclass include TinyDB [9] and Cougar [15].



Both have abstracted WSNs as relational databases that are programmed or queried in a SQL-like language. This abstraction is reasonable because WSNs have also been queried for data in relation [16]. Given this database abstraction, WSN programming is reduced to database querying. However, SQL is a pure declarative programming language for specifying what the programmer wants, not how to algorithmically obtain the desired result. Despite its simplicity, declarative programming may not be applicable to several WSN applications, especially complex tasks or queries.

In this paper, we propose Sense2P, a logic macroprogramming system for abstracting and programming WSNs as globally deductive databases. Unlike macroprograms in previous works, our logic macroprograms can be described declaratively and imperatively. As a result, Sense2P is highly expressive and efficient compared to SQL-based systems. Another advantage of logic macroprogramming is its capability to easily express a recursive program. Even though one can express a recursive query in SQL, the recursive SQL query is rather verbose (see Appendix A) and unsupported in existing systems for WSNs. Our evaluation result indicates that Sense2P can realize the logic macroprogramming concept while consuming minimal energy and maintaining completeness and soundness of the answers.

2. Related Work

Various macroprogramming abstractions have been proposed for several years. However, no abstraction fits all domains. We discuss the differences of our abstraction from those existing ones in this section.



TABLE 1: Characteristics comparison.

Approach | Programming model | Abstraction level | Node dependency | Communication transparency | Recursive query
Kairos | Imperative (procedural programming) | Network level (global) | Node dependent | Yes | No
Regiment | Declarative (functional programming) | Network level (global) | Node dependent | Yes | Yes
DRN | Declarative and imperative (procedural programming with resource variables) | Network level (global) | Node dependent | Yes | No
Cougar | Declarative (SQL) | Network level (global) | Node independent | Yes | No
TinyDB | Declarative (SQL) | Network level (global) | Node independent | Yes | No
Semantic Streams | Declarative (logic programming) | Network level (global) | Node independent | Yes | No
Snlog | Declarative and imperative (logic programming) | Node level (local) | Node dependent | No | Yes
Sense2P | Declarative and imperative (logic programming) | Network level (global) | Node independent | Yes | Yes

Of particular interest are Kairos [7], Regiment [10], and DRN [8]. Kairos presents a programming model that computes on a set of sensor devices in parallel and provides a facility to sequentially access remote variables. Unlike Kairos, Regiment is a spatiotemporal macroprogramming system that is based on the concept of functional reactive programming. However, Regiment is designed for long-running queries and is not well-suited for short-lived queries. DRN is a hybrid approach between imperative programming and declarative programming: resources and nodes are declaratively named whereas the core algorithm is imperatively programmed. Similar to DRN, Sense2P is also a hybrid approach, given that logic programming is an integration of imperative programming and declarative programming. Kairos, Regiment, and DRN are node dependent but Sense2P is node independent.

First, supported queries in the previous works are quite limited. For example, there is only one table accessible at a time. This may not work in networks of heterogeneous sensors. In other words, their queries do not support a join between different sensor nodes. In addition, only conjunctive comparison predicates are supported, and arithmetic expressions are limited to operations of an attribute and a constant. As a result, tuple selection is inflexible. Furthermore, sub-queries and column aliases are not allowed either.

Second, each sensed data item is kept as a tuple associated with each node. Constraints in the query are applied only to attributes in the same tuple as well as the same node. Therefore, the constraints are local, not global. The scheme is not designed for deriving data that is related to other data from different nodes; as a result, they cannot support a join operation. Third, they do not support recursive queries. It is well documented that recursive queries can improve the capability of a database [19, 20]. Finally, previous systems with a relational-database abstraction do not support the logic-based queries frequently used in deductive databases and expert systems. Unlike TinyDB and Cougar, our approach abstracts a WSN as a globally deductive database that can be logically programmed. As a result, our approach does not suffer from the above limitations.

3. Logic Macroprogramming

Logic programming is a logic-based declarative approach to knowledge representation that allows recursive programming. Logic programming is widely used in many artificial-intelligence applications such as knowledge-based systems, expert systems, smart information-management systems, and so forth. Prolog [17] is a de facto language for logic programming in traditional systems. Our early work in abstracting WSNs as deductive databases has been presented in [22].

4. Sense2P

Sense2P is our prototype for logic macroprogramming WSNs in a Prolog-like language. Our system allows programmers to write recursive and nonrecursive rules (programs) without being concerned with low-level programming details. Additionally, Sense2P is sufficiently simple for application-level users who only want to query the system for interesting data. Our programming model and system architecture are described as follows.

5. Programming Model

Briefly, our programming language in Sense2P is Prolog-like. The language consists of predicates, facts, rules, and queries.

5.1. Predicate. Predicates are relations of data (or tables in the relational-database terminology), for example, the predicate temperature(NodeID, TemperatureValue). In this example, temperature is a predicate name while NodeID and TemperatureValue are variable arguments. In general, an argument is a variable if it begins with a capital letter. Conversely, an argument is a constant if it begins with a small letter or it is a number.

5.2. Fact. A fact is a predicate whose arguments are all constant. One may consider facts as already-existing data in the system. Facts can be instantiated in three forms: user defined, sensor generated, and rule deduced. Users can define known facts in the program, such as location(1, 33, 45). This fact indicates that the node with ID 1 is located at coordinate (33, 45). Some facts are data sensed from sensors.

5.3. Rule. Rules are clauses that deduce new facts from existing facts. Rules are represented as Horn clauses that contain head and body parts. An example of a rule is shown in Listing 1. Specifically, an area has a hot spot if an arbitrary node in that area senses a temperature value over 50 degrees. The left-hand side of a clause is called the head and the right-hand side is called the body. In Listing 1, the head is hotSpotArea(AreaID) and the body is temperature(NodeID, Temp), Temp > 50, area(NodeID, AreaID). A rule will be satisfied only if every predicate in the body is satisfied or matched by at least one fact.

5.4. Query. A query is represented by ?- followed by a predicate. For example, ?-hasHotSpotArea(X) is a query to retrieve the IDs of all areas that have a hot sensor node.



Queries can be classified into four groups. The first group is the fact-checking query, which is intended for checking the existence of certain facts, for example, ?-temperature(2, 5). The second group includes fact-retrieving queries, which are designed for retrieving all data that satisfy the fact types and constraints in the queries. This query type contains at least one variable in a predicate, for example, ?-temperature(X, Y). The final group is composed of deductive queries for retrieving all data that satisfy the rules specified in the queries, for example, ?-hotSpotArea(X). A query will be recursive if its predicate matches a recursive rule whose body contains the same predicate name as that in its head. For example, one can write recursive rules for detecting sensor nodes in danger, as shown in Listing 2.

hotSpotArea(AreaID) :- temperature(NodeID, T), T > 50, area(NodeID, AreaID).

LISTING 1: Example of a rule.

danger(AreaID) :- temperature(NodeID, T), T > 80, area(NodeID, AreaID).
danger(AreaID) :- humidity(NodeID, H), H < 40, area(NodeID, AreaID), adjacent(AreaID, AdjAreaID), winddir(AdjAreaID, AreaID), danger(AdjAreaID).
Query: ?-danger(X).

LISTING 2: Example of a recursive query rule.

6. System Architecture

Sense2P consists of two major components: the query processing engine and the data-gathering engine (Figure 1). The query processing engine resides on the base station while the data-gathering engine resides on each wireless sensor node.

[FIGURE 1: System architecture. The query processing engine (compiler unit, run-time processing unit, network interface unit) runs on the base station, which is wired to a gateway node; the data-gathering engine (subquery processing layer, routing layer, link layer, physical layer) runs on each wireless sensor node, and nodes communicate wirelessly.]

6.1. Query Processing Engine. The query processing engine is crucial for logic macroprogramming WSNs. Its main tasks are to interpret a user program (consisting of facts, rules, and queries) and to process queries to find satisfying answers. The Sense2P query processing engine consists of three main components: a compiler unit, a run-time processing unit, and a network interface unit. The compiler unit parses a logic macroprogram into a compiled code that runs on the run-time processing unit. The run-time processing unit is required to treat sensor-specific facts differently from other ordinary facts. Sensor-specific facts are data



locally sensed and stored by sensor nodes in the network. Processing queries related to these facts requires special attention because unnecessary data transmissions are costly in WSNs. In this paper, we consider three previously proposed schemes for query processing in deductive databases. These schemes include the top-down, bottom-up, and Prolog-style evaluation approaches [21]. Prolog-style systems (coupled with database systems) are similar to the top-down systems in the sense that their execution starts from the goal, and a query can be solved by executing each subgoal until the deduced facts match the goal. However, Prolog-style systems produce answers one tuple at a time whereas top-down methods produce one set at a time without in-order execution of subgoals. Conversely, the bottom-up methods start from existing facts and attempt to deduce new facts from rules that are related to the query. Only facts that match the goal of the query are selected as the answers. We refer to [21] for more information on each implementation scheme. Many works suggest that the bottom-up methods have many advantages over top-down methods in traditional deductive database systems [20, 21, 23]. However, in wireless sensor networks, we argue that top-down and Prolog-style approaches are more appropriate.

It is not easy for a node to selectively send relevant data without knowing a priori what all other nodes have. A fact in a node may be relevant simply because another fact from another node happens to have a certain value. Conversely, the top-down approach can use information from a query to suppress irrelevant facts from being sent. For example, a predicate detect(ObjectID, AreaID) in a system means a sensor node can detect an object with the identification number ObjectID in the region AreaID. When a user injects a query ?-detect(oiltank, X), only sensor nodes that can detect an object named oiltank will send answers back. Other nodes are suppressed.

In addition, we can use an answer set from the previous subgoal to filter out (or suppress) the irrelevant facts of the next subgoal. For example, consider the rule in Listing 3. When a user injects a query ?-hotObject(X, area70), the system will match the query with the above rule. Therefore, the variable AreaID in the rule will be bound with the constant area70. Then, the system will attempt to match each predicate in the body of the rule. Each body predicate becomes a subquery that needs to be satisfied. In this example, the first subquery is detect(ObjectID, area70). This subquery is disseminated into the network, and only nodes with facts matching this subquery are eligible repliers. Furthermore, we can also use the constraint Temp > 50 as another filter before injecting the subquery temperature(ObjectID, Temp) into the network. Due to these filtering techniques, this top-down approach can significantly reduce the consumption of energy, which is limited in wireless sensor networks [16]. Therefore, the Prolog-style top-down approach is used and combined with our filtering techniques in this paper.

hotObject(Obj, AreaID) :- detect(Obj, AreaID), temperature(Obj, T), T > 50.

LISTING 3: Example of a rule in which two predicates are related to each other through Obj.
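As a rough illustration of the subgoal-by-subgoal filtering described above, the following Python sketch evaluates the hotObject rule of Listing 3 top-down, binding Obj from the first subquery and then filtering with the temperature constraint. The fact tables and helper names here are hypothetical, and this is a centralized stand-in for intuition, not the in-network evaluation performed by Sense2P.

# Hypothetical fact tables, as if pulled from sensor nodes.
detect = [("oiltank", "area70"), ("truck", "area70"), ("oiltank", "area12")]
temperature = {"oiltank": 63, "truck": 41}

def hot_object(area):
    # Subquery 1: detect(Obj, area) binds the candidate objects for this area.
    candidates = [obj for (obj, a) in detect if a == area]
    answers = []
    # Subquery 2: temperature(Obj, T) with Obj already bound, filtered by T > 50.
    for obj in candidates:
        t = temperature.get(obj)
        if t is not None and t > 50:
            answers.append(obj)
    return answers

print(hot_object("area70"))   # ['oiltank']

Because only bound, constraint-satisfying subqueries are pushed to the network, nodes holding irrelevant facts never have to transmit, which is the source of the energy savings claimed in the text.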



[FIGURE 2: Query processing flow. A macroprogram/query is parsed by the compiler unit; the compiled program/query runs on the run-time processing unit; sensor-specific subqueries are sent through the network interface unit to the wireless sensor network (data-gathering engine); and subquery answers flow back up to produce the final answer(s).]

In our system, most relevant facts are pulled from the network except the persistent ones that do not change over time. Finally, the network interface unit is responsible for disseminating queries or subqueries into the network. The queries are transformed into a format known in sensor networks, serialized, and sent into the network. The unit is also responsible for receiving answers from the network, which requires deserialization and transformation of the messages back into Prolog-like predicates.

6.2. Data-Gathering Engine. The data-gathering engine is responsible for finding answers that are relevant to injected queries. This engine consists of the routing layer, the query-processing layer, the link layer, and the physical layer. However, our work simply focuses on the routing layer and the query-processing layer. Both layers are handled by our LogicQ sub-system.

if subquery is checking existence then
    if have local satisfied fact then
        send answer up to parent;
    else
        forward query to children;
else if subquery is asking for all satisfied values then
    forward query to children;
    if have local satisfied fact then
        send answer up to parent;
end

ALGORITHM 1: Subquery processing algorithm.

If the query type requires all satisfied answers (Line 6), a sensor node will forward the query immediately (Line 7). Regardless of the local existence of satisfying facts, the system still needs satisfying answers from all sensor nodes. After the query is forwarded, the node checks for local satisfying answers. If it has one, it will send the answer up to its parent (Lines 8-9).
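As a rough illustration of how a node might apply Algorithm 1, the following Python sketch simulates subquery processing over a tree of nodes. The Node class, fact representation, and matching helper are hypothetical simplifications of ours, not the authors' LogicQ implementation.

class Node:
    def __init__(self, node_id, facts, children=None):
        self.node_id = node_id
        self.facts = facts              # set of ground facts, e.g. ("temperature", 2, 85)
        self.children = children or []

    def process(self, subquery, check_existence):
        # Algorithm 1: existence checks stop early; "all satisfied values"
        # queries are always forwarded to children.
        local = [f for f in self.facts if matches(subquery, f)]
        if check_existence:
            if local:
                return local[:1]                      # send one answer up to parent
            answers = []
            for child in self.children:               # otherwise forward to children
                answers.extend(child.process(subquery, True))
            return answers
        answers = []
        for child in self.children:                   # forward first, regardless of local facts
            answers.extend(child.process(subquery, False))
        answers.extend(local)                         # then add local answers
        return answers

def matches(subquery, fact):
    # A variable (capitalised string) matches anything; constants must be equal.
    if subquery[0] != fact[0] or len(subquery) != len(fact):
        return False
    return all((isinstance(q, str) and q[:1].isupper()) or q == v
               for q, v in zip(subquery[1:], fact[1:]))

leaf = Node(2, {("temperature", 2, 85)})
root = Node(1, {("temperature", 1, 40)}, [leaf])
print(root.process(("temperature", "X", "Y"), check_existence=False))
print(root.process(("temperature", 2, 85), check_existence=True))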



7. Conclusion

This paper proposes a logic node-independent macroprogramming approach for abstracting, programming, and using WSNs as globally deductive databases. Unlike macroprograms in previous works, our logic macroprograms can be described declaratively and imperatively. To efficiently process queries and their subqueries (either recursive or nonrecursive), the top-down approach is more appropriate than the bottom-up approach, due to the filtering techniques that suppress irrelevant facts from being sent. Finally, our evaluation results indicate that Sense2P can significantly reduce energy consumption.

Appendix A

WITH RECURSIVE ancestor(anc, desc) AS (
    ( SELECT par AS anc, child AS desc FROM parent )
    UNION
    ( SELECT ancestor.anc, parent.child AS desc
      FROM ancestor, parent
      WHERE ancestor.desc = parent.par )
)
SELECT anc FROM ancestor WHERE desc = John

LISTING 4: SQL programming to solve the Ancestors problem.

ancestor(anc, desc) :- parent(anc, desc).
ancestor(anc, desc) :- parent(anc, X), ancestor(X, desc).
?-ancestor(anc, John).

LISTING 5: Logic programming to solve the Ancestors problem.

References

[1] C. Gui and P. Mohapatra, Power conservation and quality of surveillance in target tracking sensor networks, in Proceedings of the 10th Annual International Conference on Mobile Computing and Networking (MobiCom 04), pp. 129-143, ACM, New York, NY, USA, 2004. [2] T. He, S. Krishnamurthy, J. A. Stankovic et al., Energy-efficient surveillance system using wireless sensor networks, in Proceedings of the 2nd International Conference on Mobile Systems, Applications, and Services (MobiSys 04), pp. 270-283, ACM, New York, NY, USA, 2004. [3] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, Wireless sensor networks for habitat monitoring, in Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 02), pp. 88-97, ACM, New York, NY, USA, 2002. [4] N. Xu, S. Rangwala, K. K. Chintalapudi et al., A wireless sensor network for structural monitoring, in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys 04), pp. 13-24, ACM, New York, NY, USA, November 2004. [5] D. Gay, P. Levis, E. Brewer, R. Von Behren, M. Welsh, and D. Culler, The nesC language: a holistic approach to networked embedded systems, in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 03), pp. 1-11, ACM, New York, NY, USA, June 2003. [6] D. Chu, L. Popa, A. Tavakoli et al., The design and implementation of a declarative sensor network system, in Proceedings of the 5th International Conference on Embedded Networked Sensor Systems (SenSys 07), pp. 175-188, ACM, New York, NY, USA, 2007. [7] R. Gummadi, N. Kothari, R. Govindan, and T. Millstein, Kairos: a macro-programming system for wireless sensor networks, in



Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 05), pp. 12, ACM, New York, NY, USA, 2005. [8] C. Intanagonwiwat, R. K. Gupta, and A. Vahdat, Declarative resource naming for macroprogramming wireless networks of embedded systems, in ALGOSENSORS, Lecture Notes in Computer Science, vol. 4240, pp. 192199, Springer, 2006. [9] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, TinyDB: an acquisitional query processing system for sensor networks, ACM Transactions on Database Systems, vol. 30, no. 1, pp. 122173, 2005. [10]R. Newton, G. Morrisett, and M. Welsh, The regiment macroprogramming system, in Proceedings of the 6th Interna-tional Conference on Information Processing in Sensor Networks (IPSN 07), pp. 489498, ACM, New York, NY, USA, 2007. [11]M. Welsh and G. Mainland, Programming sensor networks using abstract regions, in Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation (NSDI 04), p. 3, USENIX Association, Berkeley, Calif, USA, 2004. [12]K. Whitehouse, C. Sharp, E. Brewer, and D. Culler, Hood: a neighborhood abstraction for sensor networks, in Proceedings ACM, New York, NY, USA, 2004. [13]A. Krishnamurthy, D. E. Culler, A. Dusseau et al., Parallel programming in split-C, in Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing 93), pp. 262 273, ACM, New York, NY, USA, November 1993. [14]C. Borcea, C. Intanagonwiwat, P. Kang, U. Kremer, and L. Iftode, Spatial programming using smart messages: design and implementation, in Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 04), pp. 690399,

IEEE Computer Society, Washington, DC, USA, 2004. [15]Y. Yao and J. Gehrke, The cougar approach to in-network query processing in sensor networks, SIGMOD Record, vol. 31, no. 3, pp. 918, 2002. [16]C. Intanagonwiwat, R. Govindan, and D. Estrin, Directed diusion: a scalable and robust communication paradigm for sensor networks, in Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MobiCom 00), pp. 5667, ACM, New York, NY, USA, 2000. [17]I. Bratko, Prolog Programming for Artificial Intelligence, Addison-Wesley Longman Publishing, Boston, Mass, USA, 1986. [18]K. Whitehouse, F. Zhao, and J. Liu, Semantic Streams: a framework for composable semantic interpretation of sensor data, in Wireless Sensor Networks, vol. 3868 of Lecture Notes in Computer Science, pp. 520, Springer, 2006. [19]F. Bancilhon and R. Ramakrishnan, An amateurs intro-duction to recursive query processing strategies, SIGMOD Record, vol. 15, no. 2, pp. 1652, 1986. [20]Y. K. Hinz, Datalog bottom-up is the trend in the deductive database evaluation strategy, Tech. Rep. INSS 690, University of Maryland, 2002. [21]K. Ramamohanarao and J. Harland, An introduction to deductive database languages and systems, The VLDB Journal, vol. 3, no. 2, pp. 107122, 1994. [22]S. Choochaisri and C. Intanagonwiwat, A system for using wireless sensor networks as globally deductive databases, in Proceedings of the IEEE International Conference on Wireless & Mobile Computing, Networking & Communication (WIMOB 08), pp. 649654, IEEE Computer Society,



Washington, DC, USA, 2008. [23]R. Ramakrishnan and S. Sudarshan, Top-down vs. bottom-up revisited, in Proceedings of the International Logic Program-ming Symposium, pp. 321336, MIT Press, 1991. [24]P. Levis, S. Madden, J. Polastre et al., Tinyos: an operating system for sensor networks, in Ambient Intelligence, Springer, 2004. [25]P. Levis, N. Lee, M. Welsh, and D. Culler, Tossim: accurate and scalable simulation of entire tinyos



A Comprehensive Model to Achieve Service Reusability for Multi-level Stakeholders using Non-Functional Attributes of Service Oriented Architecture
Shanmugasundaram G. #1, V. Prasanna Venkatesan #2, C. Punitha Devi *3
# Department of Banking Technology, * Department of Computer Science & Engg., Pondicherry University

Abstract: SOA is a prominent paradigm for accomplishing reuse of services. Service reusability is one dominant factor which has a great influence on achieving quality in SOA systems. There exists sufficient research in this area, and researchers have contributed many works towards achieving quality in SOA systems, but much emphasis has not been placed on service reusability [1] [2] [3]. A few authors have addressed the reusability factor with limited non-functional attributes. Our study focuses on identifying the non-functional attributes which have a major or greater influence towards obtaining reusability in SOA systems. The objective of this study goes to the next level, categorizing the non-functional attributes from a multi-stakeholder perspective, i.e. Service Consumer, Service Provider and Service Developer, which paves the way to build a comprehensive quality model for achieving Service Reusability.

INTRODUCTION

SOA acts as the major platform for building distributed applications that cross organizational boundaries because of its flexible, heterogeneous and loosely coupled nature. SOA-based business applications can span several networked enterprises, with services that encapsulate and externalize various corporate applications and data collections. The popularity of SOA applications is based not only on their functionality but also on delivering high quality. When defining quality attributes for SOA, service reusability stands as one of the essential factors. Reusability is at the core of SOA and has helped it gain its popularity. Service reusability is the key determinant factor for the identification of optimally granular services, since it proves its role in saving the cost of development and maintenance. Achieving reusability is not a simple task, as it has an influence on different stakeholders. Addressing this feature leads to two main issues or questions: What are all the attributes (functional and non-functional) that have a greater impact on the reusability principle? Are there any measures or models available to achieve it completely? Our objective focuses on identifying the non-functional attributes that have a high impact on reusability. Later, the identified attributes need to be categorized for different stakeholders. Hence this work gives a complete picture of reusability factors or attributes and their categorization from the stakeholders' perspective. The rest of the paper is organized as follows: section 2 gives the review of related work on SOA quality and service reusability; section 3 elaborates the outcome of the reviews listed in section 2; section 4 delivers our comprehensive quality model; finally, section 5 gives conclusions and future directions towards service reusability.
II. RELATED WORKS

Quality attributes

Quality attributes are essential in choosing and designing the architecture style of any system. In SOA, quality attributes inherently affect the business goal, as the defined quality attributes have a greater impact on



business decisions. The survey initially starts with an objective to list works in qualities of SOA and second part of the review covers the works related to service reusability factor of SOA. Defining the quality attributes for SOA raises these questions 1. What are the quality attributes that have an impact on business goal of SOA systems? 2. Which category does the quality attributes fits in? 3. Are there any measure or evaluation mechanism to check whether the attributes are properly addressed? The related work has been reviewed considering the above questions Survey about quality attributes [Balfagih and Hassan 2009] examined the various quality attributes of SOA and Web Services and classified them into different perspectives i.e. developer, provider and consumer. [Glaster et al.] Identified the critical importance and the difficulties associated with handling Non-Functional parameters in general and the fact that they are even more difficult to address in the SOA context. They made an attempt to generate a checklist of NFPs for SOA to be used by the service providers. [Choi et al. 2008] have identified some of the unique features of SOA and then derived six quality attributes and proposed the corresponding metrics to measure each quality attribute. [Liam OBrien Lero et al. 2007] have discussed the SOA aspects related to various quality attributes

Review towards quality attributes for service reusability
The review below describes the existing works on service reusability. It indicates that the contributions of researchers are towards defining the functional and non-functional attributes along with metrics for measuring reusability. Some have specifically addressed functional attributes like service cohesion, coupling and granularity. A few authors have listed reusability as the key quality attribute for the service developer. [Si Won Choi and Soo Dong Kim, 2008] proposed a comprehensive quality model for evaluating reusability with the functional attributes of modularity and commonality and the non-functional attributes of discoverability and availability. [Renuka Sindhgatta, et al., 2009] addressed functional attributes like coupling and cohesion and non-functional attributes like composability and reusability, and derived metrics for coupling and cohesion as well as for composability and reusability. [Zain Balfagih and Mohd Fadzil, 2009] listed the qualities based on the different stakeholders and addressed reusability as a major quality for the service developer. [George Feuerlicht 2011] stated that service granularity has an impact on service reusability. [Perepletchikov, et al., 2007] discuss service coupling and cohesion, which support the reusability factor, and have defined metrics for the two functional attributes. [Mikhail Perepletchikov, et al., 2010] give the impact of service cohesion and coupling on reusability. [Saad Alahmari, et al., 2011] defined metrics for service granularity to achieve reusability. From this review we conclude that the NFAs relating to service reusability are not addressed completely, which has motivated us to carry out this work.

XIII. OUR CONTRIBUTION

The objective of our proposal is to identify the non-functional attributes of service reusability and to categorize them for multilevel stakeholders. Reusability in SOA cannot be addressed as a feature that satisfies some properties, nor is it a separate entity. The attributes of



reusability have a direct or indirect impact. To ensure service reusability we need to address all the NFAs that have a positive or negative impact. To achieve complete reusability in SOA systems we need to define the factors for the different stakeholders' perspectives. From the related work the attributes that influence reusability could be identified. For example, if discovery of services is easier, then the reusability of services will be greater; hence service discovery shows a positive impact. Likewise, the other NFAs can also be related.

TABLE I
LIST OF NFAS RELATED TO SERVICE REUSABILITY

The table enumerates the non-functional attributes reported in the literature, including Usability, Discoverability, Conformance, Testability, Composability, Availability, Adaptability, Reliability, Flexibility, Modifiability, Effectiveness and Security, together with the researchers who identified them [Zain Balfagih and Mohd Fadzil Hassan, 2009; Si Won Choi and Soo Dong Kim, 2009; Si Won Choi, Jin Sun Her, and Soo Dong Kim, 2007, 2008; Bingu Shim, et al., 2008; Liam O'Brien, et al., 2007; D. J. Artus, 2006; T. Erl, 2006, 2009].
The second objective is to categorize the non-functional attributes and fit them to the various stakeholders. The stakeholders take different forms: service provider, service consumer and service developer. The service developer is the one who originally develops or creates the service; the provider is the party or organization that offers the service for consumption; and finally the service consumer is the party or enterprise that consumes the services for its enterprise application or for developing new applications.

XIV. COMPREHENSIVE MODEL FOR REUSABILITY OF SOA FOR MULTI-STAKEHOLDERS USING NFA



Table 1 shows the complete list of non-functional attributes. Based on the contributions of the different researchers we can categorize the NFAs for the different stakeholders. The NFAs listed for the service developer can also appear for the service consumer and the service provider, and similarly for the other categories. Some of the attributes are common and fall in all categories; these can have a greater impact on reusability than the other attributes. Consider the attribute service discovery: it falls both under the service consumer and under the service developer. Here the attribute is common to two categories, but the features ensuring discovery from the consumer's perspective would be different from those of the developer's perspective. The comprehensive model (figure 1) gives a clear picture of the reusability factors to be addressed completely in SOA systems. To estimate or evaluate the reusability of SOA systems completely, the focus has to be at all levels.
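To make the categorization concrete, the stakeholder-to-NFA mapping can be held in a simple lookup structure. The Python sketch below is only a hypothetical illustration: the attribute sets are assumptions chosen for the example and are not the exact assignment of the model in the figure.

# Hypothetical stakeholder-to-NFA mapping (illustrative only; the attribute
# sets below are assumptions, not the model's exact assignment).
REUSABILITY_NFAS = {
    "service_consumer": {"usability", "discoverability", "availability", "reliability"},
    "service_provider": {"availability", "security", "modifiability", "flexibility"},
    "service_developer": {"discoverability", "composability", "testability", "adaptability"},
}

def stakeholders_for(attribute):
    """Return every stakeholder perspective in which the given NFA must be addressed."""
    return [s for s, nfas in REUSABILITY_NFAS.items() if attribute in nfas]

# discoverability is common to more than one perspective, as discussed above
print(stakeholders_for("discoverability"))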

Figure 2 Comprehensive Model of Reusability based NFA for different stakeholders

XV. DISCUSSION
The table below represents the various contributions towards the multi-stakeholder perspective for service reusability using non-functional attributes. The notation * represents indirect support and ^ represents direct support. Most of the contributors address the influence of the different stakeholders indirectly; they have not categorized the NFAs for each stakeholder. The comparison of the different works clearly states that the NFAs for service reusability of the various stakeholders were not addressed precisely. Our work lists the NFAs for multiple stakeholders, which would help to achieve service reusability completely.

TABLE III
COMPARISON OF VARIOUS WORKS WITH MULTI-STAKEHOLDERS FOR SERVICE REUSABILITY

Contributors compared: Si Won Choi and Soo Dong Kim; Zain Balfagih and Mohd Fadzil; Glaster et al.; Renuka Sindhgatta; Our Proposed Model. For each contributor the table records the support given to service reusability using NFAs from the Service Consumer, Service Provider and Service Developer perspectives, where * denotes indirect support and ^ denotes direct support. The earlier contributors offer only indirect support (*), while Our Proposed Model provides direct support (^) in all three stakeholder perspectives.

XVI. CONCLUSION
Different works have discussed the qualities of SOA. Most of the current efforts on SOA quality have not focused on the reusability factor, nor have they considered multiple stakeholders together with reusability. In this paper we have presented a comprehensive model for reusability that uses non-functional attributes for the various stakeholders. The proposed model shows the way to achieve reusability at all levels, thereby enabling complete reusability of SOA systems. Our future work will be on proposing measures for each NFA at the different levels, to completely evaluate the reusability of the entire SOA system.

REFERENCES
Liam O'Brien, Paulo Merson, and Len Bass, "Quality Attributes for Service-Oriented Architectures", International Workshop on Systems Development in SOA Environments (SDSOA'07), IEEE, 2007.
Si Won Choi and Soo Dong Kim, "A Quality Model for Evaluating Reusability of Services in SOA", 10th IEEE Conference on E-Commerce Technology and the Fifth IEEE Conference on Enterprise Computing, E-Commerce and E-Services, 2008.
Si Won Choi, Jin Sun Her, and Soo Dong Kim, "Modeling QoS Attributes and Metrics for Evaluating Services in SOA Considering Consumers' Perspective as the First Class Requirement", IEEE Asia-Pacific Services Computing Conference, 2007.
Zain Balfagih and Mohd Fadzil Hassan, "Quality Model for Web Services from Multi-Stakeholders' Perspective", International Conference on Information Management and Engineering, IEEE, 2009.
Mikhail Perepletchikov, Caspar Ryan, and Zahir Tari, "The Impact of Service Cohesion on the Analyzability of Service-Oriented Software", IEEE Transactions on Services Computing, Vol. 3, No. 2, April-June 2010.
Saad Alahmari, Ed Zaluska, and David C. De Roure, "A Metrics Framework for Evaluating SOA Service Granularity", International Conference on Services Computing, IEEE, 2011.
Perepletchikov, M., Ryan, C., Frampton, K., and Tari, Z., "Coupling Metrics for Predicting Maintainability in Service-Oriented Designs", Australian Software Engineering Conference (ASWEC), Melbourne, Australia, IEEE Computer Society, 2007, pp. 329-340.
Renuka Sindhgatta, et al., "Measuring the Quality of Service Oriented Design", ICSOC/ServiceWave, LNCS 5900, pp. 485-499, 2009.
George Feuerlicht, "Simple Metric for Assessing Quality of Service Design", ICSOC Workshops, LNCS 6568, pp. 133-143, 2011.
Bingu Shim, Siho Choue, Suntae Kim, and Sooyong Park, "A Design Quality Model for Service-Oriented Architecture", 15th Asia-Pacific Software Engineering Conference, IEEE, 2008.
T. Erl, "Service Oriented Architecture: Concepts, Technology and Design", The Prentice Hall Service-Oriented Computing Series, 2006.
T. Erl, "SOA Principles of Service Design", The Prentice Hall Service-Oriented Computing Series, 2009.
D. J. Artus, "SOA Realization: Service Design Principles", IBM developerWorks, 2006.
Michael Rosen, et al., "Applied SOA: Service-Oriented Architecture and Design Strategies", Wiley India Edition, 2008.
M. Galster and E. Bucherer, "A Taxonomy for Identifying and Specifying Non-functional Requirements in Service-Oriented Development", Proceedings of the IEEE Congress on Services, 2008.



Classification of network activity schema for scan detection


1. M.ABDUL RAHIM - B.E.-CSE, KVCET 2. R.SRI BALAJI - B.E.-CSE, KVCET 3. JATHIN B.R - B.E.-CSE, KVCET 4. PALPANDI.S, ASST.PROF-CSE, KVCET
Abstract - Internet traffic is neither well-behaved nor well-understood, which makes it difficult to detect malicious activities such as scanning. A large portion of scanning activity is of a slow-scan type and is not currently detectable by security appliances. In this proof-of-concept study, a new scan detection technique is demonstrated that also improves our understanding of Internet traffic. Sessions are created using models of the behavior of packet-level data between host pairs, and activities are identified by grouping sessions based on patterns in the type of session, the IP addresses, and the ports. In a 24-hour dataset of nearly 10 million incoming sessions, a prodigious 78% were identified as scan probes. Of the scans, 80% were slower than basic detection methods can identify. To manage the large volume of scans, a prioritization method is introduced wherein scans are ranked based on whether a response was made and on the periodicity of the probes in the scan. The data is stored in an efficient manner, allowing activity information to be retained for very long periods of time. This technique provides insight into Internet traffic by classifying known activities, giving visibility to threats to the network through scan detection, while also extending awareness of the activities occurring on the network.
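The prioritization idea in the abstract, grouping probe sessions into scans and ranking them by whether any probe drew a response and by how periodic the probes are, can be sketched as follows. This is a minimal illustration under assumed data structures, not the authors' implementation.

from collections import defaultdict
from statistics import mean, pstdev

# Each session: (source IP, destination port, start time in seconds, was a response made?)
sessions = [
    ("203.0.113.7", 22, 10.0, False), ("203.0.113.7", 22, 70.2, False),
    ("203.0.113.7", 22, 130.1, True), ("198.51.100.3", 80, 5.0, False),
    ("198.51.100.3", 80, 47.0, False),
]

# Group probe sessions into candidate scans by (source, targeted port).
scans = defaultdict(list)
for src, port, start, answered in sessions:
    scans[(src, port)].append((start, answered))

def priority(probes):
    """Rank higher when a probe was answered and when inter-probe gaps are regular."""
    times = sorted(t for t, _ in probes)
    gaps = [b - a for a, b in zip(times, times[1:])]
    periodicity = 1.0 / (1.0 + pstdev(gaps) / mean(gaps)) if len(gaps) > 1 else 0.0
    responded = any(answered for _, answered in probes)
    return (1.0 if responded else 0.0) + periodicity

for scan, probes in sorted(scans.items(), key=lambda kv: priority(kv[1]), reverse=True):
    print(scan, round(priority(probes), 2))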



1.INTRODUCTION THIS paper focuses on the use of biometric person recognition for secure access to restricted data/services using a mobile phone with Internet connection. Many commercial and research efforts have recently focused on this subject (as discussed in Section II). However, in spite of the great amount of particular applications that can be found, the cost of changing or modifying biometric platforms, the lack of normalization in capture-device technology, and communication protocols, as well as social-acceptance drawbacks, are all barriers to the popularization of biometric recognition. There are four main questions that need to be answered for a better understanding of our proposal. 1) What is biometric person recognition? 2) Why use biometry? 3) Why use biometry in mobile phones/devices? 4) Why use web-based access? Let us begin by briefly answering these questions. What is biometric person recognition? This is the use of unique human characteristics (i.e., biometrics) to recognize the user. Biometrics can be divided into two categories based upon the underlying characteristic they are using [1]: physiological, which is based on direct measurements of a part of the human body (e.g., iris, fingerprint, face, hand shape, etc.), and behavioral, which is based on measurements and data derived from an action performed by the user and, thus, indirectly measuring some characteristics of the human body (e.g., voice, keystroke dynamics, signature-handwriting, gait, etc.). Biometric-recognition tasks can be split into two groups: identification (Who is the owner of this biometric?) and verification or authentication (Am I the person I claim to be?). Identification requires a large amount of processing, and is time consuming if the database is very large. It is often used to determine the identity of a suspect from crime-scene information. Verification requires less computer load as the user sample is only matched with a claimed identitystored template and is often used to access places or information. Why use biometry? There are three general categories of user authentication: 1) something you know, e.g., passwords and

personal-identification numbers (PINs), 2) something you have (e.g., tokens), and 3) something you are (e.g., biometrics) [1]. The dominant approach on current control access is via password or PIN, but its weaknesses are the most clearly documented: If it is easy to remember, it is usually easy to guess and hack into, but if it is difficult to attack, it is usually difficult to remember; hence, a lot of people write them down and never change them. An interesting study on the problem of passwords from a commercial point of view can be found . The problem with tokens is that they authenticate their presence, but not the carrier; they can be easily forgotten, lost, or stolen, and, as it happens with the credit cards, can be fraudulently duplicated. As a result, biometry appears as a good solution, which is generally used, in addition to the previous authentication methods, to increase security levels. Another very well-known and important area of application is the one used by the police to identify suspects. Here, fingerprints and DNA are the most-commonly used ones. Why use biometry in mobile phones/devices? Today, with the advancement of mobile handsets and wireless networking, mobile devices have both the network access and computing capacity to provide users with a diverse range of services (e.g, secure payment [3], e-banking [4], e-commerce (better: commerce [5]), etc.). According to the European Information Technology Observatory (EITO) (August 20091), the number of mobile phone users worldwide will exceed the 4 billion mark. Why use web-based access? It is a standard communication protocol. A lot of remote services are accessible via web (e.g., e-banking, e-commerce, e-mail, etc.). Only a web browser and internet connection are needed, which, at this moment, are available in different platforms: personal computers (PCs), laptops, NetBooks, personal digital assistants (PDAs), video-game consoles, and, of course, mobile phones. Therefore, web services can be accessed from different types of devices in the same way. This last point is an important goal of our proposal. The problem of capturing and sending the biometrics to the web server via PC is very easy to solve using embedded applications in the web pages as Applets Java, ActiveX controls,



JavaScript, Flash technology, or Microsoft Silverlight; in our implementations, Applets have been used (see Section IV). However, due to the limitations of the devices, this solution is not possible in current mobile phones, as shown in Section III. Hence, a new solution is needed. The proposal of this study is to present a novel mobile-phone application architecture to capture and send the biometric to the web server based on the use of an embedded web browser. The current mobile technology is not ready for embedded applications in mobile web browsers; however, it is prepared for our solution, which is very easy and effective, as will be seen. II. RELATED WORKS/APPLICATIONS The majority of the works are proposals of biometric recognition systems adapted to mobile device limitations. Therefore, the recognition runs entirely on the device, i.e., there is no communication with a server. These studies focused on the template/model creation and matching (i.e., classification algorithm) parts of the biometric system (see Fig.)

It is difficult to find an optimal biometric for practical applications, which is nonintrusive, easy, and secure to capture with good recognition performance. The use of several biometrics (i.e., multimodal biometric) may be a solution [12] and is an important fieldwork at present. Not many studies have been carried out on the use of mobile devices; some of them can be found in [13] (voice, face, and keystroke), [23] (face, voice, keystroke, and fingerprint), [14] (voice

and face), and [24] (fingerprint and voice). Proprietary databases have been used to perform some of the previous studies, but public ones can also be found, for example, the Massachusetts Institute of Technology Mobile Device Speaker Verification Corpus [11] and Biosecure Multimodal Database (BMDB) Mobile Dataset (DS3), where mobile devices under degraded conditions were used to build this dataset; 2-D face, talking-face sequences (both indoor and outdoor), signature, and fingerprint being captured [25]. User recognition is usually performed just before accessing the controlled service (i.e., login time); however, some authors propose the interesting concept of transparent authentication , i.e., to recognize the user during the run time, for example, while he/she is keystroke logging or writing a short message service (SMS), during a telephone conversation, during a video call, when the user is walking, etc. Theoretical proposals of practical applications can also be found. Clarkei et al. [13] showed a general clientserver nonintrusive and continuous-authentication (NICA) architecture. A NICA prototype was implemented, but the client was deployed in a laptop and an HP Mini-Note, and a real mobile phone was not used (we, however, will show a prototype of our proposal for mobile). Similar proposals can be seen in previous works of the same authors, e.g., in [1]. Another interesting contribution that can be found related. The aim of the project was to integrate a biometric recognizer into a 3G/beyond 3Genabled PDA to allow users to mutually recognize each other and securely authenticate messages (i.e., text or audio). This would enable them to legally sign binding contracts on their PDA/mobile phone. The biometric recognizer combines sourceauthentication methods on the basis of textdependent speaker verification, video recordings of the speakers face, and a written signature. As in our proposal, the SecurePhone platform is entirely software based. This is important if it is to be adopted by device manufacturers as it keeps costs down and makes its implementation much easier. A database was recorded on a Qtek2020 PDA, which includes voice, face, and signature. The authentication data and the digital signature are stored on a subscriber



identification module (SIM) card; therefore, a standalone topology is performed. Mobile Biometry (MOBIO).The MOBIO concept is to develop new mobile services secured by biometric authentication means. Scientific and technical objectives include robustto-illumination face authentication, robust-to-noise speaker authentication, joint bimodal authentication, model adaptation, and scalability. The projectdemonstration system will include two main scenarios: 1) embedded biometry, where the system is running entirely on a mobile phone, and 2) remote biometry. The latter is the scenario approached in our study, for which a general solution is presented in this paper; alongside, three already developed demonstration systems. III. MOBILE PHONES AND WEB-BASED BIOMETRIC CAPTURE: STATE OF THE ART As has been seen, our goal is to perform a biometric recognition during a web session, when a mobile phone is used. The biometric-user authentication can be used to substitute the password or can be used in addition to it. This has already been done for PC, laptop, and similar platforms by us2 (for more details, see Section IV), and other authors (e.g., see [29]) and companies (e.g,Dynamic Biometric Systems or Communication Intelligence Corporation (CIC), both related with signature recognition). Analyzing the technologies used for embedded programs in a web page in order to capture and send the biometrics, we have found the following:3 Applet Java, ActiveX controls, Flash technology, JavaScript, and Microsoft Silver light. The last two have only been found to acquire signature and capturing the mouse events. Our first approximation to the problem was to perform the biometric acquisition by means of a mobile phone in the same way as with the PC, i.e., using the aforementioned technologies to embed applications in a web page. Knowing the computational restrictions of the mobile devices, a study of the state of the technology in the main mobile-phone platforms and browsers was necessary.

IV. SYSTEM PROPOSAL First, we will describe the general architecture of our system, and then, we will show three systems implemented from it. The first one is signature-based, the second is speech-based, and the third is face-based. Architecture According to a general biometric system consists of the following four modules. 1) Sensor module (or biometric reader): This is the interface between man and machine; therefore, the system performance depends strongly on it. 2) Quality assessment and feature-extraction module: The data provided by the sensor must first be validated from the point of view of quality, refusing it when the quality is too poor, and, second, extracting the features that represent, in the best possible way, the identity of the individual. 3) Matcher and decision-making module: The extracted features are compared with the stored templates to generate a score to determine whether to grant or deny access to the system. 4) Database system: This is the repository of the biometric information. During the enrollment phase, the templates are stored along with some additional personal information, such as name, address, etc. The modules of the proposed architecture are allocated mainly on the server, looking for greater system security, upgrade control, and avoiding computation limitations. However, depending on the needs, some parts can be moved to the client; specially, the modules for acquisition, validation, and data preprocessing. The modular architecture proposed allows the vendors to build their biometric solutions based on our architecture so that server and client software can be decoupled, and thus, encouraging the development of applications by different teams companies, operating under well-established standard protocols already known by the community of developers. The main modules of the proposed architecture are the following. 1) Client Tier: On the client side, the biometric acquisition software is deployed. Since, as noted above, there are no standard software solutions for web browsers to capture biometric data, this part



should be distributed ad hoc for each type of platform. For this reason, our architecture proposes to leave only the data-capturing module on the client side, with the rest of the modules at the server side. This means that the applications developed need no special memory or processing requirements, since the main computational load falls on the execution of a web browser, and standard mobile device peripherals (e.g., touch screen, microphone, camera, etc.) are used to capture the biometrics; our proposal can therefore run on practically any current mid-range to high-range mobile device. The application at this side controls and communicates with the following three main general components, which will be explained in greater detail in the next section.
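A minimal sketch of this client-tier behaviour, under stated assumptions: the capture module uploads the acquired sample to a server-side verification page over HTTP POST and reads back the decision. The endpoint URL and field names below are hypothetical, and the actual prototypes use Applet/.NET clients with PHP server pages rather than Python.

import requests

VERIFY_URL = "https://biometric.example.org/verify.php"   # hypothetical endpoint

def verify_sample(user_id, sample_path):
    """Upload one captured biometric sample and return the accept/reject decision."""
    with open(sample_path, "rb") as sample:
        response = requests.post(
            VERIFY_URL,
            data={"user": user_id},                        # claimed identity
            files={"sample": ("capture.wav", sample)},     # captured voice/face/signature data
            timeout=30,
        )
    response.raise_for_status()
    # assume the verification page answers with a small JSON document, e.g. {"accepted": true}
    return response.json().get("accepted", False)

print("access granted" if verify_sample("alice", "capture.wav") else "access denied")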

Tomcat application server. The server modules for capture and preprocessing have been developed in the hypertext-processor (PHP) programming language, and the verification engine was written in Java. This latter verification at system side is described and was one of the winning systems of the international competition signature . it shows real screen captures of a use example of our signature-recognition experimental web using PC and a mobile device. As can be seen, the differences with regard to the user point of view are minimal. 2) Voice-Based System: This application allows services local data of the mobile device to be

accessed after authentication by speech, although the biometric recognition is performed remotely. Client side: A system has been developed that enables multi device authentication from both a PC and mobile device. 1) For capturing the data with a PC browser, a Java Applet that captures voice and sends it to the server has been developed. 2) For speech acquisition in the mobile device, an application in the .NET framework that operates almost the same as the signature system has been developed, but with three differences: 1) The URLs needed to manage the application from the remote-resource access are within the application code, which means greater security but less versatility. 2) The signature is sent by POST method . 3) The uploaded-component functionalities have been modified so that it manages the local access, as is explained now. The way to access the remote result of verification is through messages introduced in the PHP page code responsible for the verification of the voice. In this way, the uploaded component also manages any errors that may occur while processing and testing the speech sample. The application can be downloaded from the previous item web pages. Server side: An Apache web server has been used. Other server modules have been developed in PHP programming language, for the capture engine, and C and UNIX Shell for the preprocessing and verification engine. The real screen captures of the mobile application. 3) Face-Based System: This application allows services local data of the mobile device to be accessed after authentication by face, although the biometric recognition is performed remotely. The Screen capture (using MyMobiler) of the mobile experimental speaker recognition application. Here, the PDA is used. (a) HOMEPAGE. (b) Voice Access web page. (c) Speech acquisitionStep 1: Open the recorder. (d) Speech acquisitionStep 2: Recording (this can be played). (e) Speech acquisition Step 3: Analyze to send the recording (this can take several seconds). (f) Data sending to the web server. (g) Authentication result (local message).



Screen captures of the mobile experimental face-recognition application for (a)-(e) Windows Mobile and (f)-(j) Android mobile device platforms: (a) and (f) Face Access web page; (b) and (g) face acquisition; (c) and (h) device image-capture application execution; (d) and (i) image checking, where the image can be sent to the web server or taken once again; (e) and (j) authentication result (local message).

V. CONCLUSION
In this paper, the problem of using biometric user authentication during a standard web session from a mobile phone has been successfully approached. We have focused on the technological problem of capturing the biometric with the mobile phone, sending it to the web server, and, after user authentication, allowing or rejecting the user's continuation of the web session in the same way as had been done using password authentication. First, we have shown that there are several related works, projects, and commercial applications; however, to the best of the authors' knowledge, none of them has approached biometric recognition in a mobile environment via the Web. Second, we have proved that the standard solutions to the problem on PC platforms, using Java Applets, ActiveX controls, JavaScript, or Flash technology, do not work on mobile platforms. Therefore, a new alternative is needed. A solution has been shown that basically consists of, instead of embedding an application in the web page, embedding a web browser in a mobile-phone application, using a modular architecture to develop the biometric web application. Three different implementations of this simple, but very effective, idea have been shown, with one allowing the password to be substituted by the signature in a web access to a restricted service, and the others allowing a restricted access to local data and applications in the mobile phones.


Security Enhancement of Secure Simple Pairing & Group Key Transfer Protocol in Bluetooth
K. Gandhimathi, Assistant Professor, S. Jayaprakash, Assistant Professor, Idhaya Engineering College for Women, Chinnasalem, TamilNadu, India

Abstract - Security is one of the major concerns in wireless communication. Bluetooth is an open wireless communication mainly for exchanging data over short distances between fixed and mobile devices. Anywhere and at any time a Bluetooth wireless link can be formed between devices, in which the communication is robust and has low power consumption. In this paper we discuss the Man-In-The-Middle attack on Bluetooth Secure Simple Pairing. We further discuss how to enhance the security of pairing, and an authenticated key transfer protocol based on a secret sharing scheme in which the KGC can broadcast group key information to all group members at once and only authorized group members can recover the group key, while unauthorized users cannot. The authentication process of Bluetooth is also analysed in detail.

I. INTRODUCTION
Wireless LANs are becoming more popular in today's environment because of the increasing requirements for mobility, relocation and coverage of locations which are difficult to reach by wire. Wireless LANs can be designed in many different ways based on the application. The Bluetooth system is the first commercial radio system to be used on a large scale and widely available to the public, and Bluetooth devices have attracted considerable attention in recent years. Bluetooth is short-range radio frequency (RF) communication [2][3]. It operates at the 2.4 GHz frequency in the free ISM band (Industrial, Scientific, and Medical) by using frequency hopping. The most important characteristic of the Bluetooth specification is that it allows devices from many different manufacturers to work with one another. For that reason, Bluetooth doesn't only define a radio system, but also a software stack that enables applications to find other Bluetooth devices in the area, discover what services they can offer, and use those services. Bluetooth devices form a piconet with a group of user devices which are in link with each other. The Bluetooth device which initiates a connection is the piconet master and all other devices are said to be piconet slaves. One piconet can have a maximum of seven active slave devices and one master device, and each piconet has a different master device. Since Bluetooth is a wireless communication system, Bluetooth security issues have to be considered, mainly because the data transferred during communication can be modified by an attacker by jamming the physical layer, or modified information can be transferred to the piconet devices. To provide protection for Bluetooth communication, the system can establish security at several protocol levels. In most secure communication, the following two security functions are commonly considered. Message confidentiality: message confidentiality ensures the sender that the message can be read only by an intended receiver. Message authentication: message authentication ensures the receiver that the message was sent by a specified sender and the message was not altered en route. To provide these two functions, one-time session keys need to be shared among the communication entities to encrypt and authenticate messages. Thus, before exchanging communication messages, a key establishment protocol needs to distribute one-time secret session keys to all participating entities. The key establishment protocol also needs to provide confidentiality and authentication for the session key. The most well-known group key management protocols can be classified into two categories.


Centralized group key management protocols: a group key generation center is engaged in managing the entire group. Distributed group key management protocols: there is no explicit group key distribution center, and each group member can contribute to the key generation and distribution. In this paper we point out that, in Bluetooth communication, without any verification of the public keys, MITM attacks are generally possible against any message sent using public-key technology. We discuss the Man-In-The-Middle attack on Bluetooth Secure Simple Pairing (SSP) and propose a modification to SSP to prevent the Man-In-The-Middle attack, thereby enhancing the security level of Bluetooth. The goals and security threats of our group key transfer protocol are also discussed.

II. OVERVIEW OF SECURE SIMPLE PAIRING
In Bluetooth, security levels are divided into three categories. Public: in public mode the device can be both discoverable and connected. Private: the device is said to be in hidden mode; connections will be accepted only if the Bluetooth device is known to its prospective master. Silent: the device does not connect to any device; it simply monitors the Bluetooth traffic. In Bluetooth version 2.1, Secure Simple Pairing (SSP) is a feature enhancement addressing mainly two concerns: security and simplicity of the pairing process. Elliptic Curve Diffie-Hellman public-key cryptography is employed in Secure Simple Pairing. The link key is constructed using public-private key pairs, a number of nonces, and the Bluetooth addresses, which protects against passive eavesdropping, as running an exhaustive search on a private key with approximately 95 bits of entropy is currently considered infeasible in a short time. Secure Simple Pairing uses four different kinds of association models:

I. Out-of-Band, II. Numeric Comparison, III. Passkey Entry, and IV. Just Works. In order to provide protection against Man-In-The-Middle attacks, Bluetooth uses the Out-of-Band association model, also referred to as near field communication. In this OOB channel communication the user is asked to compare two six-digit numbers which are not controlled by the MITM. The Passkey Entry model is used when one device in the communication has both input and output capability but the other has only receiving capability; the Passkey Entry model is also used if both devices have only input capability. Finally, the Just Works association model is used if the devices have neither input nor output capability; it simply initiates the Bluetooth connection. Secure Simple Pairing is comprised of six phases [5][8]: 1) Capabilities exchange: for a new connection between devices or a re-pairing, the devices first exchange their IO (Input/Output) capabilities (see Table 1) to determine the proper association model to be used. 2) Public key exchange: the devices generate their public-private key pairs and share the public keys with each other; the Diffie-Hellman key exchange is used in this stage. 3) Authentication stage 1: the protocol that is run at this stage depends on the association model. One of the goals of this stage is mainly to ensure that there is no MITM in the communication between the pairing devices. This is achieved by using a series of nonces and a final check of integrity checksums, performed either through the OOB channel or with the help of the user. 4) Authentication stage 2: the devices now complete the exchange of values (public keys and nonces) and verify their integrity. 5) Link key calculation: finally, the parties compute the link key using their Bluetooth addresses, the previously exchanged values and the Diffie-Hellman key.

The main idea of the Man-In-The-Middle attack is falsification of the information transferred during the Bluetooth communication between the paired devices [4][5][6]. The Man-In-The-Middle attack is depicted by three scenarios. In the first scenario, the MITM first disrupts (jams) the physical layer (PHY) by hopping along with the victim devices and sending random data in every timeslot. Another possibility is to jam the entire 2.4 GHz band altogether by using a wideband signal. In this way, the MITM shuts down all piconets within the range of susceptibility and there is no need to use a Bluetooth chipset to



generate hopping patterns. Finally, a user thinks that something is wrong with the Bluetooth devices and deletes the previously stored link keys. After that the user starts to initiate a new pairing process using SSP, and the MITM can forge the messages exchanged during the IO capabilities exchange phase. If both victim devices have not yet met and no link key has been exchanged between them, it is easier for the MITM to jam the PHY layer. This kind of attack consists of two different scenarios. In the first scenario the victim devices A or B initiate the Secure Simple Pairing process and the MITM waits until the victim devices initiate. In the second scenario the MITM, rather than A or B, initiates the Secure Simple Pairing. After that the attack proceeds and the MITM continues its process with the link key of the paired devices.

Fig 2: Main idea of the MITM attack

III. PROPOSED SYSTEM
The two main ideas related to the proposed work model are how to enhance Secure Simple Pairing to overcome the MITM attack, and how to establish the group key transfer protocol using a KGC. The group key transfer protocol helps us to transfer the key to all the users in a group at once in an authenticated way, without any external or internal attack, since the group key can be recovered only by the authenticated users.

1.a) Modified Secure Simple Pairing: Although it was expected that Secure Simple Pairing would be able to prevent the Man-In-The-Middle attack, it fails to meet this goal. To perform Secure Simple Pairing effectively, a small change in Authentication stage 1 can provide authentication against this MITM attack. Thus the modified Secure Simple Pairing provides secure pairing between the Bluetooth devices. Key exchange between authenticated users can be implemented using the basic Elliptic Curve Diffie-Hellman (ECDH) cryptography algorithm, which is as follows. The Elliptic Curve Diffie-Hellman (ECDH) key agreement protocol [7] allows two users to create a shared secret agreement. The protocol depends on two public parameters: p and g. Parameter p is a large prime number, and parameter g is an integer less than p. These two public parameters are exchanged over a non-secure line. After both devices receive the two public parameters, they select private integers: Device A selects a while Device B selects b. These values are referred to as private keys. Device A and Device B then create public keys by using the public parameters and their private keys. These are asymmetric keys because they do not match. Device A and Device B exchange these public keys and use them to compute their shared secret agreement. The generated secret key is symmetrical for both Device A and Device B. To verify Secure Simple Pairing authenticity the commitment value can be encrypted. The DHKey does not ensure the users' authenticity; the two parties can use a common secret cryptographic function to compute a symmetric key that will be used to encrypt the commitment value. Although an intruder might have the DHKey, he cannot compute the symmetric key as he does not know the cryptographic function. Thus the created symmetric key is XORed with the commitment value of the slave device.
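The key agreement just described can be sketched with textbook finite-field Diffie-Hellman arithmetic; Bluetooth SSP itself uses the elliptic-curve variant, so the parameters below are toy values for illustration only, and the keyed hash merely stands in for the shared secret cryptographic function mentioned above.

import hashlib
import secrets

p = 2**127 - 1        # toy prime modulus; real SSP uses an elliptic-curve group
g = 5                 # toy public base

a = secrets.randbelow(p - 2) + 1       # Device A's private integer
b = secrets.randbelow(p - 2) + 1       # Device B's private integer
A = pow(g, a, p)                       # public keys exchanged over the non-secure link
B = pow(g, b, p)

dh_key_a = pow(B, a, p)                # both devices derive the same shared DHKey
dh_key_b = pow(A, b, p)
assert dh_key_a == dh_key_b

# Derive a symmetric key from a function assumed to be shared secretly by the two
# parties (SHA-256 is only a stand-in here), and XOR it with the slave's commitment.
symmetric_key = int.from_bytes(hashlib.sha256(dh_key_a.to_bytes(16, "big")).digest(), "big")
commitment = secrets.randbits(256)     # placeholder commitment value
protected_commitment = commitment ^ symmetric_key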

Fig. 3 Modified Authentication Stage 1 of Secure Simple Pairing

1.b) Model for Group Key Transfer Protocol:



The group key transfer protocol relies on one trusted entity, the KGC, to choose the key, which is then transported to each member involved. Each user is required to register at the KGC to subscribe to the key distribution service. The KGC keeps track of all registered users and removes any unsubscribed users. During registration, the KGC shares a secret with each user. In most key transfer protocols, the KGC encrypts the randomly selected group key under the secret shared with each user during registration and sends the ciphertext to each group member separately [8][9]. An authenticated message checksum is attached to the ciphertext to provide group key authenticity. In this approach, the confidentiality of the group key is ensured using an encryption algorithm which is computationally secure. Our protocol uses a secret sharing scheme to replace the encryption algorithm. A broadcast message is sent to all group members at once, and the confidentiality of the group key is information-theoretically secure. In addition, the authentication of the broadcast message can be provided as a group authentication. This feature provides the efficiency of our proposed protocol. The main security goals for our group key transfer protocol are: 1) key freshness; 2) key confidentiality; and 3) key authentication. Key freshness ensures that a group key has never been used before [10][11]; thus, a compromised group key cannot cause any further damage to group communication. Key confidentiality protects the group key such that it can only be recovered by authorized group members, but not by any unauthorized user. Key authentication provides assurance to authorized group members that the group key is distributed by the KGC and not by an attacker. In the group key generation protocol the main focus is on protecting the group key information broadcast from the KGC to all group members. The service request and challenge messages from users to the KGC are not authenticated; thus, an attacker can impersonate a user to request a group key service. In addition, an attacker can also modify information transmitted from users to the KGC without being detected. We need to

analyze the security threats caused by these attacks. Here, however, none of these attacks can succeed against authorized group members, since attackers can neither obtain the group key nor share a group key with authorized group members. User/message authentication and key confirmation can be

easily incorporated into our protocol, since each user has shared a secret key with the KGC during registration. However, these security features are beyond the scope of our fundamental protocol. The key generation and distribution process contains five steps. Step 1. The initiator sends a key generation request to the KGC with a list of group members {U1, U2, ..., Ut}. Step 2. The KGC broadcasts the list of all participating members, {U1, U2, ..., Ut}, as a response. Step 3. Each participating group member needs to send a random challenge Ri ∈ Zn to the KGC. Step 4. The KGC randomly selects a group key k and generates an interpolating polynomial f(x) of degree t passing through the (t+1) points (0, k) and (xi, yi ⊕ Ri) for i = 1, ..., t. The KGC also computes t additional points Pi, for i = 1, ..., t, on f(x) and Auth = h(k, U1, ..., Ut, R1, ..., Rt, P1, ..., Pt), where h is a one-way hash function. All computations on f(x) are over Zn. The KGC



broadcasts {Auth, Pi}, for i = 1, ..., t, to all group members. All computations are performed in Zn. Step 5. Each group member Ui, knowing its shared secret point (xi, yi ⊕ Ri) and the t additional public points Pi, for i = 1, ..., t, on f(x), is able to compute the polynomial f(x) and recover the group key k = f(0). Then Ui computes h(k, U1, ..., Ut, R1, ..., Rt, P1, ..., Pt) and checks whether this hash value is identical to Auth. If the two values are identical, Ui authenticates that the group key was sent by the KGC.
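A minimal sketch of Steps 4 and 5 under stated assumptions: a toy prime modulus, toy registration secrets, and SHA-256 standing in for the one-way hash h. A real deployment would use properly sized parameters.

import hashlib
import random

n = 2**61 - 1                                    # toy prime modulus for arithmetic in Z_n

def interpolate(points, x):
    """Evaluate the unique polynomial through `points` at x, modulo n (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % n
                den = den * (xi - xj) % n
        total = (total + yi * num * pow(den, -1, n)) % n
    return total

# Registration: each user U_i shares a secret point (x_i, y_i) with the KGC;
# R_i are the random challenges sent in Step 3.
users = ["U1", "U2", "U3"]                                        # t = 3 members
shared = {u: (i + 1, random.randrange(1, n)) for i, u in enumerate(users)}
challenges = {u: random.randrange(1, n) for u in users}

# Step 4 (KGC side): f(x) of degree t through (0, k) and (x_i, y_i XOR R_i),
# plus t additional public points P_i and the checksum Auth.
k = random.randrange(1, n)                                        # the group key
base = [(0, k)] + [(x, y ^ challenges[u]) for u, (x, y) in shared.items()]
P = [(x, interpolate(base, x)) for x in range(100, 100 + len(users))]
auth = hashlib.sha256(repr((k, users, sorted(challenges.items()), P)).encode()).hexdigest()

# Step 5 (member side): U1 rebuilds f(x) from its own point plus the P_i,
# recovers k = f(0) and verifies it against Auth.
x1, y1 = shared["U1"]
k_recovered = interpolate([(x1, y1 ^ challenges["U1"])] + P, 0)
assert k_recovered == k
assert hashlib.sha256(repr((k_recovered, users, sorted(challenges.items()), P)).encode()).hexdigest() == auth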



IV. CONCLUSION
Bluetooth offers a lot in a marketplace that is becoming increasingly competitive, and the popularity of Bluetooth makes us concentrate more on Bluetooth security. In this paper we discussed the Man-In-The-Middle attack on Secure Simple Pairing and analysed how a MITM attack takes place in Secure Simple Pairing based on Elliptic Curve Diffie-Hellman cryptography. In addition, we presented an idea for modifying the existing Secure Simple Pairing protocol to enhance its security. In the group keying method, every user needs to register at a trusted KGC initially and pre-share a secret with the KGC; the KGC then broadcasts the group key information to all group members at once. The confidentiality of our group key distribution is information-theoretically secure.

REFERENCES
[1] K. Haataja and P. Toivanen, "Two practical man-in-the-middle attacks on Bluetooth secure simple pairing and countermeasures", IEEE Transactions on Wireless Communications, Vol. 9, Issue 1, pp. 384-392, 2010. DOI: 10.1109/TWC.2010.01.090935.
[2] Bluetooth SIG, "Bluetooth Specifications 1.0 - 3.0+HS". [Online]. Available: http://www.bluetooth.com/Bluetooth/Technology/Building/Specifications. [Accessed Sep. 17, 2009].
[3] Bluetooth SIG, "Bluetooth Wireless Technology Surpasses One Billion Devices". [Online]. Available: http://www.bluetooth.com/Bluetooth/Press/SIG/bluetooth wireless technology surpasses one billion devices.htm. [Accessed Sep. 17, 2009].
[4] K. Hypponen and K. Haataja, "Nino man-in-the-middle attack on Bluetooth secure simple pairing", in Proc. IEEE Third International Conference in Central Asia on Internet, The Next Generation of Mobile, Wireless and Optical Communications Networks (ICI 2007), Tashkent, Uzbekistan, Sep. 2007.
[5] K. Haataja and K. Hypponen, "Man-in-the-middle attacks on Bluetooth: a comparative analysis, a novel attack, and countermeasures", in Proc. IEEE Third International Symposium on Communications, Control and Signal Processing (ISCCSP 2008), St. Julians, Malta, Mar. 2008.
[6] K. Haataja and P. Toivanen, "Practical man-in-the-middle attacks against Bluetooth secure simple pairing", in Proc. 4th IEEE International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM 2008), Dalian, China, Oct. 2008.
[7] MSDN Library, "Overview of the ECDH Algorithm (CNG Example)". http://msdn.microsoft.com/en-us/library/cc488016.aspx
[8] Bluetooth Special Interest Group, "Simple Pairing Whitepaper". http://www.bluetooth.com/NR/rdonlyres/0A0B3F36-D15F-4470-85A6-F2CCFA26F70F/0/SimplePairing_WP_V10r00.pdf
[9] G. R. Blakley, "Safeguarding Cryptographic Keys", Proc. Am. Federation of Information Processing Soc. (AFIPS '79) Nat'l Computer Conf., Vol. 48, pp. 313-317, 1979.
[10] S. Berkovits, "How to Broadcast a Secret", Proc. Eurocrypt '91 Workshop on Advances in Cryptology, pp. 536-541, 1991.
[11] J. M. Bohli, "A Framework for Robust Group Key Agreement", Proc. Int'l Conf. Computational Science and Applications (ICCSA '06), pp. 355-364, 2006.
[12] M. Burmester and Y. G. Desmedt, "A Secure and Efficient Conference Key Distribution System", Proc. Eurocrypt '94 Workshop on Advances in Cryptology, pp. 275-286, 1994.


A Cloud Computing Platform Support Resting on Peer to Peer Network

P. Bhavani1, A. Malathi2, S. Sasikala3
PG-Student, Department of Computer Science & Engineering, SRM University, Ramapuram Campus, Chennai

Abstract - In recent years, the technology of cloud computing has been widely applied in e-business, e-education, etc. A cloud computing platform is a set of scalable large-scale data server clusters; it provides computing and storage services to customers. Cloud storage is a relatively basic and widely applied service which can provide users with stable, massive data storage space. Our research shows that the architecture of current cloud computing systems is centrally structured: all the data nodes must be indexed by a master server, which may become the bottleneck of the system. In this paper, we propose a new cloud storage architecture based on a P2P network and design a prototype system. The system based on the new architecture has better scalability and fault tolerance.

1. Introduction
A cloud computing platform dynamically provisions, configures, reconfigures, and de-provisions servers as needed. Servers in the cloud can be physical machines or virtual machines. Advanced clouds typically include other computing resources such as storage area networks (SANs), network equipment, firewalls and other security devices [1]. This paper focuses on the storage service from the cloud. Some typical cloud systems, such as GFS of Google [2], Blue Cloud of IBM [1] and Elastic Cloud of Amazon [3], have a similar architecture for storage: in the system architecture, there is a central entity to index or manage the distributed data storage entities. A centrally managed architecture effectively simplifies the design and maintenance of the system, but the central entity may become a bottleneck if it is visited very frequently. Although systems in practice have used techniques such as backup recovery to avoid a probable disaster from the central bottleneck, the flaw coming from the architecture has not been resolved essentially. In this paper, we propose a cloud computing architecture based on a P2P network which provides a purely distributed data storage environment without any central entity. The cloud based on the proposed architecture is self-organized and self-managed and has better scalability and fault tolerance. The rest of the paper is organized as follows: in section 2, we

will introduce some related work about cloud storage systems and P2P network storage systems. In section 3 of this paper, we describe a typical scenario to explain the architecture of our proposed cloud computing storage environment. In section 4, there is an introduction to our prototype of the P2P network cloud system. Section 5 is the conclusion and a proposal for future work.

2. Related Works
In this section, we will introduce some related work about cloud systems and P2P network products for storage.

2.1 Google File System
The first to give prominence to the term cloud computing (and maybe to coin it) was Google's CEO Eric Schmidt, in late 2006 [4]. Google Inc. has a proprietary cloud computing platform [5] which was first developed for its most important application, the Google search service [6], and has now been extended to other applications. The Google cloud computing infrastructure has four systems which are independent of and closely linked to



each other. They are: Google File System for distributed file storage, the MapReduce programming model for parallel Google applications [7], Chubby for the distributed lock mechanism [8] and BigTable for Google's large-scale distributed database [9]. Figure 1 shows the architecture of the Google File System. A GFS cluster consists of a single master and multiple chunk servers and is accessed by multiple clients. Chunk servers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. When a client wants to visit some data on a chunk server, it first sends a request to the master, and the master then replies with the corresponding chunk handle and the locations of the replicas. The client then sends a request to one of the replicas and fetches the data wanted [2].

The GFS above is actually a centrally indexed distributed storage system: the GFS master works as an index server which can provide the global information about each chunk server for clients. The flaw of the central index architecture is that the GFS master may become a bottleneck of the system, since every request for a target data chunk must first go through the index server, which burdens the master.

2.2 P2P network Storage System
The distributed P2P network storage systems are indexed by DHT. DHT arithmetic can resolve the bottleneck problems that come from a central index system. Since the management is distributed equally to every peer in the network, there is no bottleneck any more, but the new problem is how to keep the consistency of the replicas on read/write. Some P2P network systems for distributed storage have been developed, such as Ivy [10], Eliot [11], Oasis [12], OM [13], Sigma [14], etc. They keep the replica consistency in different ways and index the data resource by DHT. In the following section, we propose a cloud storage system based on a P2P network which can keep the consistency with an innovative method.

3. Cloud Based on P2P network
3.1 Architecture

The roles involved in our architecture can be defined as follows and are illustrated below. Client App: the client application which wants to get the data from the platform. Gateway: the entity which transfers the requests and responses between the Client App and the network, and can lead a request to the nearest node in the network.


Chunk Server: the entity which serves as the data resource node and P2P network node. Unlike the pure data storage function in GFS, the chunk server here has three function modules with separate interfaces, as shown in the figure above: the Index Module, which takes charge of the part of the global resource index that is assigned to it by a DHT arithmetic such as Chord, Pastry and so on; the Route Module, which passes a lookup request on via a next-hop routing table which is also assigned by the DHT; and the Data Module, which provides the data resources stored on the local machine. We design a new system of P2P network storage for the cloud platform, which can take advantage of the distributed P2P network architecture and do well in concurrent update; figure 2 shows the architecture of the system. In the index module, as shown in figure 3, there is a chain containing the data index information: pointers to all of the data blocks with the same name ID are linked in a sub-chain. A pointer contains the address of a data block and the update version number of that block.

3.2 Typical Workflow
In this section, we present the system working flow with a typical scenario. At first, before a Client App can do its work, data blocks and the corresponding replicas should be uploaded to the Chunk Servers. How to select the chunk servers for storage is the same as in the traditional cloud computing platform. When a Client looks up a data block, the workflow is different, as below: 1. The Client App sends a request for a data block with a logical identifier to the Gateway; 2. The Gateway analyses the request, parses the identifier of the data block in the request, such as a logical address, and changes it to a 128-bit logical ID by the DHT algorithm, which can be recognized by the chunk server P2P network; 3. The Gateway constructs a P2P network search request data package including the logical ID, and sends the request to the chunk server P2P network; 4. The P2P network search request package is routed among the chunk servers following a P2P network search protocol such as Chord [15], CAN [16], Pastry [17], Tapestry [18] and so on. The chunk servers now act as routing nodes of the P2P network and the routing interface is used; 5. The request reaches the server which contains the index information of the logical ID being searched; 6. The index includes all the pointers to the data replicas with the same ID. The chunk server now acts as an index server and the index function interface plays its role. The chunk server selects the latest pointer by its version number; if there is more than one candidate, the server should select the nearest one by comparing the IP addresses of the Client App and the data resource container, and then return the best address to the client; 7. When the Client App gets the best address, it sends its request to the address of the chunk server which contains the data block. Now the chunk server acts as a data provider, as in the traditional cloud storage platform.
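The lookup flow above can be sketched as follows, with hypothetical helper names; a real deployment routes the search request through the chunk-server overlay (Chord, CAN, Pastry or Tapestry) rather than consulting a local dictionary.

import hashlib

def logical_id(identifier):
    """Step 2: the Gateway maps a logical identifier to a 128-bit DHT ID."""
    return int.from_bytes(hashlib.md5(identifier.encode()).digest(), "big")

# Index chain held by the chunk server responsible for this ID: one pointer per
# replica, each holding the replica's chunk-server address and update version.
index_chain = {
    logical_id("/photos/img001"): [
        {"addr": "10.0.0.5", "version": 3},
        {"addr": "10.0.0.9", "version": 3},
        {"addr": "10.0.0.7", "version": 2},   # stale replica still being updated
    ],
}

def common_prefix_len(a, b):
    """Toy 'nearness' metric: shared leading characters of the two IP strings."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def resolve(identifier, client_addr):
    """Steps 5-6: pick the latest replica, breaking ties by nearness to the client."""
    pointers = index_chain[logical_id(identifier)]
    newest = max(p["version"] for p in pointers)
    candidates = [p for p in pointers if p["version"] == newest]
    best = max(candidates, key=lambda p: common_prefix_len(p["addr"], client_addr))
    return best["addr"]

# Step 7: the Client App then requests the data block from the returned address.
print(resolve("/photos/img001", "10.0.0.12"))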

3.3 Replica Control
When a cloud platform provides storage service to users, a frequently encountered problem is write/read mutual exclusion. In a centrally managed system such as GFS this can be resolved by lock mechanisms, such as the so-called lease and mutation order[2], but in a distributed environment it is more complicated. In this section we discuss replica consistency control. Here is an example of write consistency in our P2P network cloud storage system:

1. The Client finds the chunk server node that contains the index information of the target data block (we call this chunk server the index node when it provides the index function). 2. The Client tells the index node that it wants to perform a write operation. 3. The index node checks the state of the replica chain to see whether another write is being processed; the state of the chain is either locked or unlocked. If it is unlocked, the index node allows the write by returning the chunk server address of the latest version (if there are multiple candidates, it selects the one nearest to the client by comparing IP addresses) and changes the state to locked. If it is locked, the index node queues the requests until the state returns to unlocked, and the first write request in the queue is then served. 4. The Client gets the address of the newest version, connects to that chunk server, writes the update to the block, and then sends a message to the index node to notify it that the write operation has finished. 5. When the index node receives the finish message, the version number of the pointer to the just-modified block is increased, and a consistency-update procedure for all the replicas is started. 6. After a replica server finishes its update, it sends an update response message to the index node, and the version in its pointer in the chain is increased by the index node. 7. When all the pointers in the chain have been updated to the newest version, the state of the chain is set back to unlocked. If the locked state times out, the chain is forcibly reset to unlocked; any delayed update response messages arriving afterwards are discarded and the versions of the corresponding pointers remain old. This prevents the system from being suspended when something goes wrong with some replica servers during the update. Since the version of the data block on a delayed server is old, no client will be directed to that server for the stale data later unless its update is confirmed.
From the example above, we can see that consistency is controlled by the distributed index nodes. Because all data blocks with the same ID are indexed at the same index node, the otherwise separate chunk servers can be managed as a unit. The replica update procedure is initiated by the corresponding index node whenever a new write operation completes; it can also start when the index node finds a version conflict in a chain during its periodic check.
4. Prototype System
We developed a prototype system based on the architecture proposed in this paper. We deployed 100 PCs running Windows XP as data chunk servers; after the chunk-server software is installed, each PC becomes a chunk server of the cloud storage system. The chunk servers in our prototype are organized by the Chord algorithm. Each node forwards a query at least halfway along the remaining distance to the target, so with high probability the number of nodes that must be contacted to find a successor in an N-node network is O(log N). Our architecture thus manages the chunks without any central master; the cost is additional delay when looking up a node. Figure 4 shows the average lookup delay of our system.
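Returning to the replica-control procedure above, here is a minimal sketch (our own illustration, building on the hypothetical IndexModule shown earlier; the method names, queuing and timeouts are assumptions and are simplified away) of how an index node could serialize writes with a lock flag and per-replica version numbers.

class WriteCoordinator:
    """Illustrative write serialization on an index node; network I/O omitted."""
    def __init__(self, index):
        self.index = index                      # an IndexModule instance

    def begin_write(self, block_id: int):
        entry = self.index.entries[block_id]
        if entry.locked:
            return None                         # caller must wait until unlocked
        entry.locked = True
        return self.index.latest_replicas(block_id)[0].address  # write target

    def finish_write(self, block_id: int, written_address: str) -> None:
        entry = self.index.entries[block_id]
        for p in entry.replicas:
            if p.address == written_address:
                p.version += 1                  # bump the writer's replica version
        # the index node would now push the update to the remaining replicas;
        # each acknowledgment arrives through ack_update() below

    def ack_update(self, block_id: int, replica_address: str) -> None:
        entry = self.index.entries[block_id]
        newest = max(p.version for p in entry.replicas)
        for p in entry.replicas:
            if p.address == replica_address:
                p.version = newest              # this replica is now up to date
        if all(p.version == newest for p in entry.replicas):
            entry.locked = False                # chain fully consistent again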

The number of deployed chunk servers ranges from 10 to 100, and each client issues 15 lookups for randomly chosen data blocks, one by one. Suppose the throughput of a central-master cloud storage system is x MB/s, the block size is y MB, and the average lookup delay in our system is T seconds; the throughput of the P2P network cloud system can then be calculated as y/(y/x + T). From Figure 4 we can see that the median lookup latency is about 100 ms. The block size of our system is 64 MB, so if the throughput of the central system is 50 MB/s, the corresponding throughput of the P2P network system is 64/(64/50 + 0.1), which is approximately 46 MB/s.
5. Conclusion and Future Work
In this paper, we propose a new architecture for a cloud storage system based on P2P network protocols, which resolves the bottleneck problems that come from a central structure. In future work, we will optimize the throughput of the system with techniques such as pipelined reads and writes.
References
[1] Boss G, Malladi P, Quan D, Legregni L, Hall H. Cloud computing. IBM White Paper, 2007. http://download.boulder.ibm.com/ibmdl/pub/software/dw/wes/hipods/Cloud_computing_wp_final_8Oct.pdf
[2] Ghemawat S, Gobioff H, Leung ST. The Google file system. In: Proc. of the 19th ACM Symp. on Operating Systems Principles. New York: ACM Press, 2003. 29-43.
[3] Amazon. Amazon elastic compute cloud (Amazon EC2). 2009. http://aws.amazon.com/ec2/
[4] Francesco Maria Aymerich, Gianni Fenu, Simone Surcis. An Approach to a Cloud Computing Network.
[5] Barroso LA, Dean J, Hölzle U. Web search for a planet: The Google cluster architecture. IEEE Micro, 2003, 23(2):22-28.
[6] Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 1998, 30(1-7):107-117.
[7] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. In: Proc. of the 6th Symp. on Operating System Design and Implementation. Berkeley: USENIX Association, 2004. 137-150.
[8] Burrows M. The Chubby lock service for loosely-coupled distributed systems. In: Proc. of the 7th USENIX Symp. on Operating Systems Design and Implementation. Berkeley: USENIX Association, 2006. 335-350.
[9] Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE. Bigtable: A distributed storage system for structured data. In: Proc. of the 7th USENIX Symp. on Operating Systems Design and Implementation. Berkeley: USENIX Association, 2006. 205-218.
[10] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, Ivy: A read/write peer-to-peer file system, in Proc. of the Symposium on Operating Systems Design and Implementation (OSDI), 2002.
[11] C. Stein, M. Tucker, and M. Seltzer, Building a reliable mutable file system on peer-to-peer storage, in Proc. of the 21st IEEE Symposium on Reliable Distributed Systems, 2002.
[12] M. Rodrig and A. LaMarca, Decentralized weighted voting for P2P data management, in Proc. of the 3rd ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 85-92, 2003.
[13] H. Yu and A. Vahdat, Consistent and automatic replica regeneration, in Proc. of the First Symposium on Networked Systems Design and Implementation (NSDI '04), 2004.
[14] S. Lin, Q. Lian, M. Chen, and Z. Zhang, A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems, in Proc. of the 3rd International Workshop on Peer-to-Peer Systems (IPTPS 04), 2004.
[15] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, Chord: A scalable peer-to-peer lookup protocol for Internet applications, IEEE/ACM Transactions on Networking, vol. 11, no. 1, Feb. 2003, pp. 17-32.
[16] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp and Scott Shenker, A scalable content-addressable network. In: Proceedings of ACM SIGCOMM, 2001.

A survey of recent research and detailed design methodologies for industry-strength software engineering
K. Karnavel, Dr. R. DilliBabu, Anna University, CEG Campus, Chennai
Abstract
Agent-Oriented Software Engineering is one of the most significant recent contributions to the field of Software Engineering. It has several benefits compared to existing development approaches, in particular the ability to let agents represent high-level abstractions of the dynamic entities in a software system. This paper surveys recent research and industrial applications of both broad high-level methodologies and more detailed design methodologies intended for industry-strength software engineering.
Keywords: Intelligent Agents, Software Engineering.

A NOVEL AUTHENTICATION SCHEME USING FINGER PRINT FOR DATA PROTECTION


Dr. R.S. Rajesh1, V. Akalya2
1 Associate Professor, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India
2 PG Student, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India

Abstract
As part of the security within distributed systems, various services and resources need protection from unauthorized use. Remote authentication is the most commonly used method to determine the identity of a remote client. This paper investigates a systematic approach for authenticating clients by three factors, namely password, smart card, and biometrics. A generic and secure framework is proposed to upgrade two-factor authentication to three-factor authentication. The conversion not only significantly improves information assurance at low cost but also protects client privacy in distributed systems. In addition, our framework retains several practice-friendly properties of the underlying two-factor authentication, which we believe is of independent interest. The paper combines password, smart card, and biometric (fingerprint), which is very secure for the client. The user is required to present his/her fingerprint to the main server; if the fingerprint matches exactly, the user is allowed to perform the transaction, and if it is doubtful and does not match exactly, the transaction fails.
INTRODUCTION
In a distributed system, various resources are distributed in the form of network services provided and managed by servers. Remote authentication is the most commonly used method to determine the identity of a remote client. In general, there are three authentication factors: 1. Something the client knows: a password. 2. Something the client has: a smart card. 3. Something the client is: biometric characteristics (e.g., fingerprint, voiceprint, and iris scan). Most early authentication mechanisms are based solely on passwords. While such protocols are relatively easy to implement, passwords (and human-generated passwords in particular) have many vulnerabilities. For example, human-generated and memorable passwords are usually short strings of characters and are sometimes poorly selected; by exploiting these vulnerabilities, simple dictionary attacks can crack passwords in a short time. Due to these concerns, hardware authentication tokens were introduced to strengthen user authentication, and smart-card-based password authentication has become one of the most common authentication mechanisms. It provides two-factor authentication: a successful login requires the client to have a valid smart card and a correct password. While it provides stronger security guarantees than password authentication alone, it can also fail if both authentication factors are compromised (e.g., an attacker has obtained both the password and the data in the smart card). In this case, a third authentication factor can alleviate the problem and further improve the system's assurance. Another authentication mechanism is biometric authentication, where users are identified by measurable human characteristics such as fingerprint, voiceprint, and iris scan. Biometric characteristics are believed to be a reliable authentication factor since they provide a potential source of high-entropy information and cannot easily be lost or forgotten. Despite these merits, biometric authentication has some imperfect features. Unlike a password, biometric characteristics cannot easily be changed or revoked, and some biometric characteristics (e.g., fingerprints) can
be easily obtained without the awareness of the owner. This motivates three-factor authentication, which incorporates the advantages of authentication based on password, smart card, and biometrics. 1) Three-factor (password, smart card, biometric) authentication is very similar to smart-card-based password authentication. 2) In this case the biometric characteristics are kept secret from the servers. 3) Three-factor authentication is less computationally efficient than smart-card-based password authentication.
BIOMETRIC - FINGERPRINT
A biometric is a physiological or behavioral characteristic of a human being that can distinguish one person from another and that theoretically can be used for identification or verification of identity. Here the admin login uses a fingerprint: the person's fingerprint is stored in the database, and when that person logs in to the system, the presented fingerprint is compared with the database; if it matches, the login is accepted, otherwise it is rejected.
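As a rough illustration of this accept/reject decision (our own sketch: a real system would compare minutiae features rather than raw vectors, and the threshold value is an assumption):

import numpy as np

MATCH_THRESHOLD = 0.85  # assumed similarity threshold for acceptance

def similarity(template: np.ndarray, probe: np.ndarray) -> float:
    """Toy cosine similarity between two fingerprint feature vectors."""
    t, p = template.ravel(), probe.ravel()
    return float(np.dot(t, p) / (np.linalg.norm(t) * np.linalg.norm(p) + 1e-9))

def admin_login(stored_templates: dict, user: str, probe: np.ndarray) -> bool:
    template = stored_templates.get(user)
    if template is None:
        return False                      # no enrolled fingerprint for this user
    return similarity(template, probe) >= MATCH_THRESHOLD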

Performance Analysis
Fig. 1. Block Diagram of 2 & 3 Factor Authentication
PRIVACY ISSUES
Along with the improved security features, three-factor authentication also raises another subtle issue, namely how to protect the biometric data. Not only is this the private information of the owner, it is also closely related to the security of the authentication: as biometrics cannot easily be changed, breached biometric information (on either the server side or the client side) makes biometric authentication meaningless. However, this issue has received less attention than it deserves from protocol designers.
SECURITY REQUIREMENTS
The attacker can be classified from two aspects: the behavior of the attacker and the information compromised by the attacker. Passive attacker: a passive attacker can obtain messages transmitted between the client and the server, but cannot interact with the client or the server. Active attacker: an active attacker has full control of the communication channel; in addition to eavesdropping on messages, the attacker can arbitrarily inject, delete and modify messages in the communication between the client and the server. A passive (or active) attacker can be further classified into the following three types. Type I attacker: has the smart card and the biometric characteristics of the client, but is not given the password of that client. Type II attacker: has the password and the biometric characteristics, but is not allowed to obtain the data in the smart card.


Type III attacker: has the smart card and the password of the client, but is not given the biometric characteristics of that client. Notice that such an attacker is free to mount any attacks on the (unknown) biometrics, including biometric faking and attacks on the metadata (related to the biometrics) stored in the smart card.
FUZZY EXTRACTOR
A fuzzy extractor extracts a nearly random string R from its biometric input w in an error-tolerant way: if the input changes but remains close, the extracted R stays the same. To assist in recovering R from a later biometric input w', the fuzzy extractor also outputs an auxiliary string P; however, R remains uniformly random even given P. The fuzzy extractor is used to extract or retrieve R for inputs presented to the server, and is given by two procedures, Gen (generation) and Rep (reproduction). 1. Gen (Generation): the random string R is extracted from the biometric input w, small changes in the input w are tolerated, and the auxiliary string P is output. 2. Rep (Reproduction): given a later reading w' close to w together with the auxiliary string P, the same R is reproduced.
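A real fuzzy extractor is built from a secure sketch and an error-correcting code. Purely as a toy illustration of the Gen/Rep interface (our own simplification with a 3-fold repetition code; this is not the construction used in the paper), the following treats the biometric as a bit list and tolerates one flipped bit per block:

import hashlib, secrets

BLOCK = 3  # repetition code: tolerates 1 flipped bit per 3-bit block

def _repeat(bits):
    # encode: repeat every key bit BLOCK times
    return [b for b in bits for _ in range(BLOCK)]

def _majority(chunk):
    # decode one block by majority vote
    return int(sum(chunk) * 2 > len(chunk))

def gen(w):
    """Gen: biometric bits w -> (R, P), using the code-offset construction."""
    assert len(w) % BLOCK == 0
    key = [secrets.randbelow(2) for _ in range(len(w) // BLOCK)]
    codeword = _repeat(key)
    P = [wi ^ ci for wi, ci in zip(w, codeword)]   # public helper string
    R = hashlib.sha256(bytes(key)).hexdigest()     # extracted string
    return R, P

def rep(w_prime, P):
    """Rep: a close reading w' plus P reproduces the same R."""
    codeword = [wi ^ pi for wi, pi in zip(w_prime, P)]
    key = [_majority(codeword[i:i + BLOCK]) for i in range(0, len(codeword), BLOCK)]
    return hashlib.sha256(bytes(key)).hexdigest()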

CHALLENGES IN BIOMETRIC AUTHENTICATION
1. Privacy Issues: A trivial way to include biometric authentication is to scan the biometric characteristics and store the extracted biometric data as a template on the server. During authentication, a comparison is made between the stored data and the input biometric data; if there is sufficient commonality, the biometric authentication is said to be successful. This method, however, raises several security risks, especially in a multiserver environment where user privacy is a concern (e.g., in a distributed system). First, servers are not 100 percent secure: servers with weak security protections can be broken into by attackers, who will obtain the biometric data stored on them. Second, servers are not 100 percent trusted: one server (equivalently, its curious administrator) could try to log in to another server on behalf of their common clients, or distribute users' biometric information in the system. In either case user privacy will be compromised, and a single-point failure on one server will downgrade the whole system's security level from three-factor to two-factor authentication (since clients are likely to register the same biometric characteristics on all servers in the system). 2. Error Tolerance and Nontrusted Devices: One challenge in biometric authentication is that biometric characteristics are prone to various kinds of noise during data collection, and this natural feature makes it impossible to reproduce them precisely each time they are measured. A practical biometric authentication protocol therefore cannot simply compare the hash or the encryption of biometric templates (which would require an exact match); instead, it must tolerate failures within a reasonable bound. Another issue is that the verification of biometrics should be performed by the server rather than by other devices, since such devices are usually remote from the server and cannot be fully trusted. These two subtle issues seem to be neglected in a recent authentication scheme using fingerprints for data protection.
CONTRIBUTIONS
The main contribution of this paper is to develop an authentication scheme using fingerprints for data protection. It consists of five phases:

1. Three-factor initialization; 2. Three-factor registration; 3. Three-factor login and authentication; 4. Three-factor password changing; 5. Three-factor biometrics changing.
1. Three-Factor Initialization: The server (denoted by S) generates two system parameters, PK and SK. PK is published in the system and SK is kept secret by S. Algorithm: (K) -> (PK, SK), where K is the system's security parameter, which determines the size of PK and SK.
2. Three-Factor Registration: The client (denoted by C), with an initial password PW and biometric characteristics BioData, runs the registration protocol with the server: C[PW, BioData] -- 3-factor-reg with S[SK] --> SC. The output of this protocol is a smart card SC, which is given to C.
3. Three-Factor Login and Authentication: The client logs in using PW, SC and BioData: C[PW, SC, BioData] -- 3-factor-login-auth with S[SK] --> {0, 1}. The output 1 denotes successful authentication; the output 0 denotes failure.
4. Three-Factor Password Changing: The client can change his/her password after a successful authentication, and the data in the smart card is updated accordingly.
5. Three-Factor Biometrics Changing: The client can change the biometrics used in the authentication, e.g., using a different finger, or using an iris instead of a finger.
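Purely to illustrate the shape of these phases, here is a sketch under our own simplifying assumptions (it reuses the toy gen/rep functions from the fuzzy-extractor sketch above, a hypothetical smart-card record, and hash-based verification; it is not the paper's concrete protocol):

import hashlib, os

def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()

def register(password: str, bio_bits, server_sk: bytes) -> dict:
    """Registration: the server issues a smart card binding password, biometric and SK."""
    R, P = gen(bio_bits)                   # fuzzy extractor (toy version defined earlier)
    salt = os.urandom(16)
    card = {
        "salt": salt,
        "helper": P,                       # public helper string for Rep
        "tag": _h(server_sk, salt, password.encode(), R.encode()),
    }
    return card                            # the smart card SC given to the client

def login(password: str, bio_bits, card: dict, server_sk: bytes) -> int:
    """Login/authentication: returns 1 on success, 0 on failure."""
    R = rep(bio_bits, card["helper"])      # reproduce R from a fresh biometric reading
    tag = _h(server_sk, card["salt"], password.encode(), R.encode())
    return 1 if tag == card["tag"] else 0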

BIOMETRIC DATA EXTRACTION
In our project we combine password, smart card, and biometric (fingerprint), which is very secure for the client. The user is required to present his/her fingerprint to the main server; if the fingerprint matches exactly, the user is allowed to perform the transaction, and if it is doubtful and does not match exactly, the transaction fails.
Fig. 2. Block Diagram of Biometric Data Extraction
The figure above shows the process: first the fingerprint is presented to the sensor, then it goes through preprocessing, the fingerprint extractor extracts the fingerprint features, and the result is checked against the fingerprints in the database; if it matches, processing continues, otherwise it is rejected.
1. Data Flow Diagrams: A data flow diagram is a graphical representation or technique depicting the information flow and the transforms that are applied as data moves from input to output. The DFDs are partitioned into levels that represent increasing information flow and functional detail. The processes, data stores, data flows, etc. are described in the Data Dictionary.

Fig. 3. Data Flow Diagram of the login user
2. Manual Fingerprint: Here the biometric (fingerprint) is used for the admin login. The person's fingerprint is stored in the database; when that person logs in to the system, the presented fingerprint is compared with the database, and if it matches it is accepted, otherwise it is rejected.
Fig. 4. Manual Fingerprint
In this analysis phase, the steps are: 1. Take a manually scanned fingerprint. 2. Convert it into gray scale. 3. Change its orientation to the reference orientation. 4. Get the feature vector. 5. Give it to the fingerprint reference system.

EXPERIMENTS AND RESULTS
Preserving security and privacy is a challenging issue in distributed systems. This project makes a step forward in solving this issue by proposing a generic framework for three-factor authentication to protect services and resources from unauthorized use. The authentication is based on password, smart card, and biometrics. Our framework not only demonstrates how to obtain secure three-factor authentication from two-factor authentication, but also addresses several prominent issues of biometric authentication in distributed systems (e.g., client privacy and error tolerance). The analysis shows that the framework satisfies all security requirements on three-factor authentication and has several other practice-friendly properties (e.g., key agreement, forward security, and mutual authentication). Future work is to fully identify the practical threats to three-factor authentication and to develop concrete three-factor authentication protocols with better performance. Here we build a bank application using the fingerprint.
In the application form, the user selects one of two banks, SBI or ICICI; in our project we use two cards, a credit card and a debit card. The SBI bin allows only 16 characters and the ICICI bin allows only 20 characters.
1. Admin Login: In our project we use the fingerprint for the admin login.
Fig. 5. Admin Login
The window above asks for the fingerprint of the admin; we give the fingerprint to the system, and if it matches it is accepted, otherwise it is rejected.

Fig. 6. Admin Login Fingerprint
CONCLUSION
We propose a generic framework for three-factor authentication to protect services and resources from unauthorized use. In our project we have combined password, smart card, and biometric (fingerprint), which is very secure for the client. The admin login uses the password, the smart card and the biometric (fingerprint). The fingerprint implementation keeps the client safe, since eavesdroppers cannot interfere; hence the fingerprint implementation is very secure for the client.

An Optimized Workflow Composition Through Ontology Based Planning

Mannar Mannan J, Research Scholar, Department of Information Technology, Anna University of Technology, Coimbatore, Tamil Nadu, India 641 047
Praveen Kumar S, PG Scholar, Department of Information Technology, Anna University of Technology, Coimbatore, Tamil Nadu, India 641 047
Abstract

One of the main challenges for advanced knowledge engineering techniques is to efficiently extract relevant information from large amounts of data coming from different sources. The aim of this paper is to use a genetic algorithm to perform an automatic matching process capable of computing a suboptimal alignment between two ontologies. To achieve this aim, the ontology alignment problem is formulated as a minimization problem characterized by an objective function depending on a fuzzy similarity. We present an approach for the automated construction of knowledge discovery workflows, given the types of inputs and the required outputs of the knowledge discovery process, together with a workflow optimization for this knowledge discovery ontology. We first define a formal conceptualization of knowledge types and find frequent patterns using the Apriori association rule mining algorithm by means of the knowledge discovery ontology. Then the workflow composition is formalized as a planning task using the ontology of domain and task descriptions. Finally, we propose an optimization of the workflows using the genetic algorithm. We also evaluate the performance of the proposed approach against existing techniques, and the experimental results show that the proposed approach performs better than the existing ones.

I. INTRODUCTION
Data mining, or the knowledge discovery in databases (KDD) process, is a relatively young and interdisciplinary field of computer science: the process of discovering new patterns from large data sets using methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The goal of data mining is to extract knowledge from a data set in a human-understandable structure. Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, to data. Databases, text documents, computer simulations, and social networks are sources of data for mining. Data mining involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression and summarization.

Knowledge management (KM) comprises a range of strategies and practices used in an organization to identify, create, represent, distribute, and enable the adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizations as processes or practices. More recently, other fields have started contributing to KM research; these include information and media, computer science, public health, and public policy.

[Figure: the KM process framework, showing knowledge capture, access, creation, application, acquisition, sharing, transfer and networking organized around a central knowledge base.]

Fig 1: General knowledge management process
The term knowledge discovery workflow allows a wide scope of interpretations. For this work, we essentially define it as a progression of steps (inductive, deductive, format-conversion procedures, etc.) involved in generalizing specific data (e.g., measurements) into patterns which, under appropriate interpretation, may represent novel knowledge about the problem domain under investigation. Therefore, it can be viewed as a special form of scientific workflow, covering the data preparation and modeling stages of the standard CRISP-DM data mining methodology. The primary objective of this study is to investigate whether such complex workflows can be assembled automatically with the use of a knowledge discovery ontology and a planning algorithm accepting task descriptions automatically formed using the vocabulary of the ontology. To achieve this objective, we have developed and present a knowledge discovery ontology capturing complex background knowledge and relational data mining algorithms. As a baseline approach, we have developed a planner using standard PDDL descriptions of algorithms generated automatically from the ontology, to demonstrate that the algorithm descriptions in the knowledge discovery ontology are suitable for planning.

The main contributions of this proposed system are as follows. First, we conceptualize the knowledge discovery domain, following up on emerging research that attempts to establish a unifying theory of data mining, and employ the Apriori association rule mining algorithm. We build upon the definitions of core knowledge discovery concepts in designing the core parts of the ontology, namely the concepts of knowledge, representation language, pattern, dataset, evaluation, and further, more specialized concepts. Second, we apply the genetic algorithm to the classification process. Our methodology bridges the gap by providing a working prototype of an actionable data mining conceptualization, including learning from structured and relational data, enabling the automated assembly of knowledge discovery workflows. The remainder of this paper is organized as follows. Section II briefly surveys different classification processes and knowledge discovery methods. An overall description of the existing system is presented in Section III. Our proposed system is described in Section IV, and preliminary results on the analysis of classification techniques are given in Section V. After discussing our results, we conclude in Section VI.
II. RELATED WORK
Luc De Raedt (2002) presents a perspective on inductive databases. Inductive databases tightly integrate databases with data mining. The key ideas are that data and patterns are handled in the same way and that an inductive query language allows the user to query and manipulate the patterns (or models) of interest. The paper proposes a simple and abstract model for inductive databases, describing the basic formalism, a simple but fairly powerful inductive query language, and some basics of reasoning for query optimization, and discusses memory organization and implementation issues. Nada Lavrac, Filip Zelezny, and Peter A. Flach (2003) proposed RSD: Relational Subgroup Discovery through First Order Feature

Construction. Relational rule learning is typically used in solving classification and prediction tasks. However, relational rule learning can be adapted also to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach, applicable to subgroup discovery in individualcentered domains, was successfully applied to two standard ILP problems (East-West trains and KRK) and a real-life telecommunications application. Peter Mika et al (2004) presented the Foundations for Service Ontologies: Aligning OWL-S to DOLCE. This paper is especially important for the description of Web Services, which should enable complex tasks involving multiple agents. As one of the first initiatives of the Semantic Web community for describing Web Services, OWL-S attracts a lot of interest even though it is still under development. They identify problematic aspects of OWL-S and suggest enhancements through alignment to a foundational ontology. Another contribution of our work is the Core Ontology of Services that tries to fill the epistemological gap between the foundational ontology and OWL-S. It can be reused to align other Web Service description languages as well. Monika Zakova et al (2007) designed the paper for Relational Data Mining Applied to Virtual Engineering of Product Designs. The ultimate goal of this work is to achieve design process improvements by applying state-of-the-art ILP systems for relational data mining of past designs, utilizing commonly agreed design ontologies as background knowledge. This paper demonstrates the utility of relational data mining for virtual engineering of product designs through the detection of frequent design patterns, enabled by the proposed baseline integration of hierarchical background knowledge using sorted refinements. Stankovski et al (2008) proposed the Gridenabling data mining applications with DataMiningGrid: An architectural perspective. The DataMiningGrid system has been designed to meet the requirements of modern and distributed data mining scenarios. Based on the Globus

Toolkit and other open technology and standards, the DataMiningGrid system provides tools and services facilitating the grid-enabling of data mining applications without any intervention on the application side. Critical features of the system include flexibility, extensibility, scalability, efficiency, conceptual simplicity and ease of use. The system has been developed and evaluated on the basis of a diverse set of use cases from different sectors in science and technology. D. F. Llorca et al (2009) improved pedestrian detection by compensation of the cameras pitch angle for both collision-avoidance and collision mitigation applications. To that effect, two pitch compensation methods have been developed and compared. Real experiments have been carried out for collision avoidance and mitigation. Collision avoidance is performed by means of deceleration strategies whenever the accident is avoidable. Likewise, collision mitigation is accomplished by triggering an active hood system. The collision avoidance module has been tested on a Citroen C3 Pluriel car equipped with a stereovision system. Tests were carried out on private circuits with actors. The collision mitigation module was mounted on a Seat Cordoba car equipped with an active hood system that is triggered by the stereovision system. Katharina morik, Martin scholz et al presents although preprocessing is one of the key issues in data analysis, it is still common practice to address this task by manually entering SQL statements and using a variety of stand-alone tools. The results are not properly documented and hardly re-usable. The Mining Mart system presented in this chapter focuses on setting up and re-using best-practice cases of preprocessing data stored in very large databases. A meta-data model named M4 is used to declaratively define and document both, all steps of such a preprocessing chain and all the data involved. For data and applied operators there is an abstract level, understandable by human users, and an executable level, used by the meta-data compiler to run cases for given data sets. An integrated environment allows for a rapid development of preprocessing chains. Case adaptation to different environments is supported by just specifying all involved database entities in the target DBMS. This allows reusing bestpractice cases published on the Internet.

III. GENERATION OF AUTOMATIC KNOWLEDGE DISCOVERY ONTOLOGY The ontology defines relationships among the ingredients of knowledge discovery scenarios, both declarative (various knowledge representations) and algorithmic. The primary purpose of the ontology is to enable the workflow planner to reason about which algorithms can be used to produce intermediary or final results required by a specified data mining task. In this paper we focus on automatic construction of abstract workflows. Each generated abstract workflow is stored as an instance of the class and can be instantiated with a specific algorithm configuration either manually or using a predefined default configuration. We treat the automatic workflow construction as a classical planning task, in which algorithms represent operators and their required input and output knowledge types represent preconditions and effects. Both the information about the available algorithms and knowledge types as well as the specification of the knowledge discovery task is encoded through ontology. At the same time, we want to be compatible with established planning standards.
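As an illustration of casting algorithms as planning operators whose required and produced knowledge types act as preconditions and effects, here is a minimal sketch (our own assumptions: the operator names and the naive forward-chaining check are illustrative, not the PDDL encoding used in the paper):

from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset   # knowledge types required as inputs
    effects: frozenset         # knowledge types produced as outputs

OPERATORS = [
    Operator("discretize", frozenset({"Dataset"}), frozenset({"DiscretizedDataset"})),
    Operator("apriori", frozenset({"DiscretizedDataset"}), frozenset({"PatternSet"})),
]

def plan(initial: set, goal: set, operators=OPERATORS):
    """Naive forward search: apply any operator whose preconditions already hold."""
    state, steps = set(initial), []
    changed = True
    while not goal <= state and changed:
        changed = False
        for op in operators:
            if op.preconditions <= state and not op.effects <= state:
                state |= op.effects
                steps.append(op.name)
                changed = True
    return steps if goal <= state else None

# plan({"Dataset"}, {"PatternSet"}) returns ["discretize", "apriori"]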

In data mining, many knowledge types can be regarded as sets of more elementary pieces of knowledge. For example, first-order logic theories consist of formulas. Similarly, the common notion of a dataset corresponds either to a set of attribute-value tuples or to a set of relational structures, each of which describes an individual object. This structure is accounted for through a dedicated predicate, so that, e.g., a first-order theory contains a set of first-order formulas. Moreover, some knowledge types may be categorized according to the expressivity of the language in which they are encoded. For this purpose, we have designed a hierarchy of language expressivity, of which Fig. 2 shows a fraction. The hierarchy is an acyclic directed graph; however, for better readability only its tree structure is shown in Fig. 2.
Fig 2: A part of the expressivity hierarchy in the Protégé ontology editor.
The notion of an algorithm involves all executable routines that can be used in a knowledge discovery process, like inductive algorithms and knowledge format transformations. Any algorithm turns a knowledge instance into another knowledge instance. In order to formalize the problem description and to store the created workflows in a knowledge-based representation, we have created a small ontology for workflows, which extends the KD ontology.
IV. AUTOMATIC WORKFLOW CONSTRUCTION USING A GENETIC ALGORITHM
In the proposed system we first use the Apriori algorithm to find the frequent patterns; the second step is to apply a genetic algorithm for automatic workflow composition. The main aim of this proposed system is automatic knowledge workflow composition using the genetic algorithm.
A. Construction of sales ontology
There are two steps to create an ontology: define the concepts, and define the concept relationships, also called axioms. In this paper, the definition of all concepts and relations has been shown and described; here we just study how to formalize our business. The main formalization process is to identify the disjoint classes and related object properties. The concepts in the same layer and at the same level are pairwise disjoint, and for the direct

relationship with the concept object properties and inverse properties should be created. To formalize the strategic layer, elements strategy and its related axioms can be formalized as OWL. B. Knowledge discovery from ontology using Apriori algorithm The most representative association rule algorithm is the Apriori algorithm, which was proposed by Agrawal et al. in 1993. The Apriori algorithm repeatedly generates candidate itemsets and uses minimal support and minimal confidence to filter these candidate itemsets to find highfrequency itemsets. Association rules can be figured out from the high-frequency itemsets. The notion of an algorithm involves all executable routines that can be used in a knowledge discovery process, like inductive algorithms and knowledge format transformations. Any algorithm turns a knowledge instance into another knowledge instance. For example, inductive algorithms will typically produce a Patternset or model instance out of a Dataset instance. Of importance are also auxiliary representation changers, transforming datasets to other datasets. These may be simple format converters (e.g., only changing the separator character in a textual data file), or more complex transformations characterized by information loss. This may be incurred either due to a conversion into a language class with lower expressiveness (e.g., for propositionalization algorithms) or even without expressiveness change (e.g., for principal component representation of real vectors). C. Automatic Workflow Construction Both the information about the available algorithms and knowledge types as well as the specification of the knowledge discovery task is encoded through ontology. At the same time, we want to be compatible with established planning standards. We use baseline approach consists of generating a description of the domain and the problem description in the PDDL language using elements of the KD ontology and implementing a planning algorithm, which uses PDDL descriptions. We use PDDL 2.0 with type hierarchy and domain axioms. Planning algorithms require two main inputs. The first one is the description of the domain specifying the available

types of objects and actions. The second one is the problem description specifying the initial state, goal state and the available objects. We have implemented a procedure for generating the domain description from the KD ontology. D. Ontology alignment using genetic algorithm The alignment process can be opportunely restricted in order to deliver a particular kind of alignment useful for a particular kind of applications. In particular, in our approach, the ontology alignment problem is depending upon the following features: A simplified definition of mapping element, renamed correspondence which implies the equality as the relation. cardinality :(many-to-one), i.e., an entity of the first ontology can be associated with an only entity of the second one, whereas, an entity of the second ontology can be associated also with more entities of the first one. In order to formulate the considered alignment problem as optimization problem, it is necessary to introduce the concept of the evaluation of a correspondence. The evaluation is performed by a method whose aim is to determine the goodness of a correspondence to achieve an optimal alignment. In our work, a fuzzy similarity between the name labels of the entities composing a correspondence is chosen to realize the evaluation method. Precisely, the goodness of a correspondence is evaluated through a fuzzy relation between entity names computed by means of the well-known levenshtein string distance. E. Evaluating workflow execution Empirical tests should thus primarily serve as a proof of concept, showing that the approach scales, with acceptable computational demands, to reasonably large real-life problem instances. We have conducted workflow construction experiments in sales domain. The workflows pertaining to both of the use cases are required to merge data with nontrivial relational structure, including ontology background knowledge. Again, this setting precludes the application of previous workflow construction systems, limiting the scope for comparative evaluation. However, we do run comparative experiments to evaluate the effects of

employing either of the two earlier described planning strategies.

Fig 3: Architecture diagram of the proposed work
Fig. 3 shows the overall proposed work, i.e., automatic knowledge discovery workflow composition through ontology-based planning using the genetic algorithm. The advantages of the proposed system are that the construction of workflows is possible in two domains, and the workflows generated by our algorithm were complex but reasonable; on the other hand, a rich ontological representation could easily lead to a combinatorial explosion during planning.
F. Pseudocode for the Genetic Algorithm
Input: the entity name string sets of the two ontologies to align; the numbers of entities of the ontologies; the GA parameters (size of the population N, crossover rate pc, mutation rate pm); the termination criteria (maximum number of iterations maxIter and desired f-measure value fTarget); the local search parameters (maximum number of iterations). Output: the best optimized alignment (represented by the final best chromosome) between the two ontologies.
1: iter := 0
2: generatePopulation(P) // generate randomly an initial population P of chromosomes
3: evaluateFitness(P) // evaluate the fitness value of each chromosome
4: best := getBestChromosome(P) // select the best chromosome of the current population
5: f := evaluateAlignment(best) // f contains the f-measure value of the current best chromosome
6: while (iter < maxIter) AND (f < fTarget) do
7: executeCrossover(P, pc) // cross over chromosomes according to the crossover rate to add new chromosomes to the population
8: executeMutation(P, pm) // mutate chromosomes with the mutation probability to add new chromosomes to the population
9: evaluateFitness(P) // evaluate the fitness value of the new chromosomes
10: executeSelection(P, N) // select chromosomes to generate the next population
11: best := getBestChromosome(P) // select the best chromosome of the current population
12: preserveBestChromosome(P, best) // preserve the best chromosome of the current population (elitist strategy)
13: best' := executeHillClimbingSearch(best) // execute the local search process on the best chromosome
14: replaceWorstChromosome(P, best') // select the worst chromosome and replace it with the one obtained by the local search
15: iter := iter + 1 // increment the number of iterations
16: f := evaluateAlignment(best') // f contains the f-measure value of the current best chromosome
17: end while
18: return best'

V. EXPERIMENTAL RESULTS
In graph 1 (Fig. 1), the classification process using the JRip algorithm is compared with the classification process using the genetic algorithm. The execution time is measured against the workflow size. The graph shows that the proposed method has a lower execution time than the existing method.
Fig 1: Graph 1

In graph 2 (Fig. 2), the classification process using the JRip algorithm is compared with the classification process using the genetic algorithm. The number of successful executions is measured against the workflow size. The graph shows that the proposed method achieves a higher number of successful executions than the existing method.
Fig 2: Graph 2
VI. CONCLUSION & FUTURE WORK
The primary objective of this study was to investigate whether complex scientific and engineering knowledge discovery workflows can be constructed automatically. We have developed a methodology for the automatic composition of abstract workflows, which are proposed to the user and can be instantiated interactively, using the genetic algorithm. Our methodology focuses on workflows for complex knowledge discovery tasks dealing with structured data and background knowledge, whereas previous studies deal only with classical propositional data mining tasks or are specialized for one domain only. The proposed method is thus more automatic and executes with better efficiency than the existing method. In future work, we plan to extend the ontology with descriptions of available computational resources (such as in a GRID environment). We also want to extend the modeling of constraints on the algorithms and workflows and to align the ontology to a top-level ontology. Furthermore, we want to introduce more complex heuristics for evaluating the workflows and metrics for workflow similarity, and to focus on planners more tightly integrating the planner with a reasoner.
VII. REFERENCES
[1] Workflows for e-Science: Scientific Workflows for Grids, I. Taylor, E. Deelman, D. Gannon, and M. Shields, Eds. New York: Springer, 2007.
[2] I. Trajkovski, F. Zelezny, N. Lavrac, and J. Tolar, Learning relational descriptions of differentially expressed gene groups, IEEE Trans. Syst. Man, Cybern. C, vol. 38, no. 1, pp. 16-25, Jan. 2008.
[3] M. Zakova, F. Zelezny, J. A. Garcia-Sedano, C. Massia-Tissot, N. Lavrac, P. Kremen, and J. Molina, Relational data mining applied to virtual engineering of product designs, in Proc. 16th Int. Conf. Inductive Logic Programming, 2006, pp. 439-453.
[4] Q. Yang and X. Wu, 10 challenging problems in data mining research, Intl. J. Inf. Tech. Decision Making, vol. 5, no. 4, pp. 597-604, 2006.

[5] S. Dzeroski, Towards a general framework for data mining, in Proc. 5th Int. Workshop, Knowledge Discovery in Inductive Databases, KDID06, 2007, vol. 4747, LNCS, pp. 259300. [6] P. Patel-Schneider, P. Hayes, and I. Horrocks, OWL web ontology language semantics and abstract syntax,W3C recommendation, 2004. Available: http://www.w3.org/TR/owl-semantics [7] D. Smith and D.Weld, Temporal planning with mutual exclusion reasoning, in Proc. 1999 Int. Joint Conf. Artif. Intell. (IJCAI-1999), 1999, pp. 326333.

[8] L. DeRaedt, A perspective on inductive databases, SIKDD Explorations, vol. 4, no. 2, pp. 6977, 2002. [9] V. Stankovski, M. Swain, V. Kravtsov, T. Niessen, D.Wegener, J. Kindermann, and W. Dubitzky, Grid-enabling data mining applications with DataMiningGrid: An architectural perspective, Future Generation Comput. Syst., vol. 24, no. 4, pp. 259279, 2008. [10] Relational Data Mining, S. Dzeroski and N. Lavrac, Eds. New York: Springer, 2001. [11] I. Taylor, M. Shields, I. Wang, and A. Harrison, The Triana workflow environment: Architecture and applications, in Workflows for e-Science, I. Taylor, E. Deelman, D. Gannon, and M. Shields, Eds. New York: Springer, 2007, pp. 320339. [12] A. Rowe, D. Kalaitzopoulos, M. Osmond, M. Ghanem, and Y. Guo, The Discovery Net system for high throughput bioinformatics, Bioinformatics, vol. 19, pp. 225231, 2003. [13] N. L. Khac, M. T. Kechadi, and J. Carthy, Admire framework: Distributed data mining on data grid platforms, in Procs. 1st Int. Conf. Softw. Data Technol., 2006, vol. 2, pp. 6772. [14] A. Ali, O. Rana, and I. Taylor, Web services composition for distributed data mining, in Proc. 2005 IEEE Int. Conf. Parallel Processing Workshops, ICPPW05, 2005, pp. 1118. [15] D. DeRoure, C. Goble, and R. Stevens, The design and realisation of the myExperiment virtual research environment for social sharing of workflows, Future Gen. Comput. Syst., vol. 25, pp. 561567, 2008. [16] K. Morik and M. Scholz, The MiningMart approach to knowledge discovery in databases, in Proc. Int. Conf. Machine Learning, 2004, pp. 47 65. [17] A. Suyama, N. Negishi, and T.Yamagchi, Composing inductive applications using ontologies for machine learning, in Proc. 1st Int. Conf. Discovery Sci., 1998, pp. 429431. [18] R. Wirth, C. Shearer, U. Grimmer, T. P. Reinartz, J. Schloesser, C. Breitner, R. Engels, and G. Lindner, Towards process-oriented tool support for knowledge discovery in databases, in Proc. 1st Eur. Symp. Principles of Data Mining



PREPARING DATA SETS FROM DATABASES USING HORIZONTAL AGGREGATIONS WITH HOLISTIC FUNCTIONS
Mannar Mannan J Research Scholar, Department Of Information Technology Anna University of Technology, Coimbatore, Tamil Nadu, India 641 047 Karthik M PG Scholar, Department Of Information Technology Anna University of Technology,

Abstract— Database applications have acquired a new meaning after the advent of the Internet in the last decade. Web applications generate vast amounts of data of varied forms, and vast numbers of requests against that data, and many conventional database and programming techniques were not designed to handle these data management challenges. The present system provides a class of aggregate functions that combine numeric expressions and transpose results to produce a data set with a horizontal layout. Functions belonging to this class are known as horizontal aggregations. Horizontal aggregations represent an extended form of conventional SQL aggregations, which return a set of values in a horizontal layout instead of a single value per row. This paper explains how to evaluate and optimize horizontal aggregations that generate standard SQL code. In our proposed system we introduce new functions for horizontal aggregations: holistic functions. They provide more opportunities for the query optimizer to find optimal plans, because all possible placements of the GROUP BY operators in the query trees are considered during the optimization process. For queries with non-distributive aggregate functions, the evaluation of the GROUP BY operators has to wait until the entire input is formed, since these aggregate functions cannot be decomposed into sub-aggregate functions and their computation depends on the entire input. By extending the ability of the early grouping method to handle aggregate queries with holistic aggregate functions, these functions provide the optimizer with more chances to find optimal plans.

I. INTRODUCTION
With the advent of the Internet in the last decade, database applications have acquired a new meaning. Internet applications generate huge amounts of data of different kinds and huge numbers of requests against that data; many traditional database and programming techniques were not designed to deal with these data management challenges. In a relational database, especially with normalized tables, a significant effort is required to prepare a summary data set that can be used as input for a data mining or statistical algorithm. Most algorithms require as input a data set with a horizontal layout, with several records and one variable or dimension per column. That is the case with models like clustering, classification, regression and PCA. Each research discipline uses different terminology to describe the data set: in data mining the common terms are point-dimension, statistics literature generally uses observation-variable, and machine learning research uses instance-feature. The main reason is that, in general, data sets stored in a relational database come from OnLine Transaction Processing (OLTP) systems, where database schemas are highly normalized, whereas data mining, statistical or machine learning algorithms generally require aggregated data in summarized form. Based on the functions and clauses currently available in SQL, a significant effort is required to compute aggregations when they are desired in a cross-tabular form suitable for a data mining algorithm. Such effort is due to the amount and complexity of SQL code that needs to be written, optimized and tested. There are further practical reasons to return aggregation results in a horizontal layout.
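As a simple illustration of the kind of query this paper targets, the sketch below contrasts a conventional (vertical) SQL aggregation with the extended horizontal form proposed in [1]. The table F and the columns CustomerID, DayOfWeek and OrderAmount are illustrative assumptions, and the BY clause inside the aggregation call is the small syntax extension discussed later, not standard SQL.

-- Vertical (standard SQL): one row per (CustomerID, DayOfWeek) combination.
SELECT CustomerID, DayOfWeek, SUM(OrderAmount)
FROM F
GROUP BY CustomerID, DayOfWeek;

-- Horizontal (extended syntax, as in [1]): one row per CustomerID,
-- with one result column per distinct DayOfWeek value.
SELECT CustomerID, SUM(OrderAmount BY DayOfWeek)
FROM F
GROUP BY CustomerID;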


Preparing a data set for analysis is generally the most time-consuming task in a data mining project. Data transformation demands considerable time and effort, and standard aggregation has certain restrictions. Holistic functions can be implemented on top of the horizontal layout; with this approach the mean and the median can be obtained as aggregate functions. In this work the limitations of standard aggregation are overcome by using horizontal aggregation layouts. Horizontal aggregation can be achieved using three methods: SPJ, CASE and PIVOT. The first one (SPJ) relies on standard relational operators, the second one (CASE) relies on the SQL CASE construct, and the third (PIVOT) uses a built-in operator of a commercial DBMS that is not widely available. Both the CASE and PIVOT evaluation methods are significantly faster than the SPJ method.
A holistic measure is a measure that must be computed on the entire data set as a whole; it cannot be computed by partitioning the given data into subsets and merging the values obtained for the measure in each subset. An aggregate function is holistic if there is no constant bound on the storage size needed to describe a sub-aggregate, that is, there does not exist an algebraic function with M arguments (where M is a constant) that characterizes the computation. Common examples of holistic functions include median(), mode() and rank(). A measure is holistic if it is obtained by applying a holistic aggregate function.
A. Advantages
Our proposed horizontal aggregations with holistic functions offer several unique features and advantages. First, they represent a template to generate SQL code from a data mining tool. Such SQL code automates writing SQL queries, optimizing them and testing them for correctness, and reduces manual work in the data preparation phase of a data mining project. Second, since the SQL code is automatically generated, it is likely to be more efficient than SQL code written by an end user, for instance by a person who does not know SQL well. Third, the data set can be created entirely inside the database management system. In modern database environments it is common to export denormalized data sets to be further cleaned and transformed outside the DBMS in external tools (e.g. statistical packages). Unfortunately,

exporting large tables outside a DBMS is slow, creates inconsistent copies of the same data and compromises database security. Hence, we provide a more effective, better integrated and less troublesome solution than external data mining tools. Horizontal aggregations with holistic functions require only a small syntax extension to the aggregate functions called in a SELECT statement. Alternatively, horizontal aggregations can be used to generate SQL code from a data mining tool to build data sets for data mining analysis.
B. Article Organization
The remainder of this paper is organized as follows. Section 2 introduces related works, Section 3 presents related background work, Section 4 describes the system architecture, Section 5 reports experimental results, and conclusions are formulated in Section 6.
II. RELATED WORKS
Carlos Ordonez (2011) [1] proposed an abstract but minimal extension to SQL standard aggregate functions to compute horizontal aggregations, which only requires specifying sub-grouping columns inside the aggregation function call. The paper introduced a new class of extended aggregate functions, called horizontal aggregations, which help prepare data sets for data mining and OLAP cube exploration. Basically, a horizontal aggregation returns a set of numbers instead of a single number for each group, resembling a multi-dimensional vector. Two research issues raised there are: 1) whether horizontal aggregations can be applied to holistic functions (e.g. rank()), and 2) how to optimize horizontal aggregations processed in parallel in a shared-nothing DBMS architecture.
Andy S. Chiou [2] presented the early grouping technique, a new method for optimizing aggregate queries. It provides more opportunities for the query optimizer to find optimal plans because all possible placements of the GROUP BY operators in the query trees are considered during the optimization process, using sub-functions that are distributive. With this technique, the query optimizer is able to consider a larger scope of query plans, which results in more opportunities to find optimal plans.
C. Ordonez (2010) [3] observed that statistical models are generally computed outside a DBMS due to


their mathematical complexity. They introduce techniques to efficiently compute fundamental statistical models inside a DBMS exploiting User-Defined Functions (UDFs). Specifically, they study the computation of linear regression, PCA, clustering, and Naive Bayes. Two summary matrices on the data set are mathematically shown to be essential for all models: the linear sum of points and the quadratic sum of cross products of points. They consider two layouts for the input data set: horizontal and vertical. They first introduce efficient SQL queries to compute summary matrices and score the data set. Cross products can be ignored for clustering and Bayesian classifiers, yielding a faster UDF that computes a diagonal matrix. They carefully studied two layouts for the data set: a horizontal one having dimensions as columns and a vertical one having one dimension value per row. The vertical layout presents no limitations for dimensionality and can be more efficient to analyze sparse matrices. They proposed two sets of UDFs: an aggregate UDF that computes summary matrices for all models and a set of scalar UDFs implementing primitive vector operations, used to score data sets based on a model. Two programming alternatives to compute sufficient statistics were discussed: SQL queries and aggregate UDFs. For each alternative, they introduced solutions for the horizontal and vertical layouts. They then presented a set of scalar UDFs to score data sets in a single pass based on linear regression, PCA, clustering, and Naive Bayes. Disadvantage There are two important disadvantages for this SQL statement: it can easily exceed the limits of the DBMS and Q entries cannot be accessed by subscript, but by column names. C. Ordonez (2006) [4] Integrating data mining algorithms with a relational DBMS is an important problem for database programmers. They introduce three SQL implementations of the popular K-means clustering algorithm to integrate it with a relational DBMS: 1) a straightforward translation of K-means computations into SQL, 2) an optimized version based on improved data organization, efficient indexing, sufficient statistics, and rewritten queries, and 3) an incremental version that uses the optimized version as a building block with fast convergence and automated reseeding. They

experimentally show the proposed K-means implementations work correctly and can cluster large data sets. They identify which K-means computations are more critical for performance. They focus on integrating the K-means clustering algorithm with a relational DBMS using SQL that is nowadays the standard language in relational databases. Clustering algorithms partition a data set into several groups such that points in the same group are close (similar) to each other and points across groups are far (different) from each other. Having a clustering algorithm implemented in SQL provides many advantages. SQL is available in any relational DBMS. SQL isolates the application programmer from internal mechanisms of the DBMS. Many data sets are stored in a relational database. Trying different subsets of data points and dimensions is more flexible, faster, and, generally, easier to do inside a DBMS with SQL queries than outside with alternative tools. Managing large data sets without DBMS support can be a daunting task. It presents three implementations of Kmeans clustering in SQL to integrate it with a relational DBMS. The proposed implementations allow clustering large data sets stored inside a relational DBMS eliminating the need to export data. Only standard SQL was used; no special extensions for data mining were needed. They concentrated on defining suitable tables, indexing them, and optimizing queries for clustering purposes. The first implementation is a straightforward translation of K-means computations into SQL, which serves as a framework to build a second optimized version with superior performance. The optimized version is then used as a building block to introduce an incremental K-means implementation with fast convergence and automated reseeding. The first implementation is called Standard K-means, the second one is called Optimized K-means, and the third one is called Incremental K-means. Experiments evaluate correctness and performance with real and synthetic data sets. Disadvantage: Implementing a clustering algorithm in SQL presents important drawbacks. SQL is not as efficient and flexible as a high-level programming language like C++. SQL has serious limitations to perform complex mathematical operations because, in general,


SQL does not provide arrays and functions to manipulate matrices. C. Cunningham(2004)[5] PIVOT and UNPIVOT, two operators on tabular data that exchange rows and columns, enable data transformations useful in data modeling, data analysis, and data presentation. They can quite easily be implemented inside a query processor, much like select, project, and join. Such a design provides opportunities for better performance, both during query optimization and query execution. Inclusion of Pivot and Unpivot inside the RDBMS enables interesting and useful possibilities for data modeling. Existing modeling techniques must decide both the relationships between tables and the attributes within those tables to persist. The requirement that columns be strongly defined contrasts with the nature of rows, which can be added and removed easily. Pivot and Unpivot, which exchange the role of rows and columns, allow the a priori requirement for pre-defined columns to be relaxed. While the conceptual model for PIVOT and UNPIVOT is straightforward, several important details must be further defined to operate well with existing SQL constructs. One problem that must be addressed is how to handle data collisions (two values mapping to the same location). Missing values is the opposite condition, and behavior must also be defined for this case. Finally, the use of PIVOT and UNPIVOT on dynamic (open) schemas must be addressed. Any Pivot and Unpivot definitions must handle these semantic issues. They introduce two new data manipulation operators, Pivot and Unpivot, for use inside the RDBMS. These improve many existing user scenarios and enable several new ones. Furthermore, this paper outlines the basic syntactic, semantic, and implementation issues necessary to add this functionality to an existing RDBMS based on. III.BACKGROUND WORKS A. Implementation of horizontal aggregations In this module we bring in a new class of aggregations that have got similar behavior to SQL standard aggregations, but which produce tables with a horizontal layout. In contrast, we

call standard SQL aggregations vertical aggregations, since they produce tables with a vertical layout. Horizontal aggregations only need a small syntax extension to the aggregate functions called in a SELECT statement. Alternatively, horizontal aggregations can be used to generate SQL code from a data mining tool to build data sets for data mining analysis. The basic goal of a horizontal aggregation is to transpose the aggregated column by a subset of columns. We start by explaining how to automatically generate SQL code. These aggregations preserve the evaluation semantics of standard SQL aggregations; the main difference is that they deliver a table with a horizontal layout, possibly having extra nulls. Observe that each horizontal aggregation effectively returns a set of columns as result, and that there is a call to a standard vertical aggregation without any sub-grouping columns. For the first horizontal aggregation we show day names, and for the second one we show the number of the day of the week. These columns can be used for linear regression, clustering or factor analysis.
B. Implementation of SPJ method
The first method relies only on relational operations, that is, on select, project, join and aggregation queries; we call it the SPJ method. The SPJ method is interesting from a theoretical point of view because it is based on relational operators only. The basic idea is to create one table with a vertical aggregation for each result column, and then join all those tables to produce FH. We aggregate from F into d projected tables with d Select-Project-Join-Aggregation queries (selection, projection, join, aggregation). Each table FI corresponds to one sub-grouping combination and has {L1, . . ., Lj} as primary key and an aggregation on A as the only non-key column. In general, nulls should be the default value for groups with missing combinations. We believe it would be incorrect to set the result to zero or some other number by default when there are no qualifying rows; such an approach should be considered on a per-case basis.
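A minimal sketch of the SPJ idea follows, under illustrative assumptions: F is the fact table, CustomerID plays the role of the grouping key, DayOfWeek is the transposing column and OrderAmount is the aggregated column, with only two transposed values shown. The CREATE TABLE ... AS SELECT syntax varies slightly across DBMSs.

-- One vertical aggregation per transposed value, then a join on the grouping key.
CREATE TABLE F1 AS
  SELECT CustomerID, SUM(OrderAmount) AS AmtMon
  FROM F WHERE DayOfWeek = 'Mon' GROUP BY CustomerID;

CREATE TABLE F2 AS
  SELECT CustomerID, SUM(OrderAmount) AS AmtTue
  FROM F WHERE DayOfWeek = 'Tue' GROUP BY CustomerID;

-- FH keeps one row per CustomerID; missing combinations stay NULL.
CREATE TABLE FH AS
  SELECT F0.CustomerID, F1.AmtMon, F2.AmtTue
  FROM (SELECT DISTINCT CustomerID FROM F) F0
  LEFT OUTER JOIN F1 ON F0.CustomerID = F1.CustomerID
  LEFT OUTER JOIN F2 ON F0.CustomerID = F2.CustomerID;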


C. Implementation of CASE method
The second method relies on the SQL CASE construct; we call it the CASE method. Each table has an index on its primary key for efficient join processing; we do not consider additional indexing mechanisms to accelerate query evaluation. For this method we use the CASE programming construct available in SQL. The CASE statement returns a value selected from a set of values based on Boolean expressions. From a relational database theory point of view this is equivalent to a simple projection/aggregation query where each non-key value is given by a function that returns a number based on some conjunction of conditions. We propose two basic sub-strategies to compute FH. In a similar manner to SPJ, the first one directly aggregates from F, while the second one computes the vertical aggregation in a temporary table FV and then computes the horizontal aggregations indirectly from FV. We now present the direct aggregation method. Horizontal aggregation queries can be evaluated by directly aggregating from F and transposing rows at the same time to produce FH. First, we need to get the unique combinations of R1 . . . Rk that define the matching Boolean expressions for the result columns. The SQL code to compute horizontal aggregations directly from F is as follows; observe that V() is a standard SQL aggregation that takes a CASE statement as argument. Horizontal aggregations need to set the result to null when there are no qualifying rows for the specific horizontal group, to be consistent with the SPJ method and also with the extended relational model.
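The following sketch illustrates the direct CASE strategy under the same illustrative schema used above (F, CustomerID, DayOfWeek, OrderAmount); it is a hand-written instance of the kind of code the method would generate, not the generated code itself.

-- Direct aggregation from F: one SUM(CASE ...) per distinct DayOfWeek value.
-- NULL (rather than 0) is returned when a group has no qualifying rows,
-- because SUM ignores NULLs and yields NULL when every input is NULL.
SELECT CustomerID,
       SUM(CASE WHEN DayOfWeek = 'Mon' THEN OrderAmount ELSE NULL END) AS AmtMon,
       SUM(CASE WHEN DayOfWeek = 'Tue' THEN OrderAmount ELSE NULL END) AS AmtTue
FROM F
GROUP BY CustomerID;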

D. Implementation of PIVOT method
The third method uses the built-in PIVOT operator, which transforms rows into columns. PIVOT is a built-in operator of a commercial DBMS; since it can perform transposition, it can help evaluate horizontal aggregations. The PIVOT method internally needs to determine how many columns are needed to store the transposed table, and it can be combined with the GROUP BY clause. We use the PIVOT operator to compute a horizontal aggregation, assuming one BY column for the right key columns. The SQL PIVOT operator works in a similar manner to the CASE method. We consider the optimized version of PIVOT, where we project only the columns required by FH (i.e. trimming F). When the PIVOT operator is applied, one aggregation column is produced for every distinct value vj, producing the d desired columns. We consider the optimized version, which trims F of irrelevant columns, and k = 1. Like the SPJ and CASE methods, PIVOT depends on selecting the distinct values from the right keys R1 . . . Rk. It avoids joins and saves I/O when it receives as input the trimmed version of F, and then has a similar time complexity to CASE. Also, its time depends on the number of distinct values, their combinations and the probabilistic distribution of values. The PIVOT operator was used as available in the SQL language implementation provided by the DBMS.
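For completeness, a sketch of the same query using the PIVOT operator as implemented in SQL Server is shown below. The column list inside IN must be known in advance, which is why the method first selects the distinct values of the transposing column; names are again illustrative.

-- PIVOT form of the same horizontal aggregation (SQL Server syntax).
SELECT CustomerID, [Mon] AS AmtMon, [Tue] AS AmtTue
FROM (SELECT CustomerID, DayOfWeek, OrderAmount FROM F) AS src   -- trimmed F
PIVOT (SUM(OrderAmount) FOR DayOfWeek IN ([Mon], [Tue])) AS pvt;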


E. Implementation of holistic functions
In this module we introduce new functions for horizontal aggregations: holistic functions. They provide more opportunities for the query optimizer to find optimal plans, because all possible placements of the GROUP BY operators in the query trees are considered during the optimization process. For queries with non-distributive aggregate functions, the evaluation of the GROUP BY operators has to wait until the entire input is formed, since these aggregate functions cannot be decomposed into sub-aggregate functions and their computation depends on the entire input. By extending the ability of the early grouping technique to handle aggregate queries with holistic aggregate functions, these functions provide the optimizer with more opportunities to find optimal plans. However, when the aggregate functions are not distributive and cannot be divided into one or more sub-functions, the early grouping technique is not applicable to optimizing such queries. The evaluation of this type of function, also called a holistic aggregate function, is determined by the entire input and cannot be carried out incrementally. With the aid of auxiliary functions, the optimizer can gather some information at an early stage during the optimization process; this information can later be used to reduce the cost of computing the holistic aggregate functions once all the input data are collected.
1. Holistic Measure
A holistic measure must be computed on the entire data set as a whole; it cannot be computed by partitioning the given data into subsets and merging the values obtained for the measure in each subset. The median is an example of a holistic measure.
2. Median
We can easily approximate the median value of a data set. Assume that the data are grouped in intervals according to their xi values and that the frequency (i.e., the number of data values) of each interval is known. For example, people may be grouped according to their annual salary in intervals such as 10-20K, 20-30K, and so on. Let the interval that contains the median frequency be the median interval. We can approximate the median of the entire data set (e.g., the median salary) by interpolation using the formula:
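A standard form of this grouped-data interpolation, consistent with the variable definitions given below, is

$\text{median} \approx L_1 + \left(\frac{N/2 - (\sum \mathit{freq})_l}{\mathit{freq}_{\mathrm{median}}}\right) \times \mathit{width}$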

where L1 is the lower boundary of the median interval, N is the number of values in the entire data set, (freq)l is the sum of the frequencies of all the intervals lower than the median interval, freq_median is the frequency of the median interval, and width is the width of the median interval.

IV. SYSTEM ARCHITECTURE
Fig. 1 shows the process of our technique: we choose the data set that needs horizontal aggregation and then convert it using the three methods, each of which uses different logic to compute the horizontal aggregations. The implementation of these three methods is discussed in Section III. Holistic functions can be applied to this horizontal aggregation; using them we compute rank, median and standard deviation. Here the median is tested using two methods, order-insensitive and order-sensitive, as explained in detail above.
Figure 1. Overall system architecture

V. EXPERIMENTAL RESULTS
The holistic measure analysis in the horizontal aggregation can be achieved using the median function. In the median function we use two techniques to find the median value in the horizontal aggregation: order-insensitive and order-sensitive. In the order-sensitive approach we first sort all the elements and then find the middle elements of the series; in the order-insensitive approach we simply pick the middle elements. Having those elements, we take the average of the two middle elements as the final median value, after checking that the median is being computed from the correct data elements. This median function is not available among the default aggregations. We took three parameters from the Northwind database: employee ID, customer ID and orders. The horizontal aggregation uses customer ID and transposes it into horizontal values. The first snapshot shows the horizontal table obtained from the Orders table of the Northwind database; it consists of many records, and the null values in the table are represented using dots.
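A minimal sketch of the order-sensitive median computation per horizontal group is shown below, using window functions and illustrative names (F, CustomerID, OrderAmount). For an odd group size the two selected row numbers coincide, so the average is simply the middle value.

-- Sort the values within each group, keep the middle one or two rows, average them.
WITH Ordered AS (
    SELECT CustomerID, OrderAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderAmount) AS rn,
           COUNT(*)     OVER (PARTITION BY CustomerID)                      AS cnt
    FROM F
)
SELECT CustomerID,
       AVG(1.0 * OrderAmount) AS MedianAmount   -- 1.0 * avoids integer truncation
FROM Ordered
WHERE rn IN (FLOOR((cnt + 1) / 2.0), FLOOR((cnt + 2) / 2.0))
GROUP BY CustomerID;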


Figure 2. Horizontal aggregation

The second snapshot shows a chart of the CPU time needed to execute the three methods. The first (SPJ) method takes more time than the other two; this is measured by the time needed to produce the horizontal aggregation result. PIVOT takes the least time to produce the horizontal aggregation table, even after the median is implemented as a holistic function, while the CASE method takes less time than the SPJ method but more time than PIVOT, as described in the chart below. The x axis shows the category and the y axis the time in milliseconds.

Figure 3. Comparison chart

VI. CONCLUSION AND FUTURE WORK
In the above approach we developed an efficient method to calculate the median in the horizontal aggregation, which reduces data preprocessing time. This SQL code reduces manual work in the data preparation phase of a data mining project. Since the SQL code is automatically generated, it is likely to be more efficient than SQL code written by an end user, and data sets can be created in less time. In future work we plan to develop more complete I/O cost models for cost-based query optimization. We want to study the optimization of horizontal aggregations processed in parallel in a shared-nothing DBMS architecture. Cube properties can be generalized to the multi-valued aggregation results produced by a horizontal aggregation. Hence, other holistic functions, like rank and standard deviation, can also be implemented in the horizontal aggregation.

REFERENCES
[1] Carlos Ordonez and Zhibo Chen, Horizontal aggregations in SQL to prepare data sets for data mining analysis, IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011.
[2] Andy S. Chiou and John C. Sieg, Optimization for queries with holistic functions.
[3] C. Ordonez, Statistical model computation with UDFs, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 22, 2010.
[4] C. Ordonez, Integrating K-means clustering with a relational DBMS using SQL, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 18, no. 2, pp. 188-201, 2006.
[5] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria, PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS, in Proc. VLDB Conference, pp. 998-1009, 2004.
[6] C. Ordonez, Horizontal aggregations for building tabular data sets, in Proc. ACM SIGMOD Data Mining and Knowledge Discovery Workshop, pp. 35-42, 2004.
[7] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-total, in ICDE Conference, pp. 152-159, 1996.
[8] E.F. Codd, Extending the database relational model to capture more meaning, ACM TODS, vol. 4, no. 4, pp. 397-434, 1979.


ACCESSING RESTRICTED WEBSERVICES IN MOBILE PHONES USING BIOMETRICS


ABHIJITH.L, GALLA MAHESH REGHUNATH. N.K, PALPANDI.S, ASST.PROF Department Of Computer Science & Engineering Karpaga Vinayaka College of Engg & Tech.

Abstract
In this study, an application that allows a mobile phone to be used as a biometric-capture device is shown. The main contribution of our proposal is that this capture, and later recognition, can be performed during a standard web session, using the same architecture that is used in a personal computer (PC), thus allowing multiplatform (PC, personal digital assistant (PDA), mobile phone, etc.) biometric web access. The review, from both an academic and a commercial point of view, of the state of the art in biometry and mobile devices shows that in other related works the biometric capture and recognition is either performed locally on the mobile, or remotely but using special communication protocols and/or connection ports with the server. The second main contribution of this study is an in-depth analysis of present mobile web-browser limitations; it is concluded that, in general, it is impossible to use the same technologies that are used to capture biometrics on PC platforms (i.e., Java Applets, ActiveX controls, JavaScript, or Flash); therefore, new solutions, as shown here, are needed.

1. INTRODUCTION
THIS paper focuses on the use of biometric person recognition for secure access to restricted data/services using a mobile phone with an Internet connection. Many commercial and research efforts have recently focused on this subject (as discussed in Section II). However, in spite of the great number of particular applications that can be found, the cost of changing or modifying biometric platforms, the lack of normalization in capture-device technology and communication protocols, as well as social-acceptance drawbacks, are all barriers to the popularization of biometric recognition. There are four main questions that need to be answered for a better understanding of our proposal: 1) What is biometric person recognition? 2) Why use biometry? 3) Why use biometry in mobile phones/devices? 4) Why use web-based access? Let us begin by briefly answering these questions.
What is biometric person recognition? This is the use of unique human characteristics (i.e., biometrics) to recognize the user. Biometrics can be divided into two categories based upon the underlying characteristic they use [1]: physiological, which is based on direct measurements of a part of the human body (e.g., iris, fingerprint, face, hand shape, etc.), and behavioral, which is based on measurements and data derived from an action performed by the user, thus indirectly measuring some characteristics of the human body (e.g., voice, keystroke dynamics, signature-handwriting, gait, etc.). Biometric-recognition tasks can be split into two groups: identification (Who is the owner of this biometric?) and verification or authentication (Am I the person I claim to be?). Identification requires a large amount of processing and is time consuming if the database is very large; it is often used to determine the identity of a suspect from crime-scene information. Verification requires less computer load, as the user sample is only matched against a claimed identity's stored template, and is often used to access places or information.
Why use biometry? There are three general categories of user authentication: 1) something you know, e.g., passwords and personal-identification numbers (PINs), 2) something you have (e.g., tokens), and 3) something you are (e.g., biometrics) [1]. The dominant approach for current access control is via password or PIN, but its weaknesses are clearly documented: if it is easy to remember, it is usually easy to guess and


hack into, but if it is difficult to attack, it is usually difficult to remember; hence, a lot of people write them down and never change them. An interesting study on the problem of passwords from a commercial point of view can be found. The problem with tokens is that they authenticate their presence, but not the carrier; they can be easily forgotten, lost, or stolen, and, as happens with credit cards, can be fraudulently duplicated. As a result, biometry appears as a good solution, which is generally used in addition to the previous authentication methods to increase security levels. Another very well-known and important area of application is the one used by the police to identify suspects; here, fingerprints and DNA are the most commonly used.
Why use biometry in mobile phones/devices? Today, with the advancement of mobile handsets and wireless networking, mobile devices have both the network access and the computing capacity to provide users with a diverse range of services (e.g., secure payment [3], e-banking [4], e-commerce [5], etc.). According to the European Information Technology Observatory (EITO) (August 2009), the number of mobile phone users worldwide will exceed the 4 billion mark.
Why use web-based access? It is a standard communication protocol. A lot of remote services are accessible via the web (e.g., e-banking, e-commerce, e-mail, etc.). Only a web browser and an Internet connection are needed, which, at this moment, are available on different platforms: personal computers (PCs), laptops, netbooks, personal digital assistants (PDAs), video-game consoles, and, of course, mobile phones. Therefore, web services can be accessed from different types of devices in the same way. This last point is an important goal of our proposal. The problem of capturing and sending the biometrics to the web server via a PC is very easy to solve using applications embedded in the web pages, such as Java Applets, ActiveX controls, JavaScript, Flash technology, or Microsoft Silverlight; in our implementations, Applets have been used (see Section IV). However, due to the limitations of the devices, this solution is not possible in current mobile phones, as shown in Section III. Hence, a new solution is needed. The proposal of this study is to present a novel mobile-phone application architecture to capture and send the biometric to the web server, based on the use of an embedded web browser. The current mobile technology

is not ready for embedded applications in mobile web browsers; however, it is prepared for our solution, which is very easy and effective, as will be seen.
II. RELATED WORKS/APPLICATIONS
The majority of the works are proposals of biometric recognition systems adapted to mobile-device limitations. In these, the recognition runs entirely on the device, i.e., there is no communication with a server. These studies focus on the template/model creation and matching (i.e., classification algorithm) parts of the biometric system (see Fig.).

It is difficult to find an optimal biometric for practical applications, which is nonintrusive, easy, and secure to capture with good recognition performance. The use of several biometrics (i.e., multimodal biometric) may be a solution [12] and is an important fieldwork at present. Not many studies have been carried out on the use of mobile devices; some of them can be found in [13] (voice, face, and keystroke), [23] (face, voice, keystroke, and fingerprint), [14] (voice and face), and [24] (fingerprint and voice). Proprietary databases have been used to perform some of the previous studies, but public ones can also be found, for example, the Massachusetts Institute of Technology Mobile Device Speaker Verification Corpus [11] and Biosecure Multimodal Database (BMDB) Mobile Dataset (DS3), where mobile devices under degraded conditions were used to build this dataset; 2-D face, talking-face sequences (both indoor and outdoor), signature, and fingerprint being captured [25]. User recognition is usually performed just before accessing the controlled service (i.e., login time);


however, some authors propose the interesting concept of transparent authentication, i.e., recognizing the user at run time, for example, while he/she is entering keystrokes or writing a short message service (SMS) message, during a telephone conversation, during a video call, while the user is walking, etc. Theoretical proposals of practical applications can also be found. Clarke et al. [13] showed a general client-server nonintrusive and continuous-authentication (NICA) architecture. A NICA prototype was implemented, but the client was deployed on a laptop and an HP Mini-Note, and a real mobile phone was not used (we, however, will show a prototype of our proposal for mobile). Similar proposals can be seen in previous works of the same authors, e.g., in [1]. Another interesting contribution is the SecurePhone project. The aim of the project was to integrate a biometric recognizer into a 3G/beyond-3G-enabled PDA to allow users to mutually recognize each other and securely authenticate messages (i.e., text or audio). This would enable them to legally sign binding contracts on their PDA/mobile phone. The biometric recognizer combines source-authentication methods on the basis of text-dependent speaker verification, video recordings of the speaker's face, and a written signature. As in our proposal, the SecurePhone platform is entirely software based. This is important if it is to be adopted by device manufacturers, as it keeps costs down and makes its implementation much easier. A database was recorded on a Qtek 2020 PDA, which includes voice, face, and signature. The authentication data and the digital signature are stored on a subscriber identification module (SIM) card; therefore, a standalone topology is used. Another related effort is Mobile Biometry (MOBIO). The MOBIO concept is to develop new mobile services secured by biometric authentication. Scientific and technical objectives include robust-to-illumination face authentication, robust-to-noise speaker authentication, joint bimodal authentication, model adaptation, and scalability. The project-demonstration system will include two main scenarios: 1) embedded biometry, where the system runs entirely on a mobile phone, and 2) remote biometry. The latter is the scenario approached in our study, for which a general solution is presented in this paper, alongside three already developed demonstration systems.

III. MOBILE PHONES AND WEB-BASED BIOMETRIC CAPTURE: STATE OF THE ART
As has been seen, our goal is to perform biometric recognition during a web session when a mobile phone is used. The biometric user authentication can be used to substitute the password or in addition to it. This has already been done for PC, laptop, and similar platforms by us (for more details, see Section IV), by other authors (e.g., see [29]) and by companies (e.g., Dynamic Biometric Systems or Communication Intelligence Corporation (CIC), both related to signature recognition). Analyzing the technologies used to embed programs in a web page in order to capture and send the biometrics, we have found the following: Java Applets, ActiveX controls, Flash technology, JavaScript, and Microsoft Silverlight. The last two have only been found to acquire signatures by capturing mouse events. Our first approximation to the problem was to perform the biometric acquisition by means of a mobile phone in the same way as with the PC, i.e., using the aforementioned technologies to embed applications in a web page. Given the computational restrictions of mobile devices, a study of the state of the technology in the main mobile-phone platforms and browsers was necessary.
IV. SYSTEM PROPOSAL
First, we describe the general architecture of our system, and then we show three systems implemented from it. The first one is signature-based, the second is speech-based, and the third is face-based.
Architecture: A general biometric system consists of the following four modules.
1) Sensor module (or biometric reader): This is the interface between man and machine; therefore, the system performance depends strongly on it.
2) Quality assessment and feature-extraction module: The data provided by the sensor must first be validated from the point of view of quality, refusing it when the quality is too poor, and, second, the features that best represent the identity of the individual must be extracted.


3) Matcher and decision-making module: The extracted features are compared with the stored templates to generate a score to determine whether to grant or deny access to the system. 4) Database system: This is the repository of the biometric information. During the enrollment phase, the templates are stored along with some additional personal information, such as name, address, etc. The modules of the proposed architecture are allocated mainly on the server, looking for greater system security, upgrade control, and avoiding computation limitations. However, depending on the needs, some parts can be moved to the client; specially, the modules for acquisition, validation, and data preprocessing. The modular architecture proposed allows the vendors to build their biometric solutions based on our architecture so that server and client software can be decoupled, and thus, encouraging the development of applications by different teams companies, operating under well-established standard protocols already known by the community of developers. The main modules of the proposed architecture are the following. 1) Client Tier: On the client side, the biometric acquisition software is deployed. Since, as noted above, there are no standard software solutions for web browsers to capture biometric data, this part should be distributed ad hoc for each type of platform.For this reason, our architecture proposes to leave only the datacapturing module on the client side, with the rest of the modules at the server side. This means that the applications developed need no special memory or processing requirements, since the main computer load falls on the execution of a web navigator and standard mobile devices (e.g., touch screen, microphone, camera, etc.) are used to capture the biometrics; then, our proposal can be run in, practically, any current mid-range to high-range mobile devices. The application at this side controls and communicates with the following three main general components, which will be explained in greater detail in the next section.

Tomcat application server. The server modules for capture and preprocessing have been developed in the hypertext-preprocessor (PHP) programming language, and the verification engine was written in Java. This latter server-side verification was one of the winning systems of an international signature-verification competition. Real screen captures of a use example of our experimental signature-recognition web are shown, using both a PC and a mobile device; as can be seen, the differences from the user's point of view are minimal.
2) Voice-Based System: This application allows services and local data of the mobile device to be accessed after authentication by speech, although the biometric recognition is performed remotely.
Client side: A system has been developed that enables multi-device authentication from both a PC and a mobile device. 1) For capturing the data with a PC browser, a Java Applet that captures voice and sends it to the server has been developed. 2) For speech acquisition on the mobile device, an application in the .NET framework has been developed that operates almost the same as the signature system, but with three differences: 1) the URLs needed to manage the application for remote-resource access are within the application code, which means greater security but less versatility; 2) the signature is sent by the POST method; 3) the uploaded-component functionalities have been modified so that it manages the local access, as explained now. The way to access the remote result of verification is through messages introduced in the PHP page code responsible for the


verification of the voice. In this way, the uploaded component also manages any errors that may occur while processing and testing the speech sample. The application can be downloaded from the web pages of the previous item.
Server side: An Apache web server has been used. The other server modules have been developed in the PHP programming language for the capture engine, and in C and UNIX shell for the preprocessing and verification engine. Real screen captures of the mobile application are shown.
3) Face-Based System: This application allows services and local data of the mobile device to be accessed after authentication by face, although the biometric recognition is performed remotely.
Figure: Screen captures (using MyMobiler) of the mobile experimental speaker-recognition application; here, the PDA is used. (a) Home page. (b) Voice Access web page. (c) Speech acquisition, step 1: open the recorder. (d) Speech acquisition, step 2: recording (this can be played). (e) Speech acquisition, step 3: analyze to send the recording (this can take several seconds). (f) Data sending to the web server. (g) Authentication result (local message).

V. CONCLUSION
In this paper, the problem of using biometric user authentication during a standard web session when a mobile phone is used has been successfully approached. We have focused on the technological problem of capturing the biometric with the mobile phone, sending it to the web server, and, after user authentication, allowing or rejecting the user's continuation of the web session in the same way this had been done with password authentication. First, we have shown that there are several related works, projects, and commercial applications; however, to the best of the authors' knowledge, none of them approaches biometric recognition in a mobile environment via the Web. Second, we have shown that the standard solutions to the problem on PC platforms, using Java Applets, ActiveX controls, JavaScript, or Flash technology, do not work on mobile platforms; therefore, a new alternative is needed. A solution has been shown that basically consists of, instead of embedding an application in the web page, embedding a web browser in a mobile-phone application, using a modular architecture to develop the biometric web application. Three different implementations of this simple but very effective idea have been shown: one allows the password to be substituted by the signature in web access to a restricted service, and the others allow restricted access to local data and applications on the mobile phone.

Figure: Screen captures of the mobile experimental face-recognition application for (a)-(e) Windows Mobile and (f)-(j) Android mobile device platforms. (a) and (f) Face Access web page. (b) and (g) Face acquisition. (c) and (h) Device image-capture-application execution. (d) and (i) Image checking; the image can be sent to the web server or taken again. (e) and (j) Authentication result (local message).

