
Fuzzy Techniques For Video Surveillance Page 1

CERTIFICATE

This is to certify that the major project entitled "Fuzzy Techniques for Video Surveillance",
being submitted by

PRATEEK GOEL 2K9/EC/678
ROHIT JAIN 2K9/EC/690
SACHIN SRIVASTAVA 2K9/EC/692
SAURAV DAS 2K9/EC/701
SHAIV KASHYAP 2K9/EC/703

in partial fulfilment of the degree of Bachelor of Engineering in Electronics & Communication
Engineering of Delhi University, has been carried out under my guidance and supervision.
The matter submitted in this report has not been submitted to any other institute or university for
award of any degree, to the best of my knowledge.


Prof. Rajesh Rohilla
Electronics and Communication Engineering
Department,
Delhi Technological University,
(Formerly Delhi College of Engineering)
Bawana Road, Delhi-110042

Forwarded by

Dr. Rajiv Kapoor
HOD, Electronics and Communication Engineering Department
Delhi Technological University


ACKNOWLEDGEMENT


We feel honoured in expressing our profound sense of gratitude and indebtedness to Prof. Rajesh
Rohilla, Professor, Electronics & Communication Engineering Department, Delhi College of Engineering,
for giving us the opportunity to work on such a practical problem under his expert guidance. He
guided us throughout the project and constantly encouraged us.



PRATEEK GOEL 2K9/EC/678



ROHIT JAIN 2K9/EC/690



SACHIN SRIVASTAVA 2K9/EC/692



SAURAV DAS 2K9/EC/701



SHAIV KASHYAP 2K9/EC/703










CONTENTS


Certificate 1
Acknowledgment 2
Chapter 1 What motivates us? 5
Chapter 2 What is this area about? 6
2.1 Application Areas 6
Human Computer Interactions 6
Intelligent Environment 8
Multimedia 8
Intelligent robots 8
2.2 Fundamental research issues 9
Image Processing and Computer Vision 9
Machine learning and pattern recognition 9
Chapter 3 What is Computer Vision? 11
3.1 Image formation level 17
3.2 Low level image Processing 17
3.3 Low level vision 18
3.4 Middle level vision 19
3.5 High level vision 20
Chapter 4 OpenCV 21
4.1 Who uses OpenCV ? 22
4.2 Origin of OpenCV 23
4.3 Who owns OpenCV ? 25
4.4 OpenCV structure and Contents 25
Chapter 5 Typical task of computer vision 28
5.1 Recognition 28
5.2 Motion Analysis 29
5.3 Scene Reconstruction 30
5.4 Image Restoration 30

Chapter 6 Computer Vision System Methods 31
6.1 Image Acquisition 31
6.2 Feature Extraction 32
6.3 Types of Features 33
6.4 Detection and Segmentation 36
6.5 High level processing 37
6.6 Decision making 37
Chapter 7 Tracking 38
7.1 Object detection and segmentation 38
7.2 Tracking of detected objects/points 39
7.3 Background subtraction 40
7.4 Contour Extraction 40
7.5 Template matching 43
Chapter 8 How We Implemented It 45
8.1 SURF(Speeded up Robust Features) 45
8.2 Fuzzy Logic 51
8.3 Block Diagram of Algorithm 57
8.4 Output 59
Chapter 9 Advantages 63
Chapter 10 Limitations 64
Chapter 11 Future Growth 67
Chapter 12 Application 68
Chapter 13 References 70



















1. What Motivates Us?

An interesting question we always ask is what the next generation of computers is going to
be like. To answer this question, let's recall our first touch of a computer. At least, my
experience was that I waved my hands and said "how are you" to a machine. Obviously, no
answer at all. It was a dream that computers would be able to see and think, which has been
driving us to explore various research issues to make this dream come true. Although
computers become faster and faster, they are still quite dull, since they can neither see
nor even perform simple reasoning. Obviously, we are not satisfied to just use our
computers as a calculator, a word processor, a CD player, or a game station; instead, we
expect computers to do more intelligent things, like us human beings. For example,

Can computers identify me by looking at my face or even my gait?

Can computers know where I am looking and what I am doing?

Can computers tell what is a car and what is not a car?

Can computers learn something by themselves?

Can computers summarize a video for me?







2. What Is This Area About?

Obviously, with its interdisciplinary nature, this area involves fundamental research in
image processing, computer vision/graphics, machine learning, pattern recognition,
biomechanics and even psychology. Figure 1 shows a big picture of this area. On top of it
are several major application areas such as human-computer interaction, robotics, virtual
environments, and multimedia. The common foundation for such applications includes computer
vision, image processing and speech processing. Instead of the ad hoc approaches to audio
and visual processing taken when the area was in its infancy, we are currently pursuing
more intelligent ways through machine learning and pattern recognition, trying to achieve
a kind of artificial intelligence.


2.1 Application Areas

We can imagine what a visually capable and intelligent computer could do! We expect a
revolution in the next generation of computers: we would not use mice and keyboards
anymore. Computers could understand our actions and our languages, they could think and
feed back to us some kind of smart results in response to our commands, and they could
even perform some missions on behalf of us human beings. Last but not least, we expect
rapid progress in the near future in such areas as intelligent human-computer interaction,
robotics, virtual environments, intelligent environments, and multimedia.

2.1.1 Human-Computer Interaction

The research of human-computer interaction is no longer the design of devices and
psychological experiments on window layouts, but has evolved to a new stage: intelligent
interaction. One aspect is that computers should be able to accept audio and visual
sensory inputs, then make some kind of analysis and interpretation, and then provide
intuitive feedback by synthesizing speech, video or actions. Fundamentally, besides speech
recognition, computers should be able to recognize, interpret and understand human actions
and behaviors from visual inputs.









2.1.2 Intelligent Environments

Intelligent environments, or smart environments, refer to physical spaces that can
automatically or intelligently react to human activities. For example, when a person
enters, the system could tell that someone has come in, even identify who s/he is, and
then turn on the lights. When the person sits on a sofa and points to a TV, the TV will be
turned on. When s/he says "I want some news", the TV will be switched to a channel that is
broadcasting news at that moment.

2.1.3 Multimedia

Multimedia is a vague term, and different people place the emphasis differently. We are
particularly interested in the analysis of the content of multimedia. An interesting
question we ask is "what is inside this picture?" or "what does this video mean?", which
involves the quite challenging task of image/video understanding. Many appealing
applications have been proposed, but are yet to be accomplished. Given just a photo of
Sophie Marceau, without knowing her name, computers could search the Internet and get tons
of her photos and movies. When you get tired of watching a long movie, computers could
automatically summarize the movie in maybe five minutes.


2.1.4 Intelligent Robots

Robots have been given quite good mechanical abilities, but they are still mere machinery
because they are neither able to see nor able to think. Honda has built a humanoid robot,
ASIMO, which can walk like a human being. However, it is blind, dumb and dull. We expect
to see ASIMO move by itself.


2.2 Fundamental Research Issues

The fundamental research in image processing, computer vision, machine learning and
pattern recognition is an important part of the foundation of these application topics.


2.2.1 Image Processing and Computer Vision

Image processing is a quite broad research area, not just filtering, compression, and
enhancement. Beyond these, we are interested in the question "what is in images?", i.e.,
content analysis of visual inputs, which is part of the main task of computer vision. The
study of computer vision could make possible such tasks as 3D reconstruction of scenes,
motion capture, and object recognition, which are crucial for even higher-level
intelligence such as image and video understanding, and motion understanding.


2.2.2 Machine Learning and Pattern Recognition

Visual perception itself is an intelligent process, not just an imaging process. Through
vision, human beings are able to perceive the lighting, color, texture, shape and motion
of the outside world. The intelligence lies in the inference of such high-level concepts
based on imaging. It is quite easy for human beings, but it is still very unclear how
computers can achieve that level of intelligence.

Recognition is one of the most fundamental problems for machines, i.e., recognizing a
pre-stored pattern in new situations by comparing inputs with a set of templates or
models. However, the problem is how to construct these templates or models. For example,
what would be the appropriate templates to recognize faces even under different view
directions or different lightings? The most challenging aspect of visual recognition lies
in the fact that there are too many aspects that affect imaging, and it is impossible to
model every aspect, such as lighting and motion. So, people ask, "can computers learn the
model from examples?", such that models could be learned implicitly, instead of
constructed explicitly.
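The template-comparison idea above can be sketched with a toy normalized cross-correlation matcher. The function name and the tiny synthetic image below are illustrative only, not from the report:

```python
import numpy as np

def match_template(image, template):
    """Slide a template over an image and return the normalized
    cross-correlation score at every valid position."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            scores[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

# Toy example: embed a small checker pattern in a blank image and find it.
tmpl = np.array([[0.0, 1.0], [1.0, 0.0]])
img = np.zeros((6, 6))
img[3:5, 2:4] = tmpl
scores = match_template(img, tmpl)
best = np.unravel_index(np.argmax(scores), scores.shape)
# best -> (3, 2), the pattern's top-left corner, with a perfect score of 1.0.
```

OpenCV's cvMatchTemplate offers several comparison methods along these lines; the double loop here is deliberately unoptimized for clarity.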












3. What Is Computer Vision?


Computer vision is a field that includes methods for
acquiring, processing, analyzing, and understanding images and, in general,
high-dimensional data from the real world in order to produce numerical or
symbolic information, e.g., in the forms of decisions. A theme in the
development of this field has been to duplicate the abilities of human vision
by electronically perceiving and understanding an image. This image
understanding can be seen as the disentangling of symbolic information
from image data using models constructed with the aid of geometry,
physics, statistics, and learning theory. Computer vision has also been
described as the enterprise of automating and integrating a wide range of
processes and representations for vision perception.

Computer vision is the transformation of data from a still or video camera into either a
decision or a new representation. All such transformations are done to achieve some
particular goal. The input data may include some contextual information such as "the
camera is mounted in a car" or "the laser range finder indicates an object is 1 meter
away". The decision might be "there is a person in this scene" or "there are 14 tumour
cells on this slide". A new representation might mean turning a colour image into a
greyscale image or removing camera motion from an image sequence.
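As one concrete instance of a "new representation", colour-to-greyscale conversion is just a weighted sum of the three channels. The sketch below uses the standard ITU-R BT.601 luminance weights; the helper name is ours:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to a single-channel image
    using the standard ITU-R BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

# A 1 x 2 image: one pure-red pixel, one pure-white pixel.
img = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=float)
gray = to_grayscale(img)
# Red maps to 255 * 0.299 = 76.245; white stays at 255.
```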


Because we are such visual creatures, it is easy to be fooled into thinking that computer
vision tasks are easy. How hard can it be to find, say, a car when you are staring at it
in an image? Your initial intuitions can be quite misleading. The human brain divides the
vision signal into many channels that stream different kinds of information into your
brain. Your brain has an attention system that identifies, in a task-dependent way,
important parts of an image to examine while suppressing examination of other areas. There
is massive feedback in the visual stream that is, as yet, little understood. There are
widespread associative inputs from muscle control sensors and all of the other senses that
allow the brain to draw on cross-associations made from years of living in the world. The
feedback loops in the brain go back to all stages of processing, including the hardware
sensors themselves (the eyes), which mechanically control lighting via the iris and tune
the reception on the surface of the retina.

In a machine vision system, however, a computer receives a grid of numbers from the camera
or from disk, and that's it. For the most part, there's no built-in pattern recognition,
no automatic control of focus and aperture, no cross-associations with years of
experience. For the most part, vision systems are still fairly naive. Figure 1-1 shows a
picture of an automobile. In that picture we see a side mirror on the driver's side of the
car. What the computer sees is just a grid of numbers. Any given number within that grid
has a rather large noise component and so by itself gives us little information, but this
grid of numbers is all the computer sees. Our task then becomes to turn this noisy grid of
numbers into the perception "side mirror". Figure 1-2 gives some more insight into why
computer vision is so hard.



Fig. 1-2

In fact, the problem, as we have posed it thus far, is worse than hard; it is formally
impossible to solve. Given a two-dimensional (2D) view of a 3D world, there is no unique
way to reconstruct the 3D signal. Formally, such an ill-posed problem has no unique or
definitive solution. The same 2D image could represent any of an infinite combination of
3D scenes, even if the data were perfect. However, as already mentioned, the data is
corrupted by noise and distortions. Such corruption stems from variations in the world
(weather, lighting, reflections, movements), imperfections in the lens and mechanical
setup, finite integration time on the sensor (motion blur), electrical noise in the sensor
or other electronics, and compression artefacts after image capture. Given these daunting
challenges, how can we make any progress?

In the design of a practical system, additional contextual knowledge can often be used to
work around the limitations imposed on us by visual sensors. Consider the example of a
mobile robot that must find and pick up staplers in a building. The robot might use the
facts that a desk is an object found inside offices and that staplers are mostly found on
desks. This gives an implicit size reference; staplers must be able to fit on desks. It
also helps to eliminate falsely recognizing staplers in impossible places (e.g., on the
ceiling or in a window). The robot can safely ignore a 200-foot advertising blimp shaped
like a stapler because the blimp lacks the prerequisite wood-grained background of a desk.
In contrast, with tasks such as image retrieval, all stapler images in a database may be
of real staplers, and so large sizes and other unusual configurations may have been
implicitly precluded by the assumptions of those who took the photographs. That is, the
photographer probably took pictures only of real, normal-sized staplers. People also tend
to center objects when taking pictures and tend to put them in characteristic
orientations. Thus, there is often quite a bit of unintentional implicit information
within photos taken by people. Contextual information can also be modelled explicitly with
machine learning techniques.

Hidden variables such as size, orientation to gravity, and so on can then be correlated
with their values in a labelled training set. Alternatively, one may attempt to measure
hidden bias variables by using additional sensors. The use of a laser range finder to
measure depth allows us to accurately measure the size of an object.

The next problem facing computer vision is noise. We typically deal with noise by using
statistical methods. For example, it may be impossible to detect an edge in an image
merely by comparing a point to its immediate neighbours. But if we look at the statistics
over a local region, edge detection becomes much easier. A real edge should appear as a
string of such immediate neighbour responses over a local region, each of whose
orientations is consistent with its neighbours. It is also possible to compensate for
noise by taking statistics over time. Still other techniques account for noise or
distortions by building explicit models learned directly from the available data. For
example, because lens distortions are well understood, one need only learn the parameters
for a simple polynomial model in order to describe, and thus correct almost completely,
such distortions.

The actions or decisions that computer vision attempts to make based on camera data are
performed in the context of a specific purpose or task. We may want to remove noise or
damage from an image so that our security system will issue an alert if someone tries to
climb a fence, or because we need a monitoring system that counts how many people cross
through an area in an amusement park. Vision software for robots that wander through
office buildings will employ different strategies than vision software for stationary
security cameras, because the two systems have significantly different contexts and
objectives. As a general rule: the more constrained a computer vision context is, the more
we can rely on those constraints to simplify the problem and the more reliable our final
solution will be.
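The point about pooling statistics over a local region rather than comparing a pixel to a single neighbour can be sketched with the classic Sobel operators, which weight a full 3x3 neighbourhood. This is a minimal illustrative sketch, not the report's algorithm:

```python
import numpy as np

def sobel_edges(img, thresh=1.0):
    """Estimate edge strength from local gradient statistics: each
    output pixel pools a 3x3 neighbourhood through the Sobel
    operators rather than taking a single neighbour difference."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    mag = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            gx = (patch * kx).sum()   # horizontal gradient
            gy = (patch * ky).sum()   # vertical gradient
            mag[y, x] = np.hypot(gx, gy)
    return mag > thresh

# A vertical step edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = sobel_edges(img)
# Edge responses line up along the step, in columns 1-2 of the output.
```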
OpenCV is aimed at providing the basic tools needed to solve computer vision problems. In
some cases, high-level functionalities in the library will be sufficient to solve the more
complex problems in computer vision. Even when this is not the case, the basic components
in the library are complete enough to enable creation of a complete solution of your own
to almost any computer vision problem. In the latter case, there are several
tried-and-true methods of using the library; all of them start with solving the problem
using as many available library components as possible. Typically, after you've developed
this first-draft solution, you can see where the solution has weaknesses and then fix
those weaknesses using your own code and cleverness (better known as "solve the problem
you actually have, not the one you imagine"). You can then use your draft solution as a
benchmark to assess the improvements you have made. From that point, whatever weaknesses
remain can be tackled by exploiting the context of the larger system in which your problem
solution is embedded.




According to our understanding, computer vision is basically to infer the different
factors, such as camera model, lighting, colour, texture, shape and motion, that affect
images and videos, from visual inputs. A rough structure of machine vision is illustrated
by Figure 2. In a word, computer vision is an inverse processing of the forward process of
image formation and graphics. In this sense, as many people agree, vision is a much more
challenging problem than computer graphics, because it is full of uncertainties.



3.1 Image Formation

Image formation studies the forward process of producing images and videos. It is an
important research topic for both vision and graphics. To produce a real image, the nature
of the visual sensors, i.e., cameras, should be studied. In terms of the geometrical
aspects of cameras, people have been looking into pinhole cameras, cameras with lenses and
even omnidirectional cameras. In terms of physical aspects, factors such as the focal
lengths and dynamic ranges of CCD and CMOS cameras have been investigated.

Besides the imaging device, it is also important to study the factors from objects and
scenes themselves, such as lighting, colour, texture, motion and shape, which largely
affect the appearance of images and video.


3.2 Low-level Image Processing

Low-level image processing is not vision, but the pre-processing step for vision. The
basic task is to extract fundamental image primitives for further processing, including
edge detection, corner detection, filtering, and morphology, etc.
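One of the morphology primitives mentioned above, binary dilation, can be sketched in a few lines. This is a minimal sketch with a fixed 3x3 square structuring element; the helper name and toy image are ours:

```python
import numpy as np

def dilate(binary):
    """Binary dilation with a 3x3 square structuring element: a pixel
    becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    padded = np.pad(binary, 1, mode='constant')
    h, w = binary.shape
    out = np.zeros_like(binary)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + 3, x:x + 3].max()
    return out

img = np.zeros((5, 5), dtype=int)
img[2, 2] = 1                 # a single foreground pixel
grown = dilate(img)
# The lone pixel grows into a 3x3 block centred at (2, 2).
```

Erosion is the dual operation (replace `max` with `min` over the neighbourhood), and combining the two gives opening and closing, which are common for cleaning up noisy binary masks.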







3.3 Low-level Vision

Based on low-level image processing, low-level vision tasks can be performed, such as
image matching, optical flow computation and motion analysis. Image matching basically is
to find correspondences between two or more images. These images could be of the same
scene taken from different viewpoints, or of a moving scene taken by a fixed camera, or
both. Constructing image correspondences is a fundamentally important problem in vision
for both geometry recovery and motion recovery. Without exaggeration, image matching is
part of the base for vision.

Optical flow is a kind of image observation of motion, but it is not the true motion.
Since it only measures the optical changes in images, an aperture problem is unavoidable.
But based on optical flow, camera motion or object motion can be estimated.
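One classical way to sidestep the aperture problem, in the style of Lucas-Kanade (a technique we name here for illustration; the report does not prescribe it), is to pool the brightness-constancy constraints of a whole patch into one least-squares system:

```python
import numpy as np

def patch_flow(frame1, frame2):
    """Estimate a single (u, v) translation for a whole patch by least
    squares over the brightness-constancy constraints
    Ix*u + Iy*v = -It, pooling many pixels so the system is solvable
    where a single pixel's constraint would be underdetermined."""
    Ix = np.gradient(frame1, axis=1)   # horizontal intensity gradient
    Iy = np.gradient(frame1, axis=0)   # vertical intensity gradient
    It = frame2 - frame1               # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# A smooth intensity ramp shifted one pixel to the right between frames.
x = np.arange(8, dtype=float)
frame1 = np.tile(x, (8, 1))           # intensity increases with x
frame2 = np.tile(x - 1.0, (8, 1))     # same ramp, shifted right by 1
u, v = patch_flow(frame1, frame2)
# Recovered flow is approximately u = 1 (rightward), v = 0.
```

This only recovers motion where the image has gradient structure; on a textureless patch the system is rank-deficient, which is exactly the aperture problem the text describes.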


3.4 Middle-level Vision

There are two major aspects of middle-level vision: (1) inferring the geometry and (2)
inferring the motion. These two aspects are not independent but highly related. A simple
question is "can we estimate geometry based on just one image?". The answer is obvious: we
need at least two images. They could be taken from two cameras, or come from the motion of
the scene.

Some fundamental parts of geometric vision include multi-view geometry, stereo and
structure from motion (SfM), which fulfil the step from 2D to 3D by inferring 3D scene
information from 2D images. Based on that, geometric modelling is to construct 3D models
for objects and scenes, such that 3D reconstruction and image-based rendering could be
made possible.

Another task of middle-level vision is to answer the question "how does the object move?".
Firstly, we should know which areas in the images belong to the object, which is the task
of image segmentation. Image segmentation has been a challenging fundamental problem in
computer vision for decades. Segmentation could be based on spatial similarities and
continuities. However, its uncertainty cannot be overcome for a static image. When
considering motion continuities, we hope the uncertainty of segmentation can be
alleviated. On top of that are visual tracking and visual motion capture, which estimate
2D and 3D motions, including deformable motions and articulated motions.





3.5 High-level Vision

High-level vision is to infer the semantics, for example, object recognition and scene
understanding. A question that has been challenging for many decades is how to achieve
invariant recognition, i.e., recognizing a 3D object from different view directions. There
have been two approaches to recognition: model-based recognition and learning-based
recognition. It is noticeable that there has been a spiral development of these two
approaches in history.

Even higher-level vision is image understanding and video understanding. We are interested
in answering questions like "Is there a car in the image?", "Is this video a drama or an
action?", or "Is the person in the video jumping?" Based on the answers to these
questions, we should be able to fulfil different tasks in intelligent human-computer
interaction, intelligent robots, smart environments and content-based multimedia.
















4 OpenCV

OpenCV [OpenCV] is an open source (see http://opensource.org) computer vision library
available from http://SourceForge.net/projects/opencvlibrary. The library is written in C
and C++ and runs under Linux, Windows and Mac OS X. There is active development on
interfaces for Python, Ruby, Matlab, and other languages.

OpenCV was designed for computational efficiency and with a strong focus on real-time
applications. OpenCV is written in optimized C and can take advantage of multicore
processors. If you desire further automatic optimization on Intel architectures [Intel],
you can buy Intel's Integrated Performance Primitives (IPP) libraries [IPP], which consist
of low-level optimized routines in many different algorithmic areas. OpenCV automatically
uses the appropriate IPP library at runtime if that library is installed.

One of OpenCV's goals is to provide a simple-to-use computer vision infrastructure that
helps people build fairly sophisticated vision applications quickly. The OpenCV library
contains over 500 functions that span many areas in vision, including factory product
inspection, medical imaging, security, user interface, camera calibration, stereo vision,
and robotics. Because computer vision and machine learning often go hand-in-hand, OpenCV
also contains a full, general-purpose Machine Learning Library (MLL). This sublibrary is
focused on statistical pattern recognition and clustering. The MLL is highly useful for
the vision tasks that are at the core of OpenCV's mission, but it is general enough to be
used for any machine learning problem.






4.1 Who Uses OpenCV?
Most computer scientists and practical programmers are aware of some facet of the role
that computer vision plays. But few people are aware of all the ways in which computer
vision is used. For example, most people are somewhat aware of its use in surveillance,
and many also know that it is increasingly being used for images and video on the Web. A
few have seen some use of computer vision in game interfaces. Yet few people realize that
most aerial and street-map images (such as in Google's Street View) make heavy use of
camera calibration and image stitching techniques. Some are aware of niche applications in
safety monitoring, unmanned flying vehicles, or biomedical analysis. But few are aware how
pervasive machine vision has become in manufacturing: virtually everything that is
mass-produced has been automatically inspected at some point using computer vision.

The open source license for OpenCV has been structured such that you can build a
commercial product using all or part of OpenCV. You are under no obligation to open-source
your product or to return improvements to the public domain, though we hope you will. In
part because of these liberal licensing terms, there is a large user community that
includes people from major companies (IBM, Microsoft, Intel, SONY, Siemens, and Google, to
name only a few) and research centres (such as Stanford, MIT, CMU, Cambridge, and INRIA).
There is a Yahoo groups forum where users can post questions and discussion at
http://groups.yahoo.com/group/OpenCV; it has about 20,000 members.
OpenCV is popular around the world, with large user communities in China, Japan, Russia,
Europe, and Israel. Since its alpha release in January 1999, OpenCV has been used in many
applications, products, and research efforts. These applications include stitching images
together in satellite and web maps, image scan alignment, medical image noise reduction,
object analysis, security and intrusion detection systems, automatic monitoring and safety
systems, manufacturing inspection systems, camera calibration, military applications, and
unmanned aerial, ground, and underwater vehicles. It has even been used in sound and music
recognition, where vision recognition techniques are applied to sound spectrogram images.
OpenCV was a key part of the vision system in the robot from Stanford, Stanley, which won
the $2M DARPA Grand Challenge desert robot race.

4.2 The Origin of OpenCV

OpenCV grew out of an Intel Research initiative to advance CPU-intensive applications.
Toward this end, Intel launched many projects including real-time ray tracing and 3D
display walls. One of the authors working for Intel at that time was visiting universities
and noticed that some top university groups, such as the MIT Media Lab, had well-developed
and internally open computer vision infrastructures: code that was passed from student to
student and that gave each new student a valuable head start in developing his or her own
vision application. Instead of reinventing the basic functions from scratch, a new student
could begin by building on top of what came before.

Thus, OpenCV was conceived as a way to make computer vision infrastructure universally
available. With the aid of Intel's Performance Library Team, OpenCV started with a core of
implemented code and algorithmic specifications being sent to members of Intel's Russian
library team. This is the "where" of OpenCV: it started in Intel's research lab with
collaboration from the Software Performance Libraries group together with implementation
and optimization expertise in Russia.

Chief among the Russian team members was Vadim Pisarevsky, who managed, coded, and
optimized much of OpenCV and who is still at the centre of much of the OpenCV effort.
Along with him, Victor Eruhimov helped develop the early infrastructure, and Valery
Kuriakin managed the Russian lab and greatly supported the effort. There were several
goals for OpenCV at the outset:

Advance vision research by providing not only open but also optimized code for basic
vision infrastructure. No more reinventing the wheel.

Disseminate vision knowledge by providing a common infrastructure that developers could
build on, so that code would be more readily readable and transferable.

Advance vision-based commercial applications by making portable, performance-optimized
code available for free, with a license that did not require commercial applications to be
open or free themselves.
Those goals constitute the "why" of OpenCV. Enabling computer vision applications would
increase the need for fast processors. Driving upgrades to faster processors would
generate more income for Intel than selling some extra software. Perhaps that is why this
open and free code arose from a hardware vendor rather than a software company. In some
sense, there is more room to be innovative in software within a hardware company.

In any open source effort, it's important to reach a critical mass at which the project
becomes self-sustaining. There have now been approximately two million downloads of
OpenCV, and this number is growing by an average of 26,000 downloads a month. The user
group now approaches 20,000 members. OpenCV receives many user contributions, and central
development has largely moved outside of Intel. OpenCV's past timeline is shown in Figure
1-3. Along the way, OpenCV was affected by the dot-com boom and bust and also by numerous
changes of management and direction. During these fluctuations, there were times when
OpenCV had no one at Intel working on it at all.
However, with the advent of multicore processors and the many new applications of computer
vision, OpenCV's value began to rise. Today, OpenCV is an active area of development at
several institutions, so expect to see many updates in multicamera calibration, depth
perception, methods for mixing vision with laser range finders, and better pattern
recognition, as well as a lot of support for robotic vision needs.


4.3 Who Owns OpenCV?

Although Intel started OpenCV, the library is and always was intended to promote
commercial and research use. It is therefore open and free, and the code itself may be
used or embedded (in whole or in part) in other applications, whether commercial or
research. It does not force your application code to be open or free. It does not require
that you return improvements back to the library, but we hope that you will.


4.4 OpenCV Structure and Content
OpenCV is broadly structured into five main components, four of which are shown. The CV
component contains the basic image processing and higher-level computer vision algorithms;
ML is the machine learning library, which includes many statistical classification and
clustering tools. HighGUI contains I/O routines and functions for storing and loading
video and images, and CXCore contains the basic data structures and content.

The figure does not include CvAux, which contains both defunct areas (embedded HMM face
recognition) and experimental algorithms (background/foreground segmentation). CvAux is
not particularly well documented in the Wiki and is not documented at all in the
.../opencv/docs subdirectory. CvAux covers:

Eigen objects, a computationally efficient recognition technique that is, in essence, a
template matching procedure

1D and 2D hidden Markov models, a statistical recognition technique solved by dynamic
programming
Embedded HMMs (the observations of a parent HMM are themselves
HMMs)
Gesture recognition from stereo vision support
Extensions to Delaunay triangulation, sequences, and so forth
Stereo vision
Shape matching with region contours
Texture descriptors
Eye and mouth tracking
3D tracking
Finding skeletons (central lines) of objects in a scene
Warping intermediate views between two camera views
Background-foreground segmentation
Video surveillance (see Wiki FAQ for more documentation)
Camera calibration C++ classes (the C functions and engine are in
CV)
Some of these features may migrate to CV in the future; others
probably never will.


5 Typical tasks of computer vision
Some examples of typical computer vision tasks are presented below.

5.1 Recognition
The classical problem in computer vision, image processing, and
machine vision is that of determining whether or not the image data
contains some specific object, feature, or activity. This task can normally be
solved robustly and without effort by a human, but is still not satisfactorily
solved in computer vision for the general case: arbitrary objects in
arbitrary situations. The existing methods for dealing with this problem can
at best solve it only for specific objects, such as simple geometric objects
(e.g., polyhedra), human faces, printed or hand-written characters, or
vehicles, and in specific situations, typically described in terms of well-
defined illumination, background, and pose of the object relative to the
camera.
Different varieties of the recognition problem are described in the
literature:
Object recognition: one or several pre-specified or learned
objects or object classes can be recognized, usually together with their 2D
positions in the image or 3D poses in the scene. Google provides a stand-
alone program illustrating this function.
Identification: an individual instance of an object is recognized.
Examples include identification of a specific person's face or fingerprint, or
identification of a specific vehicle.
Detection: the image data are scanned for a specific condition.
Examples include detection of possible abnormal cells or tissues in medical
images or detection of a vehicle in an automatic road toll system. Detection
based on relatively simple and fast computations is sometimes used for
finding smaller regions of interesting image data which can be further
analyzed by more computationally demanding techniques to produce a
correct interpretation.
Several specialized tasks based on recognition exist, such as:
Content-based image retrieval: finding all images in a larger set
of images which have a specific content. The content can be specified in
different ways, for example in terms of similarity relative to a target image
(give me all images similar to image X), or in terms of high-level search
criteria given as text input (give me all images which contain many
houses, are taken during winter, and have no cars in them).
Pose estimation: estimating the position or orientation of a
specific object relative to the camera. An example application for this
technique would be assisting a robot arm in retrieving objects from a
conveyor belt in an assembly line situation or picking parts from a bin.
Optical character recognition (OCR): identifying characters in
images of printed or handwritten text, usually with a view to encoding the
text in a format more amenable to editing or indexing (e.g. ASCII).
2D code reading: reading of 2D codes such as Data
Matrix and QR codes.
Facial recognition: a facial recognition system is a computer
application for automatically identifying or verifying a person from a digital
image or a video frame from a video source. One of the ways to do this is
by comparing selected facial features from the image and a facial database.

5.2 Motion analysis
Several tasks relate to motion estimation where an image sequence is
processed to produce an estimate of the velocity either at each point in the
image or in the 3D scene, or even of the camera that produces the images.
Examples of such tasks are:
Egomotion: determining the 3D rigid motion (rotation and
translation) of the camera from an image sequence produced by the
camera.
Tracking: following the movements of a (usually) smaller set of
interest points or objects (e.g., vehicles or humans) in the image sequence.
Optical flow: determining, for each point in the image, how
that point is moving relative to the image plane, i.e., its apparent motion.
This motion is a result both of how the corresponding 3D point is moving in
the scene and how the camera is moving relative to the scene.

5.3 Scene reconstruction
Given one or (typically) more images of a scene, or a video, scene
reconstruction aims at computing a 3D model of the scene. In the simplest
case the model can be a set of 3D points. More sophisticated methods
produce a complete 3D surface model. The advent of 3D imaging not
requiring motion or scanning and related processing algorithms is enabling
rapid advances in this field. Grid-based 3D sensing can be used to acquire
3D images from multiple angles. Algorithms are now available to stitch
multiple 3D images together into point clouds and 3D models.
5.4 Image restoration
The aim of image restoration is the removal of noise (sensor noise,
motion blur, etc.) from images. The simplest possible approaches for noise
removal are various types of filters, such as low-pass filters or median filters.
More sophisticated methods assume a model of what the local image
structures look like, a model which distinguishes them from the noise. By
first analysing the image data in terms of the local image structures, such
as lines or edges, and then controlling the filtering based on local
information from the analysis step, a better level of noise removal is usually
obtained compared to the simpler approaches.
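The simpler filtering approaches mentioned above can be sketched with a 1D median filter (a 2D version applies the same idea over pixel neighbourhoods; the helper name is ours, not from any particular library):

```python
def median_filter_1d(signal, window=3):
    """Replace each sample by the median of its neighbourhood.

    Edge samples are handled by clamping the window to the signal,
    which is one common border convention among several.
    """
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        neighbourhood = sorted(signal[lo:hi])
        out.append(neighbourhood[len(neighbourhood) // 2])
    return out

# An impulse-noise spike (99) is removed entirely:
print(median_filter_1d([1, 2, 99, 4, 5]))   # [2, 2, 4, 5, 5]
```

Unlike a low-pass (averaging) filter, the median filter removes the spike without smearing it into its neighbours, which is why it is favoured for impulse noise.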



6 Computer vision system methods

The organization of a computer vision system is highly application
dependent. Some systems are stand-alone applications which solve a
specific measurement or detection problem, while others constitute a sub-
system of a larger design which, for example, also contains sub-systems for
control of mechanical actuators, planning, information databases, man-
machine interfaces, etc. The specific implementation of a computer vision
system also depends on if its functionality is pre-specified or if some part of
it can be learned or modified during operation. Many functions are unique to
the application. There are, however, typical functions which are found in
many computer vision systems.

6.1 Image acquisition
A digital image is produced by one or several image sensors, which, besides
various types of light-sensitive cameras, include range sensors, tomography
devices, radar, ultra-sonic cameras, etc. Depending on the type of sensor,
the resulting image data is an ordinary 2D image, a 3D volume, or an
image sequence. The pixel values typically correspond to light intensity in
one or several spectral bands (gray images or colour images), but can also
be related to various physical measures, such as depth, absorption or
reflectance of sonic or electromagnetic waves, or nuclear magnetic
resonance.[9]
Pre-processing: before a computer vision method can be applied
to image data in order to extract some specific piece of information, it is
usually necessary to process the data in order to assure that it satisfies
certain assumptions implied by the method. Examples are
Re-sampling in order to assure that the image coordinate system
is correct.
Noise reduction in order to assure that sensor noise does not
introduce false information.
Contrast enhancement to assure that relevant information can be
detected.
Scale space representation to enhance image structures at locally
appropriate scales.

6.2 Feature extraction
Image features at various levels of complexity are extracted from the image
data.[9] Typical examples of such features are
Lines, edges and ridges.
Localized interest points such as corners, blobs or points.
More complex features may be related to texture, shape or motion.
In computer vision and image processing the concept of feature
detection refers to methods that aim at computing abstractions of image
information and making local decisions at every image point whether there
is an image feature of a given type at that point or not. The resulting
features will be subsets of the image domain, often in the form of isolated
points, continuous curves or connected regions.
What is a feature?
There is no universal or exact definition of what constitutes a feature,
and the exact definition often depends on the problem or the type of
application. Given that, a feature is defined as an "interesting" part of
an image, and features are used as a starting point for many computer
vision algorithms. Since features are used as the starting point and main
primitives for subsequent algorithms, the overall algorithm will often only
be as good as its feature detector. Consequently, the desirable property for
a feature detector is repeatability: whether or not the same feature will be
detected in two or more different images of the same scene.
Feature detection is a low-level image processing operation. That is, it
is usually performed as the first operation on an image, and examines
every pixel to see if there is a feature present at that pixel. If this is part of
a larger algorithm, then the algorithm will typically only examine the image
in the region of the features. As a built-in pre-requisite to feature detection,
the input image is usually smoothed by a Gaussian kernel in a scale-space
representation and one or several feature images are computed, often
expressed in terms of local derivative operations.
Occasionally, when feature detection is computationally expensive and
there are time constraints, a higher level algorithm may be used to guide
the feature detection stage, so that only certain parts of the image are
searched for features.
Many computer vision algorithms use feature detection as the initial
step, so as a result, a very large number of feature detectors have been
developed. These vary widely in the kinds of feature detected, the
computational complexity and the repeatability.

6.3 Types of features:

Edges
Edges are points where there is a boundary (or an edge) between two
image regions. In general, an edge can be of almost arbitrary shape, and
may include junctions. In practice, edges are usually defined as sets of
points in the image which have a strong gradient magnitude. Furthermore,
some common algorithms will then chain high gradient points together to
form a more complete description of an edge. These algorithms usually
place some constraints on the properties of an edge, such as shape,
smoothness, and gradient value.
Locally, edges have a one-dimensional structure.
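The "strong gradient magnitude" definition above can be sketched with central differences (a minimal illustration, not the Sobel or Canny operators used in practice):

```python
def gradient_magnitude(img):
    """Approximate the gradient magnitude at interior pixels with
    central differences.

    img is a list of equal-length rows of intensities; the one-pixel
    border is left at zero for simplicity.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

# A vertical step between dark (0) and bright (10) columns:
step = [[0, 0, 10, 10]] * 4
m = gradient_magnitude(step)
print(m[1])   # [0.0, 5.0, 5.0, 0.0]
```

Pixels adjacent to the step respond strongly; thresholding this magnitude and chaining the surviving points is exactly the edge-description strategy the text describes.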

Corners / interest points
The terms corners and interest points are used somewhat
interchangeably and refer to point-like features in an image, which have a
local two-dimensional structure. The name "corner" arose since early
algorithms first performed edge detection, and then analysed the edges to
find rapid changes in direction (corners). These algorithms were then
developed so that explicit edge detection was no longer required, for
instance by looking for high levels of curvature in the image gradient. It
was then noticed that the so-called corners were also being detected on
parts of the image which were not corners in the traditional sense (for
instance a small bright spot on a dark background may be detected). These
points are frequently known as interest points, but the term "corner" is used
by tradition.


Blobs / regions of interest or interest points
Blobs provide a complementary description of image structures in
terms of regions, as opposed to corners that are more point-like.
Nevertheless, blob descriptors often contain a preferred point (a local
maximum of an operator response or a centre of gravity) which means that
many blob detectors may also be regarded as interest point operators. Blob
detectors can detect areas in an image which are too smooth to be detected
by a corner detector.
Consider shrinking an image and then performing corner detection.
The detector will respond to points which are sharp in the shrunk image,
but may be smooth in the original image. It is at this point that the
difference between a corner detector and a blob detector becomes
somewhat vague. To a large extent, this distinction can be remedied by
including an appropriate notion of scale. Nevertheless, due to their response
properties to different types of image structures at different scales, the LoG and
DoH blob detectors are also discussed in the literature on corner detection.
In the field of computer vision, blob detection refers to mathematical
methods that are aimed at detecting regions in a digital image that differ in
properties, such as brightness or colour, compared to areas surrounding
those regions. Informally, a blob is a region of a digital image in which
some properties are constant or vary within a prescribed range of values;
all the points in a blob can be considered in some sense to be similar to
each other.
Given some property of interest expressed as a function of position on
the digital image, there are two main classes of blob detectors:
(i) differential methods, which are based on derivatives of the function with
respect to position, and (ii) methods based on local extrema, which are based on
finding the local maxima and minima of the function. With the more recent
terminology used in the field, these detectors can also be referred to
as interest point operators, or alternatively interest region operators.
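The second class, methods based on local extrema, can be sketched as a search for pixels strictly brighter than all eight neighbours (real detectors apply the same test to filtered responses across scales; the function name is ours):

```python
def local_maxima(img):
    """Return (y, x) positions strictly brighter than all 8 neighbours."""
    h, w = len(img), len(img[0])
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            neighbours = [img[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(centre > n for n in neighbours):
                peaks.append((y, x))
    return peaks

# One bright blob on a dark background:
img = [[0, 0, 0, 0],
       [0, 9, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
print(local_maxima(img))   # [(1, 1)]
```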
Why do we use blob detectors?
There are several motivations for studying and developing blob
detectors. One main reason is to provide complementary information about
regions, which is not obtained from edge detectors or corner detectors. In
early work in the area, blob detection was used to obtain regions of interest
for further processing. These regions could signal the presence of objects or
parts of objects in the image domain with application to object
recognition and or object tracking. In other domains, such as histogram
analysis, blob descriptors can also be used for peak detection with
application to segmentation. Another common use of blob descriptors is as
main primitives for texture analysis and texture recognition. In more recent
work, blob descriptors have found increasingly popular use as interest
points for wide baseline stereo matching and to signal the presence of
informative image features for appearance-based object recognition based
on local image statistics.

Ridges
For elongated objects, the notion of ridges is a natural tool. A ridge
descriptor computed from a grey-level image can be seen as a
generalization of a medial axis. From a practical viewpoint, a ridge can be
thought of as a one-dimensional curve that represents an axis of symmetry,
and in addition has an attribute of local ridge width associated with each
ridge point. Unfortunately, however, it is algorithmically harder to extract
ridge features from general classes of grey-level images than edge-, corner-
or blob features. Nevertheless, ridge descriptors are frequently used for
road extraction in aerial images and for extracting blood vessels in medical
images


6.4 Detection/segmentation
At some point in the processing a decision is made about which image
points or regions of the image are relevant for further
processing.[9] Examples are
Selection of a specific set of interest points
Segmentation of one or multiple image regions which contain a
specific object of interest.




6.5 High-level processing
At this step the input is typically a small set of data, for example a set of
points or an image region which is assumed to contain a specific
object.[9] The remaining processing deals with, for example:
Verification that the data satisfy model-based and application
specific assumptions.
Estimation of application specific parameters, such as object
poses or objects size.
Image recognition: classifying a detected object into different
categories.
Image registration: comparing and combining two different
views of the same object. Image registration is the process of transforming
different sets of data into one coordinate system. Data may be multiple
photographs, data from different sensors, from different times, or from
different viewpoints.[1] It is used in computer vision, medical imaging,
military automatic target recognition, and compiling and analyzing images
and data from satellites. Registration is necessary in order to be able to
compare or integrate the data obtained from these different measurements.

6.6 Decision making
Making the final decision required for the application,[9] for example:
Pass/fail on automatic inspection applications
Match / no-match in recognition applications
Flag for further human review in medical, military, security and
recognition applications

7 Tracking

Estimation and tracking of motion in image sequences is a well-
established branch of computer vision. Real world scenes with large, rigid or
deformable moving objects are usually considered.
There are two main steps in tracking:
1. Object detection and segmentation
2. Tracking of the object detected in the first step

A clear boundary can be drawn between the definitions of
these terms, but not between their methods:
A method of detection may do the tracking part for you.
A method of tracking may do the detection part for you.
7.1 Object detection and Segmentation
At some point in the processing a decision is made about which image
points or regions of the image are relevant for further processing. Examples
are
Selection of a specific set of interest points
Segmentation of one or multiple image regions which contain a
specific object of interest.


To start the tracking process, the first thing to do is to detect feature
points/Objects as a whole in an initial frame. This process can also be
termed as Segmentation/ Object Detection.

SOME METHODS OF OBJECT DETECTION -
BACKGROUND SUBTRACTION
CONTOUR EXTRACTION
TEMPLATE MATCHING

7.2 TRACKING OF DETECTED OBJECT/POINTS
Tracking literally implies that we are able to figure out the location of
a particular object in a (not so real) frame (the exact position in the 3D
world is probably our next task: calibration).
After detecting/segmenting feature points/objects out of an initial
frame, we try to track these points in the next frame. (This is "strict
tracking", not a universally recognized term.)
We must find where these points are now located in this new frame.
Obviously, since we are dealing with a video sequence, there is a
good chance that the object on which the feature points are found has
moved (the motion can also be due to camera motion).
Using some method, you locate the feature in the new frame (using
information about its previous location in the initial frame, etc.).
And so, you have tracked the object/particular feature points
successfully.


SOME METHODS OF OBJECT TRACKING-

1. Optical flow
2. Obtaining the object's position/location in the frame, and allocating
some kind of pointer (a circle at the centre, or a bounding rectangle,
using that location)


7.3 Background subtraction
Background subtraction is a technique in the fields of image
processing and computer vision wherein an image's foreground is extracted
for further processing (object recognition etc.). Generally an image's
regions of interest are objects (humans, cars, text etc.) in its foreground.
After the stage of image pre-processing (which may include image
denoising, etc.), object localisation is required, which may make use of this
technique. Background subtraction is a widely used approach for detecting
moving objects in videos from static cameras. The rationale in the approach
is that of detecting the moving objects from the difference between the
current frame and a reference frame, often called background image, or
background model.[1] Background subtraction is mostly done if the image
in question is a part of a video stream.
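The rationale above (current frame minus reference frame) can be sketched as per-pixel differencing against a static background model; the threshold value below is an arbitrary illustrative choice:

```python
def subtract_background(frame, background, threshold=25):
    """Mark a pixel as foreground where |frame - background| > threshold.

    Both inputs are equal-shaped lists of rows of grey values; the
    result is a binary mask (1 = moving object, 0 = background).
    """
    mask = []
    for f_row, b_row in zip(frame, background):
        mask.append([1 if abs(f - b) > threshold else 0
                     for f, b in zip(f_row, b_row)])
    return mask

background = [[10, 10, 10],
              [10, 10, 10]]
frame      = [[10, 200, 10],     # a bright object entered the middle
              [10, 210, 12]]
print(subtract_background(frame, background))
# [[0, 1, 0], [0, 1, 0]]
```

Practical systems replace the fixed reference frame with an adaptive model (e.g. a running average or mixture of Gaussians) so that lighting changes are not flagged as motion.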

7.4 Contour Extraction

Contour extraction, also known as border following or boundary following
(contour tracing), is a technique that is applied to digital images in order to
extract their boundary.

To explain contour extraction we first have to define a digital image.
A digital image is a group of pixels on a square tessellation, each having a
certain value. Here we will be dealing with bi-level images, i.e. each pixel
can have one of 2 possible values, namely:
1, in which case we'll consider it a "black" pixel and it will be part of
the pattern, OR
0, in which case we'll consider it a "white" pixel and it will be part of
the background.

The boundary of a given pattern P is the set of border pixels of P.
Since we are using a square tessellation, there are 2 kinds of boundary
(or border) pixels: 4-border pixels and 8-border pixels.
A black pixel is considered a 4-border pixel if it shares an edge with at least
one white pixel. On the other hand, a
black pixel is considered an 8-border pixel if it shares an edge or a
vertex with at least one white pixel.
(A 4-border pixel is also an 8-border pixel. An 8-border pixel may or may
not be a 4-border pixel.)
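Under these definitions, classifying a black pixel as a 4-border or 8-border pixel is a direct neighbourhood test. A sketch (the helper names are ours; pixels outside the image bounds are assumed white):

```python
def border_type(img, y, x):
    """Classify a black pixel (value 1) of a bi-level image.

    Returns '4-border' if it shares an edge with a white pixel,
    '8-border' if it shares only a vertex, and 'interior' otherwise.
    Pixels outside the image are treated as white (background).
    """
    def white(yy, xx):
        inside = 0 <= yy < len(img) and 0 <= xx < len(img[0])
        return not inside or img[yy][xx] == 0

    edge_nbrs   = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
    vertex_nbrs = [(y - 1, x - 1), (y - 1, x + 1),
                   (y + 1, x - 1), (y + 1, x + 1)]
    if any(white(yy, xx) for yy, xx in edge_nbrs):
        return '4-border'        # also an 8-border pixel by definition
    if any(white(yy, xx) for yy, xx in vertex_nbrs):
        return '8-border'
    return 'interior'

pattern = [[0, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]
print(border_type(pattern, 0, 1))   # 4-border (edge-adjacent to white)
print(border_type(pattern, 1, 1))   # 8-border (only a vertex touches white)
```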
A point worth mentioning is that it is not enough to
merely identify the boundary pixels of a pattern in order to extract its
contour. What we need is an ordered sequence of the boundary pixels from
which we can extract the general shape of the pattern.


Why would we want to extract the contour of a given
pattern?
Contour tracing is one of many pre-processing techniques performed on
digital images in order to extract information about their general shape.
Once the contour of a given pattern is extracted, its different characteristics
will be examined and used as features which will later on be used in pattern
classification. Therefore, correct extraction of the contour will produce more
accurate features which will increase the chances of correctly classifying a
given pattern.
But you might be wondering: Why waste precious computational time
on first extracting the contour of a pattern and then collecting its features?
Why not collect features directly from the pattern?
Well, the contour pixels are generally a small subset of the total number of
pixels representing a pattern. Therefore, the amount of computation is
greatly reduced when we run feature extracting algorithms on the contour
instead of on the whole pattern. Since the contour shares a lot of features
with the original pattern, the feature extraction process becomes much
more efficient when performed on the contour rather on the original
pattern.
In conclusion, contour tracing is often a major contributor to the efficiency
of the feature extraction process -an essential process in the field of pattern
recognition.

Contours are sequences of points defining a line/curve in an image.
Contour extraction is one of the methods that deal with object
detection.
Every frame has some objects in it. Out of those objects, only
some are important (the rest is noise).
Eliminate that noise (as you don't want its boundary, which we
would otherwise find in the next step).
Now your frame is full of objects which matter.
Each of those objects has a boundary. Using contour
extraction, that boundary can be extracted.
Once we have got that boundary, we can say we have segmented
(detected) the object (not the complete object, only its boundary, though
this matters little).
We will play with the object's boundary.
We will use it to track (by adding some sort of pointer).


CONTOUR EXTRACTION
(From a coder's perspective)
Video File = several frames (Images) played at a certain frequency
Objective is to
Grab a frame out of a video file,
Obtain Contours of objects (In that frame),
Find the centre of that contour,
Draw a circle with centre = centre of contour (Kind of Pointer).
The process repeats for every frame (inside a loop)
As the frames are displayed at a certain frequency, we will see the
moving pointer (i.e. Circle).
AND THE OBJECT IS SAID TO BE TRACKED
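The "find the centre of that contour" step above can be sketched as a mean of the contour's boundary points (OpenCV itself would typically use image moments; the helper name is ours):

```python
def contour_centre(contour):
    """Centre of a contour given as a list of (x, y) boundary points,
    computed as the simple mean of the coordinates."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Boundary corners of a 2x2 square with opposite corners (0,0) and (2,2):
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(contour_centre(square))   # (1.0, 1.0)
```

In each grabbed frame, the circle (the "pointer") would be drawn at this centre, and redrawing it frame after frame is what makes the object appear tracked.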

7.5 Template matching
Template matching is a technique in digital image processing for
finding small parts of an image which match a template image. It can be
used in manufacturing as a part of quality control, a way to navigate a
mobile robot, or as a way to detect edges in images.

A basic method of template matching uses a convolution mask
(template), tailored to a specific feature of the search image, which we
want to detect. This technique can be easily performed on grey images
or edge images. The convolution output will be highest at places where the
image structure matches the mask structure, where large image values get
multiplied by large mask values.
This method is normally implemented by first picking out a part of the
search image to use as a template: We will call the search image S(x, y),
where (x, y) represent the coordinates of each pixel in the search image.
We will call the template T(xt, yt), where (xt, yt) represent the
coordinates of each pixel in the template. We then simply move the center
(or the origin) of the template T(xt, yt) over each (x, y) point in the search
image and calculate the sum of products between the coefficients in S(x,
y) and T(xt, yt) over the whole area spanned by the template. As all
possible positions of the template with respect to the search image are
considered, the position with the highest score is the best position. This
method is sometimes referred to as 'Linear Spatial Filtering' and the
template is called a filter mask.
For example, one way to handle translation problems on images,
using template matching is to compare the intensities of the pixels, using
the SAD (Sum of absolute differences) measure.
A pixel in the search image with coordinates (xs, ys) has
intensity Is(xs, ys) and a pixel in the template with coordinates (xt, yt) has
intensity It(xt, yt). Thus the absolute difference in the pixel intensities is
defined as Diff(xs, ys, xt, yt) = | Is(xs, ys) - It(xt, yt) |.


The mathematical representation of the idea about looping through
the pixels in the search image as we translate the origin of the template at
every pixel and take the SAD measure is the following:

SAD(x, y) = sum over i = 0..Trows-1 and j = 0..Tcols-1 of Diff(x+i, y+j, i, j)

Srows and Scols denote the rows and the columns of the search image
and Trows and Tcols denote the rows and the columns of the template
image, respectively. In this method the lowest SAD score gives the estimate
for the best position of template within the search image. The method is
simple to implement and understand, but it is one of the slowest methods.
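The SAD search just described can be sketched in pure Python (a direct, unoptimized translation of the loop; the function name is ours, not an OpenCV routine):

```python
def match_template_sad(search, template):
    """Slide the template over the search image and return the top-left
    (x, y) of the placement with the lowest sum of absolute differences,
    together with that SAD score."""
    s_rows, s_cols = len(search), len(search[0])
    t_rows, t_cols = len(template), len(template[0])
    best_pos, best_sad = None, None
    for y in range(s_rows - t_rows + 1):
        for x in range(s_cols - t_cols + 1):
            sad = sum(abs(search[y + i][x + j] - template[i][j])
                      for i in range(t_rows) for j in range(t_cols))
            if best_sad is None or sad < best_sad:
                best_pos, best_sad = (x, y), sad
    return best_pos, best_sad

search = [[1, 1, 1, 1],
          [1, 9, 8, 1],
          [1, 7, 9, 1],
          [1, 1, 1, 1]]
template = [[9, 8],
            [7, 9]]
print(match_template_sad(search, template))   # ((1, 1), 0)
```

The exhaustive double loop is what makes the method slow; practical implementations prune placements early or use integral-image tricks.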

8 How we implemented
We implemented the tracking algorithm using the SURF algorithm, which
uses feature extraction, and applied fuzzy logic for efficient implementation.
To start the tracking process, the first thing to do is to detect feature
points/objects as a whole in an initial frame, after which fuzzy sets are
applied to further improve efficiency.

8.1 Speeded Up Robust Features

SURF (Speeded Up Robust Features) is a robust local feature detector,
first presented by Herbert Bay et al. in 2006, that can be used in computer
vision tasks like object recognition or 3D reconstruction. It is partly inspired
by the SIFT descriptor. The standard version of SURF is several times faster
than SIFT and claimed by its authors to be more robust against different
image transformations than SIFT. SURF is based on sums of 2D Haar
wavelet responses and makes an efficient use of integral images.
It uses an integer approximation to the determinant of Hessian blob
detector, which can be computed extremely quickly with an integral image
(3 integer operations). For features, it uses the sum of the Haar wavelet
response around the point of interest. Again, these can be computed with
the aid of the integral image.
It approximates or even outperforms previously proposed schemes
with respect to repeatability, distinctiveness, and robustness, yet can be
computed and compared much faster. This is achieved by relying on
integral images for image convolutions - building on the strengths of the
leading existing detectors and descriptors (using a Hessian matrix-based
measure for the detector, and a distribution-based descriptor) - simplifying
these methods to the essential. This leads to a combination of novel
detection, description, and matching steps.
SURF is patented in the US but is available for non-commercial use.


SURF performs tasks like:
1) Object recognition
2) 3D reconstruction

SURF is based on:
1) sums of 2D Haar wavelet responses
2) efficient usage of integral images
3) Integer approximation of the determinant of the Hessian blob detector




OBJECT RECOGNITION

Object Recognition is the task of finding a given object in an image or
video sequence.
Humans recognize a multitude of objects in images with little effort,
despite the fact that the image of the objects may vary somewhat in
different viewpoints, in many different sizes / scale or even when they are
translated or rotated. Objects can even be recognized when they are
partially obstructed from view. This task is still a challenge for computer
vision systems in general.

Object Recognition can be done in 2 ways:

Appearance Based methods
It uses example images (called templates or exemplars) of the objects
to perform recognition. Objects look different under varying conditions:
Changes in lighting or colour
Changes in viewing direction
Changes in size / shape
A single exemplar is unlikely to succeed reliably. However, it is
impossible to represent all appearances of an object.
Appearance based methods can be listed as :
Edge matching
Divide-and-Conquer search
Greyscale matching
Gradient matching
Histograms of receptive field responses
Large model bases

Feature based methods
A search is used to find feasible matches between object features and
image features. The primary constraint is that a single position of the object
must account for all of the feasible matches. These methods extract features
from the objects to be recognized and the images to be searched, such as:
surface patches
corners
linear edges

Feature based method can be listed as:

1. Interpretation trees
2. Hypothesize and test
3. Pose consistency
4. Pose clustering
5. Invariance
6. Geometric hashing
7. Scale-invariant feature transform (SIFT)
8. Speeded Up Robust Features (SURF)




3D Reconstruction

3D reconstruction is the process of capturing the shape and
appearance of real objects. This process can be accomplished either by
active or passive methods. If the model is allowed to change its shape in
time, this is referred to as non-rigid or spatio-temporal reconstruction.
There are 2 ways of 3-D reconstruction:
Active Ways

These methods actively interfere with the reconstructed object, either
mechanically or radiometrically. A simple example of a mechanical method
would use a depth gauge to measure a distance to a rotating object put on
a turntable. More applicable radiometric methods emit radiance towards the
object and then measure its reflected part. Examples range from moving
light sources and coloured visible light to time-of-flight lasers, microwaves
or ultrasound. See 3D scanning for more details.

Passive Ways

Passive methods of 3D reconstruction do not interfere with the
reconstructed object; they only use a sensor to measure the radiance
reflected or emitted by the object's surface to infer its 3D structure.
Typically, the sensor is an image sensor in a camera sensitive to visible
light, and the input to the method is a set of digital images (one, two or
more) or video. In this case we talk about image-based reconstruction, and
the output is a 3D model.




The algorithm
As the name suggests, the value at any point (x, y) in the summed
area table is just the sum of all the pixels above and to the left of
(x, y):

I(x, y) = sum of i(x', y') over all x' <= x and y' <= y

Moreover, the summed area table can be computed efficiently in a
single pass over the image, using the fact that the value in the summed
area table at (x, y) is just:

I(x, y) = i(x, y) + I(x-1, y) + I(x, y-1) - I(x-1, y-1)








Hessian Blob Detector

By considering the scale-normalized determinant of the Hessian, also
referred to as the Monge-Ampere operator,

det H_norm L = t^2 (L_xx L_yy - L_xy^2),

where H L denotes the Hessian matrix of the scale-space representation L,
and then detecting scale-space maxima of this operator, one obtains another
straightforward differential blob detector with automatic scale selection
which also responds to saddles (Lindeberg 1994, 1998).




The blob points and their scales are also defined from an
operational differential-geometric definition that leads to blob descriptors
that are covariant with translations, rotations and rescalings in the image
domain. In terms of scale selection, blobs defined from scale-space extrema
of the determinant of the Hessian (DoH) also have slightly better scale
selection properties under non-Euclidean affine transformations than the
more commonly used Laplacian operator (Lindeberg 1994, 1998). In
simplified form, the scale-normalized determinant of the Hessian computed
from Haar wavelets is used as the basic interest point operator in
the SURF descriptor (Bay et al. 2006) for image matching and object
recognition.

8.2 FUZZY LOGIC
Fuzzy logic is a form of many-valued logic; it deals with reasoning
that is approximate rather than fixed and exact. Compared to traditional
binary sets (where variables may take on true or false values), fuzzy logic
variables may have a truth value that ranges in degree between 0 and 1.
Fuzzy logic has been extended to handle the concept of partial truth, where
the truth value may range between completely true and completely false.
Furthermore, when linguistic variables are used, these degrees may be
managed by specific membership functions.
The term "fuzzy logic" was introduced with the 1965 proposal of fuzzy
set theory by Lotfi A. Zadeh. Fuzzy logic has been applied to many fields,
from control theory to artificial intelligence. Fuzzy logics, however, had
been studied since the 1920s as infinite-valued logics, notably by
Łukasiewicz and Tarski.
Classical logic only permits propositions having a value of truth or
falsity. The notion that 1+1=2 is an absolute, immutable mathematical
truth. However, there exist certain propositions with variable answers, such
as asking various people to identify a colour. The notion of truth doesn't fall
by the wayside, but rather a means of representing and reasoning over
partial knowledge is afforded, by aggregating all possible outcomes into a
dimensional spectrum.
Both degrees of truth and probabilities range between 0 and 1 and
hence may seem similar at first. For example, let a 100 ml glass contain 30
ml of water. Then we may consider two concepts: Empty and Full. The
meaning of each of them can be represented by a certain fuzzy set. Then
one might define the glass as being 0.7 empty and 0.3 full. Note that the
concept of emptiness would be subjective and thus would depend on the
observer or designer. Another designer might equally well design a set
membership function where the glass would be considered full for all values
down to 50 ml. It is essential to realize that fuzzy logic uses truth degrees
as a mathematical model of the vagueness phenomenon while probability is
a mathematical model of ignorance.
The Japanese were the first to utilize fuzzy logic for practical
applications. The first notable application was on the high-speed train in
Sendai, in which fuzzy logic was able to improve the economy, comfort, and
precision of the ride. It has also been used in recognition of handwritten
symbols in Sony pocket computers, Canon auto-focus technology, Omron
auto-aiming cameras, and earthquake prediction and modelling at the
Institute of Seismology, Bureau of Metrology in Japan, among others.
Fuzzy logic and probability are different ways of expressing
uncertainty. While both fuzzy logic and probability theory can be used to
represent subjective belief, fuzzy set theory uses the concept of fuzzy
set membership (i.e., how much a variable is in a set), and probability
theory uses the concept of subjective probability (i.e., how probable do I
think that a variable is in a set). While this distinction is mostly
philosophical, the fuzzy-logic-derived possibility measure is inherently
different from the probability measure, hence they are
not directly equivalent. However, many statisticians are persuaded by the
work of Bruno de Finetti that only one kind of mathematical uncertainty is
needed and thus fuzzy logic is unnecessary. On the other hand, Bart
Kosko argues that probability is a subtheory of fuzzy logic,
as probability only handles one kind of uncertainty. He also claims
to have proven a derivation of Bayes' theorem from the concept of
fuzzy subsethood. Lotfi A. Zadeh argues that fuzzy logic is different in
character from probability, and is not a replacement for it. He fuzzified
probability to fuzzy probability and also generalized it to what is
called possibility theory. More generally, fuzzy logic is one of many different
proposed extensions to classical logic, known as probabilistic logics,
intended to deal with issues of uncertainty in classical logic, the
inapplicability of probability theory in many domains, and the paradoxes
of Dempster-Shafer theory.
With the increasing need for flexibility and adaptivity in computerized
systems, the application of fuzzy expert systems is becoming increasingly
commonplace in today's industry. Fuzzy logic expert systems often improve
performance by allowing knowledge to generalize without requiring the
knowledge engineer to anticipate all possible situations. Thus, for many
types of applications, "soft computing" such as fuzzy logic can incur lower
overhead in terms of representing and engineering task knowledge. Our
project investigates the application of fuzzy expert systems to motion
tracking. Previous research showed that Fuzzy Logic can be used to track
the motion of a brightly coloured object against a dark background, with
relatively low development and run-time costs. The system we are
developing identifies, tracks, and predicts the motion of multiple objects
using unique identification patterns against a dark background. An essential
step to obtaining the fuzzy inputs for the motion tracking fuzzy inference
system is to use convolution correlation data to obtain the centres of mass
of the objects. Image processing information from the region around the
center of each object provides good fuzzy inputs for recognizing object
patterns and determining orientation. We are investigating two fuzzy
inference systems for motion tracking and prediction in order to identify
their strengths and weaknesses. In the long run, we hope to exploit the
strengths of both by dividing the task between the two approaches.
Cascading the two fuzzy systems allows the system first to acquire
generalized patterns and then fine-tune them for error tolerance. To
perform this work, we are using the OpenCV fuzzy logic toolkit, which is a
simple and flexible tool for designing fuzzy systems.
In Computer Vision, low-level processes extract information from the
pixel intensities of an image. Using this information, regions with similar
characteristics are searched for in order to detect objects that may be
relevant to a concrete application. Object tracking throughout a video
sequence can be seen as a process divided into two phases:
(1) object segmentation is performed in each frame,
(2) correspondences between objects are established from frame to
frame.
This work differs in one aspect from classical techniques of
computer vision: our segmentation process is based on approximate
reasoning techniques applied to the motion data of an MPEG video sequence.
The application of fuzzy logic allows the developed system to learn
by dynamically changing its threshold levels. This dynamic adaptation
allows the system to cope with different circumstances, such as a sudden
change in the brightness level of the frame due to a light being switched
on or off, or a periodic variation in the brightness level of a pixel.

In the past few decades, the detection and tracking of people, vehicles
and objects in general in multifarious environments has become one of the
most attractive topics of scientific research. The many applications of
this kind, such as human-computer interaction, ambient intelligent
systems, robotics, and video communication/compression, are the
reason for the growing interest in people and object tracking.

Generally, several methods have been proposed for solving the
problem of object tracking. Some of them employ cost function minimization
to locate the object in subsequent frames; in others, the colour and edges
of the object are utilized as features for tracking. In some methods shape
and skin colour are used for the tracking task. The use of stereo
information for people tracking has also been presented, in which a
particle filter employs depth information of the object for tracking.

In our method a particle filter, with the aid of a special fuzzy-based
model of the image, is used for tracking objects. This model discriminates
between the background and the object by highlighting the dominant
elements of the object in the scene. Moreover, a method for model updating
is proposed to deal with changes in aspect and illumination conditions.

This algorithm can be discerned in two main stages: prediction and
update. In the prediction stage, considering the favourable region in the
video frame and its state model, the particles (sampling windows) are
modified. In the next stage, known as update, the weight of each particle
is re-evaluated based on the new data.

Prediction step:
For estimating the state of a dynamic system from sequential
observations, particle filtering techniques are used. Their ability to deal
with systems in which the observation is non-Gaussian is the reason
particle filtering is attractive for such tracking tasks. For solving the
problem of multiple target tracking, multiple filters are employed, with an
independent tracker for each person. The particle filter is designed based
on multiple cues, which are learnt from the first frame, and on adaptive
parameters.

Update step:
In this step the new frame is accommodated into the statistical model
of the frame. This is done via recursion formulae, as described in the
algorithm of the model shown. This helps in dynamically updating the
various parameters initialised in the algorithm and paves the way for its
subsequent steps.


The following is the algorithm we have implemented to achieve our
desired result:


















Block Diagram

START
1. Create an array of matrices, each of the same size as the pixel grid of
a frame.
2. Capture and save the first five seconds' worth of frames.
3. Define the global variables: double mean[][] = 0, variance[][] = 0.
FIND THE VARIANCE OF EACH POINT IN THE PIXEL GRID
double std_dev(double a[], int n)
{
    double sum = 0, sq_sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += a[i];
        sq_sum += a[i] * a[i];
    }
    double mean = sum / n;
    double variance = sq_sum / n - mean * mean;
    return sqrt(variance) / mean;   /* relative standard deviation */
}



















Create another matrix (named temp_val) of the same size and fill it with
the following values (which have been obtained by defining fuzzy sets):

temp_val = (value returned by std_dev) x e^((value returned by std_dev) - 0.15)

Create another matrix (called prev_val) and fill it with the following
values:

prev_val = mean x temp_val

Capture the current frame in the matrix current_frame.
8.5 Output:

Now update the values of the global variables as follows:

mean = (114/115) x mean + (1/115) x current_frame
variance = sqrt( (114/115) x (prev_val / mean)^2 + (1/115) x (current_frame - mean)^2 )

If variance > 0.33, mark the pixel as changed; otherwise do not mark it.
9. Advantages

Allows imprecise/contradictory inputs
As the name suggests, this type of logic is highly recommended when
distinct, crisp inputs are not available, as is the case with linguistic
variables.

Permits fuzzy thresholds
Due to imprecise fuzzy inputs, the system has to operate on imprecise
values. These imprecise values are reflected in the final result, which can
no longer be precise. Thus fuzzy logic permits imprecise thresholds.

Reconciles conflicting objectives
As the basic theory of fuzzy logic suggests, since there are no clearly
defined threshold levels, fuzzy logic can accommodate a method of achieving
an optimisation between conflicting objectives.

Rule base or fuzzy sets easily modified

Relates input to output in linguistic terms, easily understood

As fuzzy logic is imprecise, the relation cannot be described in terms
of mathematical equations but only in linguistic terms.


Increased robustness

Simplify knowledge acquisition and representation

A few rules encompass great complexity


Can achieve less overshoot and oscillation

Can achieve steady state in a shorter time interval


10 Limitations
Hard to develop a model from a fuzzy system

Require more fine tuning and simulation before becoming operational

Have a stigma associated with the word "fuzzy"

Engineers and most other people are used to crispness and shy away
from fuzzy control and fuzzy decision making.

Fuzzy feedback control of systems is seen as a less valuable
engineering approach, used by people who have never made an effort to
learn classical control theory. Fuzzy logic did not succeed in offering an
alternative approach for systems without a known mathematical model, which
was its main purpose.
This reproach is not the most appropriate for the control of complex
systems, which has to fulfil harsh requirements concerning accuracy, speed,
stability, and other things, because setting up a rule base and its use
requires an in-depth knowledge of a physical background, system
modelling, and control basics, which is a part of the classical theory.
However, with the help of modern programme packages, which have
already completed fuzzy control methods included, one can successfully
design the fuzzy control of a very simple system even without any in-depth
knowledge about the classical theory. Since this saves a lot of engineering
time, it seems more appropriate to view it as an advantage of this method
than to use it as a reproach towards the designer.


It is impossible to prove the stability of a fuzzy control system.
In the proofs found in the literature, stability is often proved on a
'crisp' system which is only a deformed picture of the fuzzy one, while
methods from classical system theory are used.
As a response to this reproach, Mamdani stated in 1993 that there are
also many processes which are too complex to prove their stability in a
classical way. Mathematical proofs of stability are supposedly often
unnecessary because testing a prototype is much more important than a
mathematical analysis. Such an approach has been used successfully for
quite some time in industry.

Despite this, the following fact remains - there are no unified
procedures developed which would prove the stability of fuzzy control
systems. Using the classical Lyapunov stability theory is quite common,
and it is based on the analytical notation of a fuzzy system. In addition,
from 1999 onwards we come across attempts in the literature to introduce a
fuzzy Lyapunov function, which is a direct function of the fuzzy variables.
This area is developing rapidly, however, and the corresponding results may
be expected in the future.

There is no systematic approach to fuzzy system designing. Instead,
empirical ad-hoc approaches are used.
Although there is no systematic approach, we still have some
guidelines for choosing the fuzzification method, inferencing, and
defuzzification. Considering the experiences when it comes to designing the
structure and control parameters is a common practise also for the classical
approach.

Fuzzy control methods are suitable only for trivial problems which do
not require high accuracy. Practical implementations of fuzzy control,
therefore, refer to highly damped low-level systems.
Fuzzy Techniques For Video Surveillance Page 67

Fuzzy logic in its basic form is truly not appropriate for the control of a
highly complex system because usually there is not enough knowledge
about the system available. However, when using it for fuzzy parameter
setting or fuzzy system identification, we can considerably improve the
results of control compared to classical methods. With the introduction of
adaptation, its use has spread also to complex systems with an unknown
mathematical model.

Fuzzy systems are transparent (understandable) only for simple
problems.
This is true for a fuzzy logic system which is notated with all possible
rules. System transparency can be achieved also for many complex systems
when a fuzzy system is appropriately divided into subsystems, and when
composition methods and genetic algorithms are used.

Fuzzy logic systems are appropriate for use on higher levels of system
control ('decision making'), and not as an alternative to controllers on lower
levels, such as speed and position controllers.
If we have linguistic knowledge for the lowest level and classical
approaches turn out to be insufficient, then we can use fuzzy systems. Proof
for this is a set of successful industrial applications. This opinion is
partly conditioned by the fact that until a few years ago fuzzy algorithms
could only be computed with longer sampling times, which restricted their
use to higher levels of control. Today's controllers with efficient
processors enable the computing of more complex algorithms even within
sampling times of the order of magnitude used for robot controllers (ms).

Statisticians hold the opinion that probability theory is
sufficient to represent linguistic knowledge and that fuzzy logic is thus
not necessary.
Even though fuzzy membership functions can often have the same
form as the normal distribution curve (Gaussian curve), there is a
difference in the type of knowledge they represent. The distribution
curve represents a set of data, while a fuzzy membership function
represents a word from natural language. In addition, the ways in which
the theories are used are completely different: probability theory is used
for the analysis of large amounts of data, while fuzzy logic is used for
the representation and use of linguistic knowledge.


11 Future growth

Parallel processing of the different objects in the same frame is
expected to reduce the processing time for real-time video feeds. Real-time
video feeds generally contain not one but many simultaneously moving
objects. These objects can be tracked efficiently, and at a better speed,
if the calculations are pipelined.

Colour space model

A special fuzzy-based model for colour can be employed, including a
pre-processing step on the frame which highlights the colour elements of
the object and reduces the background colour elements. First, from the
manually tracked region of the target, the object and background colour
information is obtained. Then the first frame is converted into four colour
layers, H, S, r and g, where the normalized red and green layers are
r = R/(R+G+B) and g = G/(R+G+B) in RGB space, and H, S are layers of the
HSV colour space. The median values of the object's pixels in each colour
layer are calculated. This step concludes the object segmentation process,
which can now be done in subsequent frames to track the object effectively.





12 Applications:
1. Facial Recognition System
A facial recognition system is a computer application for
automatically identifying or verifying a person from a digital image or
a video frame from a video source. One of the ways to do this is by
comparing selected facial features from the image and a facial database. In
addition to being used for security systems, authorities have found a
number of other applications for facial recognition systems. While earlier
post 9/11 deployments were well publicized trials, more recent deployments
are rarely written about due to their covert nature.
At Super Bowl XXXV in January 2001, police in Tampa Bay,
Florida used Viisage facial recognition software to search for potential
criminals and terrorists in attendance at the event; 19 people with minor
criminal records were potentially identified.
2. Machine Vision
Machine vision (MV) is the technology and methods used to provide
imaging-based automatic inspection and analysis for such applications as
automatic inspection, process control, and robot guidance in industry. The
scope of MV is broad. MV is related to, though distinct from, computer
vision. The primary uses for machine vision are automatic inspection
and industrial robot guidance. Common MV applications include quality
assurance, sorting, material handling, robot guidance, and optical gauging.

3. Object Detection
Object detection is a computer technology related to computer
vision and image processing that deals with detecting instances of semantic
objects of a certain class (such as humans, buildings, or cars) in digital
images and videos. Well-researched domains of object detection
include face detection and pedestrian detection. Object detection has
applications in many areas of computer vision, including image
retrieval and video surveillance.

4. Video Surveillance
In industrial plants, CCTV equipment may be used to observe parts of
a process from a central control room, for example when the environment is
not suitable for humans. CCTV systems may operate continuously or only as
required to monitor a particular event. A more advanced form of CCTV,
utilizing digital video recorders (DVRs), provides recording for possibly many
years, with a variety of quality and performance options and extra features
(such as motion-detection and email alerts). More recently, decentralized
IP-based CCTV cameras, some equipped with megapixel sensors, support
recording directly to network-attached storage devices, or internal flash for
completely stand-alone operation. Surveillance of the public using CCTV is
particularly common in many areas around the world including the United
Kingdom, where there are reportedly more cameras per person than in any
other country in the world.

5. Medical Imaging
Medical imaging is the technique and process used to create images of
the human body (or parts and function thereof) for clinical purposes
(procedures seeking to reveal, diagnose, or examine disease) or medical
science (including the study of normal anatomy and physiology). Although
imaging of removed organs and tissues can be performed for medical
reasons, such procedures are not usually referred to as medical imaging,
but rather are a part of pathology.



13 References


[1] C. Colombo, A. Del Bimbo, A. Valli, "Visual capture and
understanding of hand pointing actions in a 3-D environment", IEEE
Transactions on Systems, Man and Cybernetics, Part B 33 (2003) 677-686.

[2] T. Darrell, D. Demirdjian, N. Checka, P. Felzenszwalb, "Plan-view
trajectory estimation with dense stereo background models", Eighth IEEE
International Conference on Computer Vision (ICCV 2001), vol. 2, 2001,
pp. 628-635.

[3] G. Bradski, "Computer vision face tracking as a component of a
perceptual user interface", Workshop on Applications of Computer Vision,
Princeton, NJ, 1998, pp. 214-219.

[4] B. Menser, M. Brunig, "Face detection and tracking for video
coding applications", Conference Record of the Thirty-Fourth Asilomar
Conference on Signals, Systems and Computers, 2000, pp. 49-53.

[5] W.E. Vieux, K. Schwerdt, J.L. Crowley, "Face-tracking and coding
for video compression", International Conference on Computer Vision
Systems, 1999, pp. 151-160.
