
Pedram Azad, Tilo Gockel, Rüdiger Dillmann

Computer Vision
Principles and Practice


Computer vision is probably the most exciting branch of image processing, and the number of applications in robotics, automation technology and quality control is constantly increasing. Unfortunately, entering this research area is, as yet, not simple. Those who are interested must first go through a lot of books, publications and software libraries. With this book, however, the first step is easy. The theoretically well-founded content is understandable and is supplemented by many practical examples. Source code is provided with the specially developed platform-independent open source library IVT in the programming language C/C++. The use of the IVT is not necessary, but it does make for a much easier entry and allows first developments to be quickly produced.

The authorship is made up of research assistants of the chair of Professor Rüdiger Dillmann at the Institut für Technische Informatik (ITEC), Universität Karlsruhe (TH). Having gained extensive experience in image processing in many research and industrial projects, they are now passing this knowledge on. Among other subjects, the following are dealt with in the fundamentals section of the book: lighting, optics, camera technology, transfer standards, camera calibration, image enhancement, segmentation, filters, correlation and stereo vision. The practical section provides the efficient implementation of the algorithms, followed by many interesting applications such as interior surveillance, bar code scanning, object recognition, 3D scanning, 3D tracking, a stereo camera system and much more.

ISBN 978-0-905705-71-2

Elektor Electronics www.elektor-electronics.co.uk

Pedram Azad, Tilo Gockel, Rüdiger Dillmann

Computer Vision Principles and Practice


1st Edition
April 4, 2008

© Elektor International Media BV 2008

All rights reserved. No part of this book may be reproduced in any material form, including photocopying, or storing in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication, without the written permission of the copyright holder except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a license issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London, England W1P 9HE. Applications for the copyright holder's written permission to reproduce any part of this publication should be addressed to the publishers.

The publishers have used their best efforts in ensuring the correctness of the information contained in this book. They do not assume, and hereby disclaim, any liability to any party for any loss or damage caused by errors or omissions in this book, whether such errors or omissions result from negligence, accident or any other cause.

British Library Cataloging in Publication Data: A catalog record for this book is available from the British Library.

ISBN 978-0-905705-71-2

Translation: Adam Lockett
Prepress production: Tilo Gockel
First published in the United Kingdom 2008
Printed in the Netherlands by Wilco, Amersfoort
© Elektor International Media BV 2008
059018-UK

1st Edition 2007 in German: Computer Vision – Das Praxisbuch, Elektor-Verlag GmbH, 52072 Aachen

Contents

Part I  Basics

1  Technical Fundamentals
   1.1  Introduction
   1.2  Light
        1.2.1  Physical Fundamentals
        1.2.2  Illuminants
        1.2.3  Illumination Techniques
        1.2.4  Summary and Notes
   1.3  Optics
        1.3.1  Connection
        1.3.2  Format
        1.3.3  Focal Distance
        1.3.4  Aperture
        1.3.5  Focus
        1.3.6  Angle of View
        1.3.7  Minimal Object Distance
        1.3.8  Depth of Field
        1.3.9  Resolution
        1.3.10 Summary and Notes
   1.4  Image Sensors
        1.4.1  Physical Fundamentals
        1.4.2  CCD
        1.4.3  CMOS
        1.4.4  Color Processing
        1.4.5  Summary and Notes
   1.5  Image Transfer
        1.5.1  Analog Transfer
        1.5.2  USB
        1.5.3  IEEE 1394
        1.5.4  Camera Link
        1.5.5  Gigabit-Ethernet
        1.5.6  GenICam
        1.5.7  Bandwidth Requirement
        1.5.8  Driver Encapsulation
        1.5.9  Notebook Cameras
   1.6  System Examples
        1.6.1  Humanoid Robot Head
        1.6.2  Stereo Endoscope
        1.6.3  Smart Room
        1.6.4  Industrial Quality Control
   1.7  References

2  Introduction to the Algorithmics
   2.1  Introduction
   2.2  Camera Model and Camera Calibration
        2.2.1  Pinhole Camera Model
        2.2.2  Extended Camera Model
        2.2.3  Camera Calibration
        2.2.4  Consideration of Lens Distortions
        2.2.5  Summary of the Calibration Procedure
   2.3  Image Representation and Color Models
        2.3.1  Representation of a 2D Image in Memory
        2.3.2  Representation of Grayscale Images
        2.3.3  Representation of Color Images
        2.3.4  Image Functions
        2.3.5  Conversion between Grayscale Images and Color Images
   2.4  Homogeneous Point Operators
   2.5  Histograms
        2.5.1  Grayscale Histograms
        2.5.2  Color Histograms
        2.5.3  Histogram Stretching
        2.5.4  Histogram Equalization
        2.5.5  Comparison of Histogram Stretching and Histogram Equalization
   2.6  Filters
        2.6.1  Convolution and Filters in the Spatial Domain
        2.6.2  Filter Masks of Common Filters
        2.6.3  Practical Considerations
   2.7  Morphological Operators
        2.7.1  General Definition
        2.7.2  Dilation and Erosion
        2.7.3  Opening and Closing
   2.8  Segmentation
        2.8.1  Segmentation by Thresholding
        2.8.2  Color Segmentation
        2.8.3  Region Growing
        2.8.4  Segmentation of Geometrical Structures
   2.9  Homography
        2.9.1  General Definition
        2.9.2  Bilinear Interpolation
        2.9.3  Examples of Specific Homographies
        2.9.4  Least Squares Computation of Homography Parameters
   2.10 Stereo Geometry
        2.10.1 Stereo Triangulation
        2.10.2 Epipolar Geometry
        2.10.3 Rectification
   2.11 Correlation Methods
        2.11.1 General Definition
        2.11.2 Non-normalized Correlation Functions
        2.11.3 Normalized Correlation Functions
        2.11.4 Run-time
   2.12 Efficient Implementation of Image Processing Methods
        2.12.1 Image Access to 8 bit Grayscale Images
        2.12.2 Image Access to 24 bit Color Images
        2.12.3 Homogeneous Point Operators
        2.12.4 Placement of if Statements
        2.12.5 Memory Accesses and Cache Optimization
        2.12.6 Arithmetic and Logical Operations
        2.12.7 Lookup Tables
   2.13 References

3  Integrating Vision Toolkit
   3.1  Implementation
   3.2  Architecture
        3.2.1  The Class CByteImage
        3.2.2  Connection of Graphical User Interfaces
        3.2.3  Connection of Image Sources
        3.2.4  Integration of OpenCV
        3.2.5  Integration of OpenGL via Qt
   3.3  Example Applications
        3.3.1  Use of Basic Functionality
        3.3.2  Use of a Graphical User Interface
        3.3.3  Use of a Camera Module
        3.3.4  Use of OpenCV
        3.3.5  Use of the OpenGL Interface
   3.4  Overview of further IVT Functionality

Part II  Applications

4  Surveillance Technology
   4.1  Introduction
   4.2  Segmentation of Motion
   4.3  Extensions and Related Approaches
   4.4  References and Source Code

5  Bar Codes and Matrix Codes
   5.1  Introduction
   5.2  Fundamentals
   5.3  Bar Code Structure (EAN13 Bar Code)
   5.4  Recognition of EAN13 Bar Codes
   5.5  Matrix Codes
   5.6  References and Source Code

6  Workpiece Gauging
   6.1  Introduction
   6.2  Algorithmics
        6.2.1  Moments
        6.2.2  Gauging
   6.3  Implementation
   6.4  References and Source Code

7  Histogram-based Object Recognition
   7.1  Introduction
   7.2  Implementation
   7.3  Operation of the Software
   7.4  References and Source Code

8  Correlation-based Object Recognition
   8.1  Introduction
   8.2  Automatic Cat Flap "Flo Control"
   8.3  Bottle Sorting
   8.4  References and Source Code

9  Scale- and Rotation-Invariant Object Recognition
   9.1  Introduction
   9.2  Appearance-based Approaches
   9.3  Procedure
        9.3.1  Undistortion
        9.3.2  Segmentation
        9.3.3  Normalization of the Shape
        9.3.4  Classification
   9.4  References and Source Code

10 Laser Scanning using the Light-Section Method
   10.1 Introduction
   10.2 Fundamentals
   10.3 Geometry
   10.4 Algorithmics
   10.5 Operation
        10.5.1 Calibration Process
        10.5.2 Scan Procedure and Visualization
   10.6 Accuracy Considerations
   10.7 Notes
   10.8 References
        10.8.1 Text Sources
        10.8.2 Other Interesting 3D Scanner Projects
        10.8.3 Software for Processing the 3D Data
   10.9 Parts List, CAD and Source Code

11 Depth Image Acquisition with a Stereo Camera System
   11.1 Introduction
   11.2 Procedure
   11.3 References and Source Code

12 3D Tracking with a Stereo Camera System
   12.1 Introduction
   12.2 Approach
   12.3 References and Source Code

13 Outlook
   13.1 Introduction
   13.2 Human Motion Capture
   13.3 3D Object Recognition and Localization
   13.4 Biometry
        13.4.1 Iris Recognition
        13.4.2 Fingerprint Recognition
   13.5 Optical Character Recognition
   13.6 References

Part III  Appendix

A  Installation of IVT, OpenCV and Qt under Windows and Linux
   A.1  Windows
        A.1.1  OpenCV
        A.1.2  Qt
        A.1.3  CMU1394
        A.1.4  IVT
        A.1.5  Summary
   A.2  Linux
        A.2.1  OpenCV
        A.2.2  Qt
        A.2.3  Firewire and libdc1394/libraw1394
        A.2.4  IVT

B  Mathematics
   B.1  Vector Analysis
        B.1.1  Vector Product
        B.1.2  Inverting a 3×3 Matrix
        B.1.3  Straight Lines in R³
        B.1.4  Planes in R³
        B.1.5  Intersection of a Straight Line with a Plane
        B.1.6  Rotations
        B.1.7  Homogeneous Coordinates
   B.2  Numerics
        B.2.1  Method of Least Squares
        B.2.2  Gauss Elimination
        B.2.3  Cholesky Decomposition

C  Industrial Image Processing – A Practical Experience Report
   C.1  Introduction
   C.2  Fundamentals of the EyeVision Software
   C.3  Test Run
   C.4  Component Selection
        C.4.1  Line-scan versus Matrix
        C.4.2  Process Interfacing
   C.5  Projects
        C.5.1  Automatic Pretzel Cutter
        C.5.2  Gauging
        C.5.3  Stamping Part Gauging
        C.5.4  Gauging of Radial Shaft Seals
   C.6  References

Index

Preface

There are many books that address the theme of image processing and computer vision, so why should another book be written? A large proportion of these books are either theoretical textbooks or manuals for specific commercial software. A book that practically imparts theoretically founded contents has so far been missing. The following questions emerge in practice again and again and are as yet too vaguely or theoretically discussed: background and guidance in choosing modern hardware components, relaying the practical algorithmic fundamentals, efficient implementation of algorithms, interfacing existing libraries and implementing a graphical user interface. Furthermore, until now it has been hard to find really complete solutions with open source code on topics such as object recognition, 3D acquisition, 3D tracking, bar code recognition or workpiece gauging. All these topics are now covered in this book and supplemented by many example calculations and example applications.

To further facilitate access, along with the printed version the source code for the image processing library Integrating Vision Toolkit (IVT) is also available for download. The IVT, following modern paradigms, is implemented in C++ and compiles under all common operating systems. The individual routines can also be easily ported to embedded platforms. The source code of the applications in this book will be available by the time of publication. The download can be found through a link on the publishing house's website or on Professor Dillmann's IAIM department website. Wherever possible, we have considered references with good availability as well as the possibility of a free download.

This book generally addresses itself to students of engineering sciences and computer science, to entrants, practitioners and anyone generally interested.


The theory section covers the image processing part of the lectures in Cognitive Systems and Robotics, as well as the appropriate course experiments in the practical robot course taught at the department of computer science at the University of Karlsruhe (TH). The addition of one of the established textbooks for the basics would complete the package.

Acknowledgments

The creation of this book is also due to many other people, who participated in the implementations or corrected the results. We want to thank our editor Raimund Krings for the great support, our translator Adam Lockett for the translation, and Dr. Tamim Asfour, Andreas Böttinger, Tanja Geissler, Dilana Hazer, Kurt Klier, Markus Osswald, Lars Pätzold, Ulla Scheich, Stefanie Speidel, Dr. Peter Steinhaus, Ferenc Toth and Kai Welke for implementations and for proofreading.

We hope you enjoy this book and that it inspires you in the very interesting and promising field of computer vision.

Karlsruhe, April 4, 2008

The Authors

Part I

Basics

1 Technical Fundamentals

Author: Tilo Gockel

1.1 Introduction
Many readers of this book already have a camera for private use and are surely familiar with handling certain peculiarities of photography. The results are often surprising regarding light distribution, perspective, depth of field or color reproduction, compared with the scene that the photographer remembers. Why is that?

The most developed human sensory organ is the eye. It exceeds every digital camera and chemical film at resolution and dynamic range by some orders of magnitude. Even more important, however, is the direct connection between this sense and the processing organ, the brain. In order to make the effectiveness of this processing unit clear, an application is considered: the analysis of a scene regarding depth information. For this task, an industrial sensor would use a specific physical principle, be it triangulation, silhouette intersection or examination of the shadow cast. Regarding this approach, the human is far ahead, combining nearly all well-known approaches: he unknowingly triangulates, he examines the shadows in the scene, the occlusions and information regarding sharpness, he uses color information and, above all, learnt model knowledge to establish plausibility conditions. Thus humans know, for example, that a house is larger than a car and must be further away when casting an equally large image on the retina. This is only one example of many. Other examples would be the ability to adapt to different lighting conditions, the amazing effectiveness regarding segmenting relevant image details and many more.

Unfortunately, with computer-aided image analysis these abilities are either only achievable in isolated and simplified forms or not at all. Here one makes do accordingly with the isolation of relevant features, for example by the use of certain light sources and with the definition of certain scene characteristics (environmental lighting conditions, constant distance of the imaging sensor system from the object, telecentric optics etc.). In this narrow framework, machine vision is superior to the human visual sense: bar codes can be captured and evaluated in fractions of a second, stamping parts can be measured exactly in hundredths of millimeters, color information can be reproducibly compared, and microscopic and macroscopic scenes can be captured and evaluated. The larger part of this book discusses the algorithmic procedures and the associated implementations for this, but without a competent choice of the system components many problems of image processing are not only difficult to solve, but often completely unsolvable.

1.2 Light
In the process chain of image processing, illumination comes first. We do not acquire the object with the camera, but rather its effect on the given illumination. With an unfavourable choice or arrangement of the light source, measuring tasks will often be unsolvable or demand a disproportionately large algorithmic effort in the image processing. This also applies to the reverse: with competent selection of the illumination, an image processing task may possibly be solved amazingly simply and also robustly.

1.2.1 Physical Fundamentals

Across the range of the electromagnetic waves, only the relatively narrow spectrum of visible light (380 to 780 nm) and the spectrum of current image sensors (approximately 350 to 1000 nm) are relevant for classical image processing. Correspondingly, deviating from the general radiation quantities, the so-called photometric quantities were introduced [Kuchling 89; Hornberg 06: Chapter 3]. The basis for these quantities is the spectral light sensitivity V(λ) of the human eye as a function of the wavelength λ (Fig. 1.1). The maximum is at λ0 = 555 nm and accordingly, V(λ0) = 1 is set here. The relationship between the physical quantity of the radiant flux Φe [Watt] and the photometric quantity of the luminous flux Φv [lumen] or [lm] is given by the equivalent photometric radiation K(λ). Φe is a measure of the absolute radiant power, and Φv a measure of the physiologically perceived radiant power. The maximum value Km of K (at λ0 = 555 nm) is 683 lumens per Watt; for all other wavelengths the outcome is K(λ) = Km · V(λ).


Fig. 1.1. Light sensitivity function of the human eye over the wavelength.

For a monochromatic light source¹, and with the light sensitivity function V(λ) according to Fig. 1.1, the luminous flux can be written as follows:

Φv = Φe · K(λ) = Φe · Km · V(λ)     (1.1)

For a light source that delivers a broader spectrum, the integral over λ must be calculated. That is:

Φv = Km · ∫ from λ = 380 nm to 780 nm of Φe(λ) · V(λ) dλ     (1.2)

The equations tempt one to regard K(λ) as the efficiency of a light source, and indeed the following relation ηv is also called luminous efficiency (also, luminous efficacy):

ηv = Φv / Φe = K(λ)     (1.3)

Here, however, it is wrongly assumed that PTotal = Φe, i.e. that the entire absorbed energy is converted into radiation energy. Furthermore, the efficiency is commonly written as a dimensionless number 0.0–1.0 or 0%–100%. K(λ), or alternatively ηv, however, has the unit [lm/W].

A light source that emits a single wavelength, such as a laser light source.


For a more precise formulation, the overall luminous efficacy ηo is introduced:

ηo = Φv / PTotal     (1.4)

The value ηo still has the same unit [lm/W] as the value ηv. Now, however, the dimensionless coefficient η can be written with reference to the maximally attainable ηo,max as:

η = ηo / ηo,max  ∈ [0%, 100%]     (1.5)

With these considerations it becomes clear that statements about the efficiency of a light source, or also comparisons of different light sources, are only conditionally possible and should be handled with care. With the introduction of the solid angle Ω (see Fig. 1.2, unit [steradian] or [sr]), a connection between the luminous flux Φv [lumen or lm] and the luminous intensity I [candela or cd] in relation to Ω can be stated:

I [cd] = Φv [lm] / Ω [sr]     (1.6)


Fig. 1.2. Illustration of the solid angle Ω.

Given a surface segment A of a sphere with radius r in accordance with Fig. 1.2, the solid angle can be written as:

Ω [sr] = A [m²] / r² [m²]     (1.7)

For an evenly radiating light source, Ω = (surface of a sphere)/r² = 4πr²/r² = 4π results accordingly (the quantity Ω is dimensionless; however, similar to [°], the unit [sr] is commonly used).


From Eq. (1.6) two further values can be derived with reference to a radiating or illuminated planar surface. These are the light density L [cd/m²] and the illumination level Ev [lux or lx]:

L [cd/m²] = I [cd] / Aradiating [m²]     (1.8)

Ev [lx] = Φv [lm] / Ailluminated [m²]     (1.9)

The light density L is a measure of the perceived brightness. A light source appears all the brighter, the smaller its surface is in comparison to the luminous intensity.² With the value Ev, now also the so-called light exposure H [lux second] can be written as the product of illumination level Ev and time:

H [lx s] = Ev [lx] · t [s]     (1.10)

Here is a brief summary of the most important basic rules:

- The entire visible radiation of a light source is described with the value Φv (luminous flux, [lumen]).
- The light radiation relating to a solid angle is described with the value I (luminous intensity, [candela]).
- The light radiation relating to a receiving surface is specified with the value Ev (illumination level, [lux]).
- The specifications of illuminants (luminous flux, luminous intensity, ...) apply to the perception of the human eye. Accordingly, in image processing the spectral sensitivity function V(λ) of the sensor, which deviates from that of the eye, must be considered and compared to the spectral distribution of the illuminant.

Finally, the recently introduced unit ANSI lumen is to be mentioned. It refers to the measurement of the radiant flux for the evaluation of projectors or other illumination equipment and the distribution of the luminous flux over the lit surface (the so-called Nine Point Measurement). With modern commercial projectors, however, the distribution is so even that it can be calculated approximately with Φv ≈ Φv,ANSI.

² Here it is assumed that the irradiation takes place perpendicularly. If this is not the case, it must be calculated vectorially; the angle of incidence is then incorporated into the equation.
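As a small illustration of Eq. (1.2), the following sketch integrates a sampled emission spectrum against V(λ) numerically with the trapezoidal rule. The spectrum values are invented for demonstration, and the V(λ) samples are only coarse approximations of the tabulated curve; a real calculation would use the full CIE V(λ) table and the measured spectrum of the illuminant.

```cpp
// Sketch: numerical evaluation of Eq. (1.2) with the trapezoidal rule.
// The sample values below are illustrative only, not measured data.
#include <cstdio>

const double Km = 683.0; // maximum luminous efficacy [lm/W]

// Luminous flux in [lm] for a spectrum sampled at equidistant wavelengths.
// spectrum[i] is the spectral radiant flux [W/nm] at wavelength
// lambda0 + i * step [nm]; V[i] is the eye sensitivity at the same wavelength.
double LuminousFlux(const double *spectrum, const double *V, int n, double step)
{
    double sum = 0.0;
    for (int i = 0; i < n - 1; i++)
        sum += 0.5 * (spectrum[i] * V[i] + spectrum[i + 1] * V[i + 1]) * step;
    return Km * sum;
}

int main()
{
    // five coarse samples from 380 nm to 780 nm (step 100 nm);
    // the spectrum is made up, the V values are rough approximations
    const double spectrum[5] = { 0.0005, 0.001, 0.002, 0.001, 0.0005 }; // [W/nm]
    const double V[5]        = { 0.000,  0.139, 0.870, 0.017, 0.000  };
    printf("luminous flux = %.2f lm\n", LuminousFlux(spectrum, V, 5, 100.0));
    return 0;
}
```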


1.2.2 Illuminants

A list of different light sources is shown in Table 1.1. It is noticeable that light emitting diodes (LEDs) have many advantages. Since LEDs have actually replaced many other light sources, this technology will be explained in detail in this section (see also [Hornberg 06: Chapter 3; TIS 07: white papers]). There are other positive characteristics of the LED technology in addition to the advantages shown in this table:

- LEDs have a good long-term consistency regarding light output and spectral light distribution.
- Their small size makes the grouping of several LEDs into modular designs or the use of special illumination designs possible.
- As LEDs operate with a regulated current source, they do not require high-voltage ignition electronics, unlike HSI lamps, for example. A comparatively simple power supply unit is sufficient; in the simplest case, this is a series resistor (a short worked example follows at the end of this section).
- LEDs only require small supply voltages of approximately 1.5 VDC to 3.5 VDC. Illumination modules made from several LEDs are thus easily designed for low voltages of 5, 12 or 24 VDC or AC.
- Monochromatic LEDs produce an approximately monochromatic light. Thus, the chromatic aberration of the optics no longer has any effect.
- For a short time, LEDs can be operated with a much higher current than the indicated maximum current. If they are operated in pulse mode and synchronized with the camera, the luminous intensity increases (for more details see [Hornberg 06: Chapter 3]).
- LEDs are also available in the ultraviolet and infrared range.

From the data sheet of a modern light emitting diode, the meaning of the parameters becomes clear (Table 1.2).

Besides the advantages mentioned, LEDs also have some downsides: The luminous intensity of available LEDs is still not as high as that of traditional illuminants such as halogen lamps or gas discharge lamps (for comparison: a standard data projector with HTI lamp reaches approximately 2 000 ANSI lumens, an LED projector 28 ANSI lumens; this is not a misprint). When trying to emulate a very bright light source by combining multiple LEDs, the technical designer often fails because of insufficient convection; with too high a component density, the waste heat can no longer be dissipated. Also, the advantages of a very small illuminant with regard to an optimal design of the optics are then lost.
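A short worked example for the series resistor mentioned in the list above (the numbers are assumed for illustration and not taken from any data sheet): an LED with a forward voltage of 2.0 V, operated at 20 mA from a 5 V supply, requires

R = (5 V − 2.0 V) / 0.02 A = 150 Ω,

and the resistor then dissipates P = 3 V · 0.02 A = 60 mW. For modules on 12 or 24 V supplies, several LEDs are usually connected in series so that less voltage is dropped across the resistor.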

Light source                   Efficacy η     Operating hours    Comments
Spiral-wound filament          1.9–2.6 %      approx. 1 000      Great heat generation, therefore frequently used with fiber optics.
Halogen                        2.3–5.1 %      approx. 3 000      Great heat generation, therefore frequently used with fiber optics.
Gas discharge (HTI, HSI ...)   15–27 %        approx. 6 000      Almost solely used with high-frequency PSU.
Neon, luminescent material     6.6–15 %       approx. 7 500
Light emitting diode           5–20 %         approx. 50 000     Different colors, also available as IR and UV, small size (→ array configurations).
Laser diode                    7–12 %         approx. 10 000     Usage as structured light.
Daylight                       n/a            n/a                Weather? Daytime?

Table 1.1. A comparison of different light sources. Efficiency η in accordance with Eq. (1.5); data source for η: [Wikipedia 07, Luminous Efficacy]. Further criteria compared in Table 1.1 are size, cost relative to Φv, maximum luminous flux, suitability for diffuse and for directed illumination, suitability for usage with lenses, usage as strobe with sync, and aging effects.

Excerpt chapter truncated . . .

2 Introduction to the Algorithmics

Author: Pedram Azad

2.1 Introduction
After having explained fundamentals of image acquisition from a technical perspective in the previous chapter, now it will be explained how images can be processed with a computer. First of all, the mathematical model for the mapping of an observed scene onto the image sensor is introduced. Subsequently, conventional encodings for the representation of images are explained, in order to present a selection of image processing algorithms based upon these. The models and methods from this chapter serve as basis for the understanding of the implementation details and the numerous applications in the following chapters of this book.

2.2 Camera Model and Camera Calibration


If metric measurements are to be accomplished with the aid of images, in two dimensions (2D) as well as in three dimensions (3D), then the understanding and the modeling of the mapping of a scene point onto the image sensor are necessary. In contrast, if the information of interest is coded exclusively in the image, like in the case of the recognition of bar codes, the understanding of the representation of images in memory is sufficient. In any case it can only be advantageous not to regard the procedure of the mapping of a scene onto the image sensor as a mathematical black box.


2.2.1 Pinhole Camera Model

The central perspective model lies at the heart of almost all mathematical models of the camera mapping function. The widespread pinhole camera model serves as a basis for the understanding of the mathematical relationships. It is assumed that all points of a scene are projected onto the image plane B via a straight ray through an infinitesimally small point: the projection center Z. With conventional optics the projection center is located in front of the image plane, i.e. between the scene and the image plane. For this reason, the recorded image is always a horizontally and vertically mirrored image of the recorded scene (see Fig. 2.1).


Fig. 2.1. Classic pinhole camera model (a), Pinhole camera model in positive position (b).

This circumstance has, however, no serious effects; the image is simply turned by 180 degrees. With a digital camera, the correct image is transmitted by transferring the pixels in the opposite order from the chip. In order to computationally model the pinhole camera model, the second theorem on intersecting lines is used, leading to:

\begin{pmatrix} u \\ v \end{pmatrix} = \frac{f}{z} \begin{pmatrix} x \\ y \end{pmatrix}     (2.1)

where u, v denote the image coordinates and x, y, z the coordinates of a scene point in the 3D coordinate system whose origin is the projection center. The parameter f is known as the camera constant; it denotes the distance from the projection center to the image plane. In practice, usually the projection center is assumed to be lying behind the image plane as in Eq. (2.1), whereby just the sign of the image coordinates u, v is changed. In this way, the camera image is modeled as a central perspective projection without mirroring. This is referred to as the pinhole camera model in positive position.

Fig. 2.2. The central perspective in a pinhole camera model in positive position.

2.2.2 Extended Camera Model

The pinhole camera model describes the mathematical relationships of the central perspective projection to a sufficient degree. It is, however, missing some enhancements for practical application, which are presented in the following. Firstly, some terms must be introduced and coordinate systems defined.

Principal axis: The principal axis is the straight line that runs perpendicularly to the image plane and through the projection center.

Principal point: The principal point is the intersection of the principal axis with the image plane, specified in image coordinates.


Image coordinate system: The image coordinate system is a two-dimensional coordinate system. Its origin lies in the top left-hand corner of the image, the u-axis points to the right, the v-axis downward. The units are in pixels.

Camera coordinate system: The camera coordinate system is a three-dimensional coordinate system. Its origin lies in the projection center Z, the x- and y-axes run parallel to the u- and v-axes of the image coordinate system. The z-axis points forward, i.e. toward the scene. The units are in millimeters.

World coordinate system: The world coordinate system is a three-dimensional coordinate system. It is the basis coordinate system and can lie arbitrarily anywhere in the area. The units are in millimeters.

There is no uniform standard concerning the directions of the image coordinate system's axes and therefore also concerning the camera coordinate system's x- and y-axes. While most camera drivers presuppose the image coordinate system as defined in this book, with bitmaps, for example, the origin is located in the bottom left-hand corner of the image and the v-axis points upward. In order to avoid the arising incompatibilities, the image processing library IVT, which underlies this book, converts the images of all image sources in such a manner that the previously defined image coordinate system is valid.

The parameters that fully describe a camera model are called camera parameters. One distinguishes intrinsic and extrinsic camera parameters. The intrinsic camera parameters are independent of the choice of the world coordinate system, and therefore remain constant if the hardware setup changes. The extrinsic camera parameters, however, model the transformation from the world coordinate system to the camera coordinate system, and must be redetermined if the camera pose changes.

Up to now, the only (intrinsic) camera parameter in the pinhole camera model has been the camera constant f. A world coordinate system has not yet been considered and therefore neither have extrinsic camera parameters. So far it has also been assumed that the principal point lies at the origin of the image coordinate system, that the pixels are exactly square, and that an ideal lens reproduces the scene distortion-free. The more realistic camera model defined in the following rectifies these disadvantages.

First of all, the pixels are assumed not to be square, but rectangular. Since the conversion factor from [mm] to [pixels] is contained in the camera constant, the different height and width of a pixel can be modeled by defining the camera constant f independently for the u- and v-direction. The denotations used in the following are fx and fy, usually referred to as the focal length; the units are in pixels. With the inclusion of the principal point C(cx, cy), the new mapping from camera coordinates xc, yc, zc to image coordinates u, v reads:

\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} c_x \\ c_y \end{pmatrix} + \frac{1}{z_c} \begin{pmatrix} f_x x_c \\ f_y y_c \end{pmatrix}     (2.2)


Commonly, this mapping is also formulated as a matrix multiplication with the calibration matrix

K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}

using homogeneous coordinates (see Appendix B.1.7):

\begin{pmatrix} u z_c \\ v z_c \\ z_c \end{pmatrix} = K \begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix}     (2.3)

The inverse of this mapping is ambiguous; the possible points (xc, yc, zc) that are mapped to the pixel (u, v) lie on a straight line through the projection center. It can be formulated through the inverse calibration matrix

K^{-1} = \begin{pmatrix} \frac{1}{f_x} & 0 & -\frac{c_x}{f_x} \\ 0 & \frac{1}{f_y} & -\frac{c_y}{f_y} \\ 0 & 0 & 1 \end{pmatrix}

and the equation:

\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = K^{-1} \begin{pmatrix} u z_c \\ v z_c \\ z_c \end{pmatrix}     (2.4)

Here the depth zc is the unknown variable; for each zc, the coordinates xc, yc (defined in the camera coordinate system) of the point (xc, yc, zc) that maps to the pixel (u, v) are calculated. In line with the notation from Eq. (2.2), the mapping defined by Eq. (2.4) can analogously be formulated as follows:

\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = z_c \begin{pmatrix} \frac{u - c_x}{f_x} \\ \frac{v - c_y}{f_y} \\ 1 \end{pmatrix}     (2.5)

Arbitrary twists and shifts between the camera coordinate system and the world coordinate system are modeled by the extrinsic camera parameters. They define a coordinate transformation from the world coordinate system to the camera coordinate system, consisting of a rotation R and a translation t:

xc = R · xw + t     (2.6)

where xw := (x, y, z) defines the world coordinates and xc := (xc, yc, zc) the camera coordinates of the same 3D point. The complete mapping from the world coordinate system to the image coordinate system can finally be described in closed form by the projection matrix

P = K (R | t)


using homogeneous coordinates:

\begin{pmatrix} u s \\ v s \\ s \end{pmatrix} = P \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}     (2.7)

If the mapping from Eq. (2.6) is inverted, then:

xw = Rᵀ xc − Rᵀ t     (2.8)

where Rᵀ defines the transposed matrix of R; for rotations it applies that R⁻¹ = Rᵀ. Thus the inverse of the complete mapping from Eq. (2.7), which is ambiguous like the inverse mapping from Eq. (2.4), can be formulated by:

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = P^{-1} \begin{pmatrix} u s \\ v s \\ s \\ 1 \end{pmatrix}     (2.9)

with the inverse projection matrix

P⁻¹ = Rᵀ (K⁻¹ | −t)

2.2.3 Camera Calibration

The calibration of a camera means the determination of both the intrinsic parameters cx, cy, fx, fy and the extrinsic parameters R, t. Beyond that, parameters which model nonlinear distortions of the lens, such as radial or tangential lens distortions, also belong to the intrinsic parameters. The modeling of such lens distortions is dealt with in Section 2.2.4; in this section, however, a purely linear camera mapping is first calculated, i.e. without distortion parameters. The starting point for the test field calibration is a set of point pairs {Pw,i, Pb,i} with i ∈ {1, ..., n}, where Pw,i ∈ R³ describes points in the world coordinate system and Pb,i ∈ R² their projection into the image coordinate system. On the basis of n ≥ 6 point pairs, which span a non-planar area, it is possible to compute the camera parameters with the Direct Linear Transformation (DLT) [Abdel-Aziz 71]. In practice, however, a lot more point pairs are used, in order to achieve a more accurate result. For this purpose, a dot pattern or a checkerboard pattern (see Fig. 2.3) is usually recorded in several positions, whereby the dimensions of the pattern are accurately known. The difficulty is to know the relative position of the individual presentations of the pattern to each other. A possible solution to this problem is the use of rectangular glass plates with a known thickness in combination with a perpendicular stop (see


Fig. 2.3. Examples of calibration patterns.

Fig. 2.4. Example of a three-dimensional calibration object.

Fig. 2.5). A further possibility is the use of a three-dimensional calibration object (see Fig. 2.4). However, with such a calibration object, it is hardly possible to obtain a comparably large number of points which can be measured in the camera image and matched. In [Zhang 99], a calibration method is presented which computes the relative motion between arbitrary presentations of the calibration pattern on the basis of point correspondences, and thus makes the use of a complex hardware setup unnecessary. This method is implemented in the OpenCV [OpenCV 08] and is also used in the IVT for camera calibration. Let now n ≥ 6 point pairs {Pw,i(xi, yi, zi), Pb,i(ui, vi)}, i ∈ {1, ..., n} be given. On the basis of Eq. (2.7), for each point pair Pw(x, y, z), Pb(u, v) the following should apply:


Fig. 2.5. Use of glass plates at a perpendicular stop for the camera calibration of a laser scanner (see [Azad 03]).

\begin{pmatrix} u s \\ v s \\ s \end{pmatrix} = \begin{pmatrix} L_1 & L_2 & L_3 & L_4 \\ L_5 & L_6 & L_7 & L_8 \\ L_9 & L_{10} & L_{11} & L_{12} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}     (2.10)

This equation can be reformulated by division to:

u = (L1·x + L2·y + L3·z + L4) / (L9·x + L10·y + L11·z + L12)

v = (L5·x + L6·y + L7·z + L8) / (L9·x + L10·y + L11·z + L12)     (2.11)

Since homogeneous coordinates are used, each real-valued multiple r · P defines the same projection, which is why w.l.o.g.¹ L12 = 1 can be set. Multiplication with the denominator and conversion finally leads to the two following equations:

u = L1·x + L2·y + L3·z + L4 − L9·u·x − L10·u·y − L11·u·z

v = L5·x + L6·y + L7·z + L8 − L9·v·x − L10·v·y − L11·v·z     (2.12)

Without loss of generality.


With the aid of Eq. (2.12) and by using all n point pairs, the following over-determined linear system of equations can now be set up:

\begin{pmatrix}
x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 & -u_1 z_1 \\
0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & -v_1 x_1 & -v_1 y_1 & -v_1 z_1 \\
\vdots & & & & & & & & & & \vdots \\
x_n & y_n & z_n & 1 & 0 & 0 & 0 & 0 & -u_n x_n & -u_n y_n & -u_n z_n \\
0 & 0 & 0 & 0 & x_n & y_n & z_n & 1 & -v_n x_n & -v_n y_n & -v_n z_n
\end{pmatrix}
\begin{pmatrix} L_1 \\ L_2 \\ \vdots \\ L_{11} \end{pmatrix}
=
\begin{pmatrix} u_1 \\ v_1 \\ \vdots \\ u_n \\ v_n \end{pmatrix}     (2.13)

or, in short, A x = b. As is shown in Appendix B, the optimal solution x of this over-determined system of linear equations in the sense of the Euclidean norm can be determined with the method of least squares. For this purpose, the following equation, which results from left-sided multiplication by Aᵀ, must be solved:

Aᵀ A x = Aᵀ b     (2.14)

One possibility for solving this system of linear equations is the use of the Moore-Penrose pseudoinverse (Aᵀ A)⁻¹ Aᵀ. The inverse of Aᵀ A can be calculated, for example, using the Cholesky decomposition (see Appendix B.2.3), since Aᵀ A is a symmetric matrix. The solution x is then calculated by:

x = (Aᵀ A)⁻¹ Aᵀ b     (2.15)
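A minimal sketch of Eqs. (2.13) to (2.15): the normal equations are accumulated directly from the point pairs and solved with a simple Gauss elimination instead of the Cholesky decomposition mentioned above. The function name and the flat array layout of the input points are assumptions of this example, not the IVT calibration interface; a production implementation would additionally normalize the input data.

```cpp
// Sketch: DLT parameters L1..L11 from n >= 6 point pairs via Eqs. (2.13)-(2.15).
#include <cmath>

// worldPoints: n * 3 values (x, y, z), imagePoints: n * 2 values (u, v),
// L: output array of the 11 DLT parameters. Returns false if the system is singular.
bool CalculateDLTParameters(const double *worldPoints, const double *imagePoints, int n, double L[11])
{
    double AtA[11][11] = { { 0.0 } };
    double Atb[11] = { 0.0 };

    for (int i = 0; i < n; i++)
    {
        const double x = worldPoints[3 * i], y = worldPoints[3 * i + 1], z = worldPoints[3 * i + 2];
        const double u = imagePoints[2 * i], v = imagePoints[2 * i + 1];

        // the two rows of A and the entries of b belonging to this point pair (Eq. (2.13))
        const double row1[11] = { x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z };
        const double row2[11] = { 0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z };

        // accumulate A^T A and A^T b (Eq. (2.14))
        for (int r = 0; r < 11; r++)
        {
            for (int c = 0; c < 11; c++)
                AtA[r][c] += row1[r] * row1[c] + row2[r] * row2[c];
            Atb[r] += row1[r] * u + row2[r] * v;
        }
    }

    // solve the 11x11 system A^T A x = A^T b with Gauss elimination and partial pivoting
    for (int k = 0; k < 11; k++)
    {
        int pivot = k;
        for (int r = k + 1; r < 11; r++)
            if (std::fabs(AtA[r][k]) > std::fabs(AtA[pivot][k])) pivot = r;
        if (std::fabs(AtA[pivot][k]) < 1e-12) return false;

        for (int c = 0; c < 11; c++) { const double h = AtA[k][c]; AtA[k][c] = AtA[pivot][c]; AtA[pivot][c] = h; }
        const double h = Atb[k]; Atb[k] = Atb[pivot]; Atb[pivot] = h;

        for (int r = k + 1; r < 11; r++)
        {
            const double factor = AtA[r][k] / AtA[k][k];
            for (int c = k; c < 11; c++) AtA[r][c] -= factor * AtA[k][c];
            Atb[r] -= factor * Atb[k];
        }
    }

    // back substitution
    for (int k = 10; k >= 0; k--)
    {
        double sum = Atb[k];
        for (int c = k + 1; c < 11; c++) sum -= AtA[k][c] * L[c];
        L[k] = sum / AtA[k][k];
    }

    return true;
}
```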

If the DLT parameters L1 ... L11 are determined, then the appropriate pixel Pb(u, v) can be calculated with the assistance of Eqs. (2.11) for any world point Pw(x, y, z). Conversely, for any pixel Pb, the set of world points Pw that map to this pixel can be calculated by solving the following under-determined system of equations:

\begin{pmatrix} L_9 u - L_1 & L_{10} u - L_2 & L_{11} u - L_3 \\ L_9 v - L_5 & L_{10} v - L_6 & L_{11} v - L_7 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} L_4 - u \\ L_8 - v \end{pmatrix}     (2.16)


The solution of this system of equations is the straight line g of all possible world points Pw, and can be calculated by the following steps:

a := L9·u − L1,   b := L10·u − L2,   c := L11·u − L3
d := L9·v − L5,   e := L10·v − L6,   f := L11·v − L7
g := L4 − u,      h := L8 − v     (2.17)

Using the definitions from Eq. (2.17), it now follows from the under-determined system of linear equations from Eq. (2.16) by elimination of x:

(bd − ae)·y + (cd − af)·z = dg − ah     (2.18)

With the following definitions:

r := bd − ae,   s := cd − af,   t := dg − ah     (2.19)

the parameter notation of the straight line g finally reads:

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \frac{gr - bt}{ar} \\ \frac{t}{r} \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} \frac{bs - cr}{ar} \\ -\frac{s}{r} \\ 1 \end{pmatrix}     (2.20)
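Both directions of the DLT mapping translate directly into code; the sketch below evaluates Eq. (2.11) for the projection and Eqs. (2.17) to (2.20) for the viewing ray. The function names are chosen for this example, and the degenerate cases a = 0 and r = 0 are deliberately not handled.

```cpp
// Sketch: applying the DLT parameters L1..L11 (with L12 = 1) in both directions.
// L[0] corresponds to L1, ..., L[10] to L11.

// Eq. (2.11): world point (x, y, z) -> pixel (u, v).
void DLTProject(const double L[11], double x, double y, double z, double &u, double &v)
{
    const double denominator = L[8] * x + L[9] * y + L[10] * z + 1.0;
    u = (L[0] * x + L[1] * y + L[2] * z + L[3]) / denominator;
    v = (L[4] * x + L[5] * y + L[6] * z + L[7]) / denominator;
}

// Eqs. (2.17)-(2.20): pixel (u, v) -> straight line point + lambda * direction.
// The special cases a = 0 and r = 0 would need separate treatment.
void DLTViewingRay(const double L[11], double u, double v, double point[3], double direction[3])
{
    const double a = L[8] * u - L[0], b = L[9] * u - L[1], c = L[10] * u - L[2];
    const double d = L[8] * v - L[4], e = L[9] * v - L[5], f = L[10] * v - L[6];
    const double g = L[3] - u, h = L[7] - v;

    const double r = b * d - a * e;
    const double s = c * d - a * f;
    const double t = d * g - a * h;

    point[0] = (g * r - b * t) / (a * r);
    point[1] = t / r;
    point[2] = 0.0;

    direction[0] = (b * s - c * r) / (a * r);
    direction[1] = -s / r;
    direction[2] = 1.0;
}
```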

As was shown with Eqs. (2.12) and (2.20), with the direct assistance of the DLT parameters L1 ... L11, the camera mapping functions from 3D to 2D and in reverse can be calculated. In some applications it can moreover be of interest to know the intrinsic and extrinsic parameters of the camera explicitly. In particular, the knowledge of the intrinsic parameters is necessary for the modeling and the compensation of lens distortions (see Section 2.2.4). The intrinsic parameters cx, cy, fx, fy and the extrinsic parameters R, t can be determined on the basis of the DLT parameters L1 ... L11 with the aid of the following calculations.


The calculations are essentially taken from [Mor 02], although a few modifications had to be made in order to be consistent with the introduced camera model.

L := √(L9² + L10² + L11²)

cx = (L1·L9 + L2·L10 + L3·L11) / L²
cy = (L5·L9 + L6·L10 + L7·L11) / L²

fx = √((L1² + L2² + L3²) / L² − cx²)
fy = √((L5² + L6² + L7²) / L² − cy²)

r31 = L9 / L,   r32 = L10 / L,   r33 = L11 / L

r11 = (L1/L − cx·r31) / fx,   r12 = (L2/L − cx·r32) / fx,   r13 = (L3/L − cx·r33) / fx
r21 = (L5/L − cy·r31) / fy,   r22 = (L6/L − cy·r32) / fy,   r23 = (L7/L − cy·r33) / fy     (2.21)

R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}, \qquad
t = R \begin{pmatrix} L_1 & L_2 & L_3 \\ L_5 & L_6 & L_7 \\ L_9 & L_{10} & L_{11} \end{pmatrix}^{-1} \begin{pmatrix} L_4 \\ L_8 \\ 1 \end{pmatrix}     (2.22)
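The decomposition of Eqs. (2.21) and (2.22) is plain arithmetic; the following sketch implements it and computes the small 3×3 inverse needed for t with Cramer's rule (see also Appendix B.1.2). It is an illustrative example, not the IVT implementation, and does not check for degenerate parameter sets.

```cpp
// Sketch: intrinsic and extrinsic parameters from the DLT parameters L1..L11
// according to Eqs. (2.21) and (2.22). L[0] corresponds to L1, ..., L[10] to L11.
#include <cmath>

void DecomposeDLT(const double L[11], double &fx, double &fy, double &cx, double &cy,
    double R[3][3], double t[3])
{
    const double LL = std::sqrt(L[8] * L[8] + L[9] * L[9] + L[10] * L[10]);

    cx = (L[0] * L[8] + L[1] * L[9] + L[2] * L[10]) / (LL * LL);
    cy = (L[4] * L[8] + L[5] * L[9] + L[6] * L[10]) / (LL * LL);
    fx = std::sqrt((L[0] * L[0] + L[1] * L[1] + L[2] * L[2]) / (LL * LL) - cx * cx);
    fy = std::sqrt((L[4] * L[4] + L[5] * L[5] + L[6] * L[6]) / (LL * LL) - cy * cy);

    R[2][0] = L[8] / LL;  R[2][1] = L[9] / LL;  R[2][2] = L[10] / LL;
    R[0][0] = (L[0] / LL - cx * R[2][0]) / fx;
    R[0][1] = (L[1] / LL - cx * R[2][1]) / fx;
    R[0][2] = (L[2] / LL - cx * R[2][2]) / fx;
    R[1][0] = (L[4] / LL - cy * R[2][0]) / fy;
    R[1][1] = (L[5] / LL - cy * R[2][1]) / fy;
    R[1][2] = (L[6] / LL - cy * R[2][2]) / fy;

    // Eq. (2.22): t = R * M^-1 * (L4, L8, 1)^T with M built from L1..L3, L5..L7, L9..L11
    const double M[3][3] = { { L[0], L[1], L[2] }, { L[4], L[5], L[6] }, { L[8], L[9], L[10] } };
    const double b[3] = { L[3], L[7], 1.0 };
    const double det =
        M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1]) -
        M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0]) +
        M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]);

    // solve M * c = b with Cramer's rule (c = M^-1 * b)
    double c[3];
    for (int i = 0; i < 3; i++)
    {
        double Mi[3][3];
        for (int r = 0; r < 3; r++)
            for (int col = 0; col < 3; col++)
                Mi[r][col] = (col == i) ? b[r] : M[r][col];
        c[i] = (Mi[0][0] * (Mi[1][1] * Mi[2][2] - Mi[1][2] * Mi[2][1]) -
                Mi[0][1] * (Mi[1][0] * Mi[2][2] - Mi[1][2] * Mi[2][0]) +
                Mi[0][2] * (Mi[1][0] * Mi[2][1] - Mi[1][1] * Mi[2][0])) / det;
    }

    for (int i = 0; i < 3; i++)
        t[i] = R[i][0] * c[0] + R[i][1] * c[1] + R[i][2] * c[2];
}
```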

Excerpt chapter truncated . . .

6 Workpiece Gauging

Author: Tilo Gockel

6.1 Introduction
A typical application of industrial image processing is workpiece gauging. In Appendix C, a concrete example with a commercial image processing tool will be presented. In this chapter some basics are initially discussed. Frequently the gauging procedure is used in combination with a transmitted light illumination, and an image situation arises as in Fig. C.3 and C.5. Before beginning the actual measurement, a workpiece for which the relevant dimensions are correct (a so-called golden template) is placed under the camera, and the system is thereby calibrated. The calibration takes place in such a manner that a known dimension is measured and the result of the measurement in the unit [pixel] together with the target result in [mm] or [inch] is stored. Typically the scaling factors for u and v are determined in two measurements. Then, on the basis of the template part, position and nominal value for one or more relevant dimensions are determined and stored.

The gauging of parts from the production line then takes place. By means of a feed, the workpiece comes under the camera. It is usually only ensured that the parts lie flat; the precise position and orientation is not known.¹ Accordingly, the rotated position of the object must be determined by means of an alignment before the actual measurement can start. The alignment can take place as in Appendix C, on the basis of known object features (in this example: center of area of the circular cut-outs), but a more general calculation based on the moments of area of higher order can also be used [Burger 06: Chapter 11.4; Kilian 01], as is shown in the following sections. After the determination of the rotated position of the object, relevant grayscale transitions on the outline, i.e. edges, can be found and measured.

¹ Here and in the following example it is assumed: optics and structure as in the case studies from Appendix C.
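A purely illustrative numerical example of this calibration (the numbers are assumed): if the reference dimension on the golden template is 25.00 mm and spans 480 pixels in the u-direction, the scaling factor is ku = 25.00 mm / 480 px ≈ 0.052 mm/px; a distance of 463 px measured later on a production part then corresponds to 463 px · 0.052 mm/px ≈ 24.1 mm. The v-direction is calibrated in the same way with a second reference dimension.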

6.2 Algorithmics
6.2.1 Moments

In this implementation, alignment takes place via calculating the area moments of the object region. The moments of a region R in a grayscale image are defined as follows:

m_{pq} = \sum_{(u,v) \in R} I(u,v)\, u^p v^q \qquad (6.1)

For a binary image of the form I(u,v) \in \{0, 1\} the equation is reduced to:

m_{pq} = \sum_{(u,v) \in R} u^p v^q \qquad (6.2)

For the calculation, see also Algorithm 39 and [Burger 06].

Algorithm 39  CalculateMoments(I, p, q) -> m_pq
  m_pq := 0
  for all pixels (u, v) in I do
    if I(u, v) != 0 then
      m_pq := m_pq + u^p v^q
    end if
  end for
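A minimal C/C++ sketch of Algorithm 39 (not the book's implementation; it assumes an IVT-style CByteImage with the members width, height and pixels, as used in the listing in Section 6.4):

#include <math.h>

// spatial moment m_pq of the binary region (object pixels != 0), Eq. (6.2)
double CalculateMoment(const CByteImage *pImage, int p, int q)
{
    double mpq = 0.0;

    for (int v = 0; v < pImage->height; v++)
        for (int u = 0; u < pImage->width; u++)
            if (pImage->pixels[v * pImage->width + u] != 0)
                mpq += pow((double) u, p) * pow((double) v, q);

    return mpq;
}

For example, CalculateMoment(&image, 0, 0) yields the area m_00, and the ratios m_10/m_00 and m_01/m_00 yield the center of area derived below.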

The meaning of the zeroth and first order moments is particularly graphic. From them, the surface area and the center of area of a binary region can be determined as follows:

A(R) = \sum_{(u,v) \in R} 1 = \sum_{(u,v) \in R} u^0 v^0 = m_{00} \qquad (6.3)


\bar{u} = \frac{1}{A(R)} \sum_{(u,v) \in R} u^1 v^0 = \frac{m_{10}}{m_{00}} \qquad (6.4)

\bar{v} = \frac{1}{A(R)} \sum_{(u,v) \in R} u^0 v^1 = \frac{m_{01}}{m_{00}} \qquad (6.5)

Herein A(R) is the surface area and \bar{u}, \bar{v} are the coordinates of the center of area of the binary region R. With the introduction of these coordinates of the region, it is now also possible to formulate the central moments. For this, the origin of the coordinate system is shifted into the center of area, yielding:

\mu_{pq}(R) = \sum_{(u,v) \in R} I(u,v)\,(u - \bar{u})^p (v - \bar{v})^q \qquad (6.6)

or, for the special case of a binary region:

\mu_{pq}(R) = \sum_{(u,v) \in R} (u - \bar{u})^p (v - \bar{v})^q \qquad (6.7)

For the rotational alignment in this application, the calculation of the region's orientation is furthermore necessary. The angle θ between the major axis (principal axis)² and the u axis is given by:

\theta(R) = \frac{1}{2} \arctan\!\left( \frac{2\,\mu_{11}(R)}{\mu_{20}(R) - \mu_{02}(R)} \right) \qquad (6.8)

The major axis has a direct physical meaning, just like the center of area: it is the axis of rotation through the center of area for which, when the region is rotated about it, the smallest moment of inertia arises. It should be noted that with this equation the orientation of the major axis can only be determined within [0°, 180°); Section 6.3 shows an approach to determine the angle within [0°, 360°). With these moments a rotational alignment can now be performed for the gauging application in this chapter. For further calculations regarding normalized central moments and invariant moments (for example, the so-called Hu moments), see [Burger 06: Chapter 11.4] and [Kilian 01]. The moments of a binary region can also be calculated much more efficiently by regarding only the contour of the region; for this approach, see [Jiang 91] and [OpenCV 07: Function cvContoursMoments()].

² The associated axis is the minor axis. It lies orthogonal to the major axis and goes through the center of area.
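One compact way to evaluate Eq. (6.8) in code is via atan2, which makes the case differentiation over the signs of μ11 and (μ20 − μ02) implicit (a sketch, not the book's implementation, which uses an explicit case table in the listing in Section 6.4; the 180° ambiguity remains and is resolved as described in Section 6.3):

#include <math.h>

// orientation of the major axis from the central moments, Eq. (6.8);
// result in radians, in the interval (-pi/2, pi/2]
double MajorAxisAngle(double mu11, double mu20, double mu02)
{
    return 0.5 * atan2(2.0 * mu11, mu20 - mu02);
}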


6.2.2 Gauging

The subpixel-precise determination of the grayscale transitions, i.e. edges, would have gone beyond the scope of the implementation in Section 6.4, but the calculation is relatively straightforward and will briefly be introduced here for the reader's own implementations.

After alignment, a vertical edge of known height is to be measured. For this, the image row I(u, v = v_c = const) is convolved with a one-dimensional edge filter, for example of the form (-1, 0, 1). After comparison of the result with a given threshold value, the transition u_0 is known with pixel accuracy.³ A subpixel-precise determination can now take place via a parabola fit with the inclusion of the two surrounding grayscale values along the gradient line. Given

I(u_{-1}, v_c) = i_{-1}, \qquad I(u_0, v_c) = i_0, \qquad I(u_{+1}, v_c) = i_{+1}

the subpixel-precise position of the grayscale transition in row v_c can be calculated by:

u_{\mathrm{Subpixel}} = u_0 + \frac{i_{-1} - i_{+1}}{2\,(i_{-1} - 2\,i_0 + i_{+1})} \qquad (6.9)

The calculation method results from the assumption of a Gaussian-shaped distribution of the grayscale values along the gradient line. Furthermore, it is assumed that the Gaussian function can be approximated by a parabola near its maximum (for the derivation of this relation, see for example [Gockel 06: Chapter 3.2]). A prior smoothing of the grayscale values, for example with a Gaussian filter, is helpful. For further details see also [Hornberg 06: Chapter 8.7; Bailey 03; Bailey 05].
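The following is a minimal sketch of such a sub-pixel edge localization (not the book's implementation, and with the simplifying assumption that Eq. (6.9) is applied to the responses of the (-1, 0, 1) filter around the pixel-accurate maximum); pRow points to the, possibly pre-smoothed, gray values of row v_c:

#include <stdlib.h>

// returns the sub-pixel column of the strongest gray value transition in
// the row, or -1.0 if no filter response exceeds the threshold
double FindEdgeSubpixel(const unsigned char *pRow, int width, int threshold)
{
    int u0 = -1, best = threshold;

    // pixel-accurate localization with the edge filter (-1, 0, 1)
    for (int u = 1; u < width - 1; u++)
    {
        const int response = abs((int) pRow[u + 1] - (int) pRow[u - 1]);
        if (response > best) { best = response; u0 = u; }
    }

    if (u0 < 2 || u0 > width - 3)
        return -1.0;

    // parabola fit through the filter responses around u0, Eq. (6.9)
    const double i_m = abs((int) pRow[u0]     - (int) pRow[u0 - 2]);
    const double i_0 = abs((int) pRow[u0 + 1] - (int) pRow[u0 - 1]);
    const double i_p = abs((int) pRow[u0 + 2] - (int) pRow[u0]);
    const double denom = 2.0 * (i_m - 2.0 * i_0 + i_p);

    return denom != 0.0 ? u0 + (i_m - i_p) / denom : (double) u0;
}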

6.3 Implementation
In the program, all available jpg files are loaded successively from the current directory. For each file the following processing steps are carried out:

1. Conversion to grayscale, inversion⁴, binarization (here: using a fixed threshold value of 128).
2. Moment calculation by means of the function cvMoments() from the OpenCV library.
³ The threshold value can be calculated, for example, on the basis of quantiles, see Section 2.5.3.
⁴ The OpenCV function expects a white object on a black background.


3. Calculation of the center of area of the region and determination of the angle θ of the major axis.
4. Determination of the orientation of the major axis using a function which counts the pixels belonging to the region along a straight line (this is a modified version of the function PrimitivesDrawer::DrawLine() from the IVT, here: WalkTheLine()). The line runs four times, in each case starting from the center of area, along the major and minor axis, and thus spans an orthogonal cross with the crossing point in the center of area (see visualization). With the four results, the ambiguity of θ is resolved (compare the source code in Section 6.4 and the sketch after this list).
5. Drawing the now determined coordinate system in the color image.
6. Rotating the binary image by θ (function ImageProcessor::Rotate(), see also Section 2.9).
7. Gauging of an (arbitrarily) specified dimension near the center of area, parallel to the minor axis. For this, the function WalkTheLine() is used again.
8. Output of the image data and of the measured dimension in two windows (see Fig. 6.1).
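The idea behind step 4 can be sketched as follows (a simplified sketch, not the book's WalkTheLine() from Section 6.4, which additionally evaluates the counts along the minor axis; the binary image is assumed to be given as a raw pixel buffer with object pixels != 0):

#include <math.h>

const double PI_SKETCH = 3.14159265358979;

// counts object pixels along a ray starting at the center of area
static int CountAlongRay(const unsigned char *pPixels, int width, int height,
                         double cx, double cy, double angle, int length)
{
    int count = 0;
    for (int i = 0; i < length; i++)
    {
        const int x = int(cx + cos(angle) * i + 0.5);
        const int y = int(cy + sin(angle) * i + 0.5);
        if (x < 0 || y < 0 || x >= width || y >= height)
            break;
        if (pPixels[y * width + x] != 0)
            count++;
    }
    return count;
}

// flips theta by 180 degrees if the region extends further against the
// current direction of the major axis (an assumed convention)
double ResolveOrientation(const unsigned char *pPixels, int width, int height,
                          double cx, double cy, double theta, int length)
{
    const int forward  = CountAlongRay(pPixels, width, height, cx, cy, theta, length);
    const int backward = CountAlongRay(pPixels, width, height, cx, cy, theta + PI_SKETCH, length);
    return (backward > forward) ? theta + PI_SKETCH : theta;
}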

Fig. 6.1. Screenshots from the implementation. Above: Stamping part with indicated object coordinate system. Below: Part after rotational alignment. Also indicated are the cross used for the determination of θ and the measured cross-section line.

Notes: In the sample application, the scaling factors were not explicitly determined, since determining them follows the same procedure as gauging the workpiece. Moreover, to calculate the rotational alignment, professional applications do not rotate the entire image but, more efficiently, only the small probe area (see Appendix C).


As shown, the algorithms for the moment calculation are also applicable to grayscale images, but the contour-based calculation mentioned above can naturally only be applied to binary regions. Finally, it should be noted that in the literature it is recommended to resolve the ambiguity of the orientation of the principal axis via the computation of centralized moments of higher order (typically: observation of the change of sign of μ30, see for example [Palaniappan 00]). From our experience, however, no robust decision is possible using this approach.


6.4 References and Source Code


[Bailey 03] D.G. Bailey, "Sub-pixel estimation of local extrema", in: Proc. of Image and Vision Computing, pp. 414-419, Palmerston North, New Zealand, 2003. Available online: http://sprg.massey.ac.nz/publications.html

[Bailey 05] D.G. Bailey, "Sub-pixel Profiling", in: 5th Int. Conf. on Information, Communications and Signal Processing, Bangkok, Thailand, pp. 1311-1315, December 2005. Available online: http://sprg.massey.ac.nz/publications.html

[Burger 06] W. Burger, M.J. Burge, Digitale Bildverarbeitung, Springer-Verlag, Heidelberg, 2006.

[Gockel 06] T. Gockel, Interaktive 3D-Modellerfassung, Dissertation, Universität Karlsruhe (TH), FB Informatik, Lehrstuhl IAIM Prof. R. Dillmann, 2006. Available online: http://opus.ubka.uni-karlsruhe.de/univerlag/volltexte/2006/153/

[Hornberg 06] A. Hornberg (Ed.), Handbook of Machine Vision, Wiley-VCH-Verlag, Weinheim, 2006.

[Jiang 91] X.Y. Jiang, H. Bunke, "Simple and fast computation of moments", Pattern Recognition, Volume 24, Issue 8, pp. 801-806, August 1991. Available online: http://cvpr.uni-muenster.de/research/publications.html

[Kilian 01] J. Kilian, "Simple Image Analysis by Moments", Version 0.2, OpenCV Library Documentation, technical paper (free distribution), published online, 2001. Available online: http://serdis.dis.ulpgc.es/~itis-fia/FIA/doc/Moments/OpenCv/

[OpenCV 07] Open Computer Vision Library. Open software library for computer vision routines, originally from Intel. http://sourceforge.net/projects/opencvlibrary

[Palaniappan 00] Palaniappan, Raveendran, Omatu, "New Invariant Moments for Non-uniformly Scaled Images", Pattern Analysis and Applications, Springer, 2000.


main.cpp:

// ***************************************************************************
// Project:   Alignment and Gauging for industrial parts.
// Copyright: Tilo Gockel, Chair Prof. Dillmann (IAIM), Institute for
//            Computer Science and Engineering (ITEC/CSE),
//            University of Karlsruhe. All rights reserved.
// Date:      February 25th 2007
// Filename:  main.cpp
// Author:    Tilo Gockel
// ***************************************************************************
// Description: Program searches *.jpg files in the current directory.
//              Then: calculation of center of gravity and principal axis
//              for alignment, then gauging (measurement) of a given distance.
// Algorithms:  Spatial moments, central moments, calculation of direction
//              of major axis, gauging (counting pixels to next b/w change),
//              in [Pixels].
// Comments:    OS: Windows 2000 or XP; Compiler: MS Visual C++ 6.0,
//              Libs used: IVT, QT, OpenCV.
// ***************************************************************************

#include "Image/ByteImage.h"
#include "Image/ImageAccessCV.h"
#include "Image/ImageProcessor.h"
#include "Image/ImageProcessorCV.h"
#include "Image/PrimitivesDrawer.h"
#include "Image/PrimitivesDrawerCV.h"
#include "Image/IplImageAdaptor.h"
#include "Math/Constants.h"
#include "Helpers/helpers.h"
#include "gui/QTWindow.h"
#include "gui/QTApplicationHandler.h"

#include <cv.h>

#include <qstring.h>
#include <qstringlist.h>
#include <qdir.h>

#include <iostream>
#include <iomanip>
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

using namespace std;


// modified version of DrawLine(): returns sum of visited non-black pixels
// (but here also used for line drawing)
int WalkTheLine(CByteImage *pImage, const Vec2d &p1, const Vec2d &p2, int r, int g, int b)
{
    int pixelcount = 0;

    const double dx = p1.x - p2.x;
    const double dy = p1.y - p2.y;

    if (fabs(dy) < fabs(dx))
    {
        const double slope = dy / dx;
        const int max_x = int(p2.x + 0.5);
        double y = p1.y + 0.5;

        if (p1.x < p2.x)
        {
            for (int x = int(p1.x + 0.5); x <= max_x; x++, y += slope)
            {
                if (pImage->pixels[int(y) * pImage->width + x] != 0)
                    pixelcount++;
                PrimitivesDrawer::DrawPoint(pImage, x, int(y), r, g, b);
            }
        }
        else
        {
            for (int x = int(p1.x + 0.5); x >= max_x; x--, y -= slope)
            {
                if (pImage->pixels[int(y) * pImage->width + x] != 0)
                    pixelcount++;
                PrimitivesDrawer::DrawPoint(pImage, x, int(y), r, g, b);
            }
        }
    }
    else
    {
        const double slope = dx / dy;
        const int max_y = int(p2.y + 0.5);
        double x = p1.x + 0.5;

        if (p1.y < p2.y)
        {
            for (int y = int(p1.y + 0.5); y <= max_y; y++, x += slope)
            {
                if (pImage->pixels[y * pImage->width + int(x)] != 0)
                    pixelcount++;
                PrimitivesDrawer::DrawPoint(pImage, int(x), y, r, g, b);
            }
        }
        else
        {
            for (int y = int(p1.y + 0.5); y >= max_y; y--, x -= slope)
            {
                if (pImage->pixels[y * pImage->width + int(x)] != 0)
                    pixelcount++;
                PrimitivesDrawer::DrawPoint(pImage, int(x), y, r, g, b);
            }
        }
    }

    return pixelcount;
}


void MomentCalculations(CByteImage *pImage, Vec2d &center, PointPair2d &orientation, double &theta)
{
    // calculate moments
    IplImage *pIplInputImage = IplImageAdaptor::Adapt(pImage);
    CvMoments moments;
    cvMoments(pIplInputImage, &moments, 1); // 1: treat gray values != 0 as 1
    cvReleaseImageHeader(&pIplInputImage);

    // for center of gravity
    const double m00 = cvGetSpatialMoment(&moments, 0, 0);
    const double m01 = cvGetSpatialMoment(&moments, 0, 1);
    const double m10 = cvGetSpatialMoment(&moments, 1, 0);

    // for angle of major axis
    const double u11 = cvGetCentralMoment(&moments, 1, 1);
    const double u20 = cvGetCentralMoment(&moments, 2, 0);
    const double u02 = cvGetCentralMoment(&moments, 0, 2);

    theta = 0.0;

    // now: case differentiation
    // cmp. [Kilian 01], "Simple Image Analysis by Moments",
    // online: http://serdis.dis.ulpgc.es/~itis-fia/FIA/doc/Moments/OpenCv/
    // but: STILL AMBIGUOUS in n * 180 degrees!
    if (((u20 - u02) == 0) && (u11 == 0)) // 1
        theta = 0.0;
    if (((u20 - u02) == 0) && (u11 > 0))  // 2
        theta = PI / 4.0;
    if (((u20 - u02) == 0) && (u11 < 0))  // 3
        theta = -(PI / 4.0);
    if (((u20 - u02) > 0) && (u11 == 0))  // 4
        theta = 0.0;
    if (((u20 - u02) < 0) && (u11 == 0))  // 5
        theta = -(PI / 2);
    if (((u20 - u02) > 0) && (u11 > 0))   // 6
        theta = 0.5 * atan(2 * u11 / (u20 - u02));
    if (((u20 - u02) > 0) && (u11 < 0))   // 7
        theta = 0.5 * atan(2 * u11 / (u20 - u02));
    if (((u20 - u02) < 0) && (u11 > 0))   // 8
        theta = (0.5 * atan(2 * u11 / (u20 - u02))) + PI / 2;
    if (((u20 - u02) < 0) && (u11 < 0))   // 9
        theta = (0.5 * atan(2 * u11 / (u20 - u02))) - PI / 2;

    Math2d::SetVec(center, m10 / m00, m01 / m00);

    // now: determine direction of major axis
    // go cross-like, start from COG, go to borders, count pixels
    // (cmp. visualization)
    Vec2d v;

    v.x = cos(theta) * 250 + center.x;
    v.y = sin(theta) * 250 + center.y;
    int count1 = WalkTheLine(pImage, center, v, 255, 0, 0);

    v.x = cos(theta + PI) * 230 + center.x;
    v.y = sin(theta + PI) * 230 + center.y;
    int count2 = WalkTheLine(pImage, center, v, 255, 255, 0);

    v.x = cos(theta + PI/2) * 230 + center.x;
    v.y = sin(theta + PI/2) * 230 + center.y;
    int count3 = WalkTheLine(pImage, center, v, 128, 0, 0);

    v.x = cos(theta - PI/2) * 230 + center.x;
    v.y = sin(theta - PI/2) * 230 + center.y;
    int count4 = WalkTheLine(pImage, center, v, 64, 0, 0);

    if ((count1 > count2) && (count3 < count4))
        theta = theta + PI;

    // Optional / for debugging: console output
    // cout << "Area: " << m00 << endl;
    // cout << "Center (x,y): " << center.x << " " << center.y << endl;
    // cout << "Theta [DEG]: " << ((theta * 180.0) / PI) << endl << endl;
}


int main(int argc, char *argv[])
{
    double theta = 0.0;

    QString path = QDir::currentDirPath();
    QDir dir(path);
    QStringList files = dir.entryList("*.jpg", QDir::Files);

    if (files.empty())
    {
        cout << "Error: could not find any *.jpg Files" << endl;
        return 1;
    }

    QStringList::Iterator it = files.begin();
    QString buf = QFileInfo(path, *it).baseName();
    buf += ".jpg";

    CQTApplicationHandler qtApplicationHandler(argc, argv);
    qtApplicationHandler.Reset();

    // width, height must be multiples of 4 (!)
    CByteImage colorimage;
    if (!ImageAccessCV::LoadFromFile(&colorimage, buf.ascii()))
    {
        printf("error: could not open input image file\n");
        return 1;
    }

    CByteImage grayimage(colorimage.width, colorimage.height, CByteImage::eGrayScale);
    CByteImage binaryimage(colorimage.width, colorimage.height, CByteImage::eGrayScale);
    ImageProcessor::ConvertImage(&colorimage, &grayimage);

    // calculations in grayimage and binaryimage
    // drawings and writings in colorimage for display
    CQTWindow imgwindow1(colorimage.width, colorimage.height);
    imgwindow1.DrawImage(&colorimage);
    imgwindow1.Show();

    CQTWindow imgwindow2(binaryimage.width, binaryimage.height);
    imgwindow2.DrawImage(&binaryimage);
    imgwindow2.Show();

    // main loop: cyclic loading of all *.jpg in the directory and processing
    while (!qtApplicationHandler.ProcessEventsAndGetExit())
    {
        buf = QFileInfo(path, *it).baseName();
        buf += ".jpg";
        cout << buf.ascii() << endl;

        if (!ImageAccessCV::LoadFromFile(&colorimage, buf.ascii()))
        {
            printf("error: could not open input image file\n");
            return 1;
        }

        // Inversion: OpenCV calculates moments for _white_ objects!
        ImageProcessor::ConvertImage(&colorimage, &grayimage);
        ImageProcessor::Invert(&grayimage, &grayimage); // (!)
        ImageProcessor::ThresholdBinarize(&grayimage, &binaryimage, 128);

        // Moments...
        Vec2d center;
        PointPair2d orientation;
        MomentCalculations(&binaryimage, center, orientation, theta);

        // Visualization / Output:
        // Center
        PrimitivesDrawerCV::DrawCircle(&colorimage, center, 3, 0, 255, 0, 1);

        // Two lines to show the coordinate system
        Vec2d v1, v2;
        v1.x = cos(theta) * 100 + center.x;
        v1.y = sin(theta) * 100 + center.y;
        WalkTheLine(&colorimage, center, v1, 255, 0, 0);

        v1.x = cos(theta + PI/2) * 100 + center.x;
        v1.y = sin(theta + PI/2) * 100 + center.y;
        WalkTheLine(&colorimage, center, v1, 255, 255, 0);

        ImageProcessor::Rotate(&binaryimage, &binaryimage, center.x, center.y, theta, true);

        // we gauge the cross section near the minor axis
        // (going parallel to the minor axis):
        v1.x = center.x + 5;  v1.y = center.y - 200;
        v2.x = center.x + 5;  v2.y = center.y + 200;
        int i = WalkTheLine(&binaryimage, v1, v2, 255, 255, 255);

        cout << "Gauging after alignment [pixel]: " << i << endl << endl;

        char text[512];
        sprintf(text, "Cross section in pixels: %d", i);
        PrimitivesDrawerCV::PutText(&colorimage, text, 20, 60, 0.8, 0.8, 255, 0, 100, 1);

        imgwindow1.DrawImage(&colorimage);
        imgwindow2.DrawImage(&binaryimage);

        //Sleep(1200); // oops, too fast to see anything....

        ++it;
        if (it == files.end())
            it = files.begin(); // until hell freezes over
    }

    return 0;
}

11 Depth Image Acquisition with a Stereo Camera System

Author: Pedram Azad

11.1 Introduction
In Chapter 10, a 3D laser scanner was introduced, which is based on the light-section method. Since only the profile of an individual cross-section of the object can be calculated on the basis of the projection of a laser line, one degree of freedom between scan unit and object is necessary. In the presented laser scanner, this degree of freedom is realized by a mechanical rotation device, with the aid of which the scan unit can be rotated. All captured profiles together form a composite scan. In this chapter, a procedure is now presented that is able to compute a scan with a single image recording. A calibrated stereo camera system observes the scene to be captured, while a projector additionally structures the scene by the projection of a random noise pattern. The calculation is again based on triangulation, using the concepts for camera calibration, epipolar geometry and correlation, as described in Chapter 2. In comparison to the laser scanner, the same robustness cannot be achieved here, since the correlation-based correspondence computation is more error-prone than the localization of the laser line.

11.2 Procedure
The hardware setup of the system is shown in Fig. 11.1. As can be seen, a stereo camera system observes the scene, which is structured by the (uncalibrated) projector; the camera images of the two cameras are shown in Fig. 11.2.


Fig. 11.1. Left: System structure. Right: Calculated depth map for the image pair from Fig. 11.2. The points of the grid were enlarged to 4 × 4 pixels to achieve a closed depth map.

Fig. 11.2. Example of an image pair as input to the stereo camera system. Left: Left camera image. Right: Right camera image.

First the stereo camera system must be calibrated. This is done using the application IVT/examples/CalibrationApp (see Chapter 10). The only difference to the calibration of an individual camera is that the checkerboard pattern must be visible in both camera images at the same time. The task of the algorithm is now to compute correspondences in the right camera image for image points in the left camera image. This takes place by utilizing the epipolar geometry described in Section 2.10.2 and the Zero Mean Normalized Cross Correlation (ZNCC), as presented in Section 2.11.3. For each pixel in the left camera image, a (2k + 1) × (2k + 1) patch is cut out. This is afterwards normalized with respect to additive and multiplicative brightness differences (see Section 2.11.3). Correspondences to this image patch are searched for along the epipolar line by calculating the ZNCC for each point of the line. The pixel with the maximum correlation value then identifies the correspondence.


For the results in Figs. 11.1 and 11.3, k = 10, i.e. a 21 × 21 window, was used.

In order to make the approach more efficient and more robust, correspondences are searched for only in a given interval {d_min, . . . , d_max} of so-called disparities. The term disparity denotes the Euclidean distance from one point on the epipolar line to a given point in the other camera image. A small disparity represents a large distance to the camera, a large value a small distance. Furthermore, it should be noted that correspondences in the right camera image must lie to the left of the query point on the epipolar line.

Since, as a result of occlusions, a (correct) correspondence cannot be determined for every pixel, the candidates calculated by the correlation method must be validated on the basis of a threshold t. If the value computed by the ZNCC is greater than this threshold, then the correspondence is accepted. In the presented system, t = 0.4 was selected.

An important measure for increasing the robustness is the recognition of homogeneous image areas. Within such areas, correspondences cannot be determined reliably, since good correlation results are calculated for a multiplicity of disparities; the best correlation result is then determined by chance. The recognition step necessary for handling such cases can easily be incorporated into the normalization procedure of the image patch (see Section 2.11.3) around the query point. After the mean value has been subtracted, the sum of the squared intensities, Σ I²(u, v), is a reliable measure of the homogeneity of the image patch: a large value identifies a heterogeneous area, a small value a homogeneous area. In the presented system, image patches with values smaller than 100 · (2k + 1)² are rejected.

After completion of the disparity calculation, the data is filtered by checking for the existence of at least five neighbors with a similar disparity. The disparity of each correspondence calculated in this way is finally entered into a so-called disparity map, in which bright pixels represent a small distance to the camera and dark pixels a large distance (see Fig. 11.1). Additionally, for each correspondence (u_1, v_1), (u_2, v_2) the corresponding 3D point in the world coordinate system is calculated using Algorithm 37 from Section 2.10.1. In order to obtain a higher accuracy, the computed integral disparities are refined with sub-pixel precision using the procedure described in Section 6.2.2. The result is finally a point cloud (see Fig. 11.3).

If the point cloud is to be triangulated in order to obtain a 3D mesh, it is usually advantageous to choose the points in the left camera image not at the full resolution, but on a grid with a step size of Δ. In this system, Δ = 5 was chosen. In this way, the noise caused by the limited sub-pixel accuracy does not have a detrimental effect and can be compensated by triangulation and smoothing. Furthermore, a step size of Δ > 1 leads to a speedup by a factor of Δ².
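As an illustration of the normalization and of the homogeneity test (a sketch only; the complete implementation is the function SingleZNCC() in the listing in Section 11.3), the following routine prepares a patch that has already been copied into a float array of length n = (2k + 1)²:

#include <math.h>

// subtracts the mean (additive normalization), tests the residual energy
// against the homogeneity threshold and applies the multiplicative
// normalization; returns false for patches that are too homogeneous
bool NormalizePatchZNCC(float *patch, int n)
{
    float mean = 0.0f;
    for (int i = 0; i < n; i++)
        mean += patch[i];
    mean /= n;

    float energy = 0.0f;
    for (int i = 0; i < n; i++)
    {
        patch[i] -= mean;
        energy += patch[i] * patch[i];
    }

    // homogeneous patch: correlation results would be ambiguous
    if (energy < 100.0f * n)
        return false;

    const float factor = 1.0f / sqrtf(energy);
    for (int i = 0; i < n; i++)
        patch[i] *= factor;

    return true;
}

The ZNCC of two patches normalized in this way is simply the sum of their element-wise products and lies in [-1, 1]; candidates below the threshold t = 0.4 are rejected.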


The application functions as follows: Firstly, individual points in the left camera image can be selected by a simple click with the left mouse button. The correlation result is visualized and displayed in the console. In this way, the minimum and maximum disparity can be measured and adjusted with the sliders. Now the area for which the depth information is to be calculated can be selected by dragging a window with the mouse in the left camera image. After finishing the calculations, the application stores the disparity map in the file depth_map.bmp and the point cloud in xyz representation in the file scan.txt. The point cloud can be triangulated and visualized using the application VisualizeApp (see Section 10.5.2).

Fig. 11.3. Result of a scan for the image pair from Fig. 11.2. Left: Point cloud. Right: Rendered mesh.

To conclude, and as an outlook, it should be mentioned that an optimized algorithm for the correspondence computation can be used on rectified input images (see Section 2.10.3). This algorithm utilizes a recurrence in conjunction with running sum tables and thereby achieves a run time that is independent of the window size. Optimized implementations for the generation of disparity maps using this algorithm achieve processing rates of 30 Hz and higher for input images of size 320 × 240. For a comprehensive overview, see [Faugeras 93].
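The core of such an optimization is the running-sum recurrence, illustrated here for one row and one fixed disparity (a simplified 1-D sketch, not the algorithm from [Faugeras 93]): the cost of moving the correlation window by one pixel is one addition and one subtraction, independent of the window size 2k + 1.

// box[x] receives the sum of f over the window [x-k, x+k];
// border positions are left untouched in this sketch
void RunningWindowSums(const float *f, float *box, int n, int k)
{
    if (n < 2 * k + 1)
        return;

    float sum = 0.0f;
    for (int x = 0; x <= 2 * k; x++)
        sum += f[x];
    box[k] = sum;

    for (int x = k + 1; x < n - k; x++)
    {
        sum += f[x + k] - f[x - k - 1];   // slide the window by one pixel
        box[x] = sum;
    }
}

For ZNCC on rectified images, the same recurrence is applied, per disparity, to the running sums of I1, I1², I2, I2² and I1·I2 over the correlation window.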

11.3 References and Source Code


[Faugeras 93] O. Faugeras et al., "Real-time correlation-based stereo: algorithm, implementation and applications", INRIA Technical Report No. 2013, 1993.

stereoscanner.cpp:

// *****************************************************************************
// Filename:  stereoscanner.cpp
// Copyright: Pedram Azad, Chair Prof. Dillmann (IAIM), Institute for
//            Computer Science and Engineering (CSE),
//            University of Karlsruhe. All rights reserved.
// Author:    Pedram Azad
// Date:      2007/02/24
// *****************************************************************************

#include "Image/ByteImage.h"
#include "Image/ImageProcessor.h"
#include "Image/PrimitivesDrawer.h"
#include "Math/FloatMatrix.h"
#include "Calibration/StereoCalibration.h"
#include "VideoCapture/BitmapCapture.h"
#include "gui/QTApplicationHandler.h"
#include "gui/QTWindow.h"
#include "Interfaces/WindowEventInterface.h"

#include <qslider.h>
#include <qlabel.h>
#include <qlcdnumber.h>

#include <stdio.h>
#include <math.h>


static CStereoCalibration *pStereoCalibration;


class CMessageReceiver : public CWindowEventInterface
{
public:
    CMessageReceiver() { ok_point = false; ok_rect = false; }

    void PointClicked(int x, int y)
    {
        ok_point = true;
        this->x = x;
        this->y = y;
    }

    void RectSelected(int x0, int y0, int x1, int y1)
    {
        ok_rect = true;
        this->x0 = x0;  this->y0 = y0;
        this->x1 = x1;  this->y1 = y1;
    }

    int x, y;
    bool ok_point;

    int x0, y0, x1, y1;
    bool ok_rect;
};


int SingleZNCC(const CByteImage *pInputImage1, CByteImage *pInputImage2,
               int x, int y, int nWindowSize, int d1, int d2,
               float *values, Vec2d &result, bool bDrawLine = false)
{
    const int width = pInputImage1->width;
    const int height = pInputImage1->height;

    if (x < nWindowSize / 2 || x >= width - nWindowSize / 2 ||
        y < nWindowSize / 2 || y >= height - nWindowSize / 2)
        return -1;

    const unsigned char *input_left = pInputImage1->pixels;
    unsigned char *input_right = pInputImage2->pixels;

    const int nVectorLength = nWindowSize * nWindowSize;

    float *vector1 = new float[nVectorLength];
    float *vector2 = new float[nVectorLength];

    const int offset = (y - nWindowSize / 2) * width + (x - nWindowSize / 2);
    const int diff = width - nWindowSize;

    Vec2d camera_point = { x, y };
    int i, j, offset2, offset3;

    // Calculate the mean value of the left patch
    float mean = 0;
    for (i = 0, offset2 = offset, offset3 = 0; i < nWindowSize; i++, offset2 += diff)
        for (j = 0; j < nWindowSize; j++, offset2++, offset3++)
        {
            vector1[offset3] = input_left[offset2];
            mean += vector1[offset3];
        }
    mean /= nVectorLength;

    // Subtract the mean value and apply multiplicative normalization
    float factor = 0;
    for (i = 0; i < nVectorLength; i++)
    {
        vector1[i] -= mean;
        factor += vector1[i] * vector1[i];
    }

    // homogeneity test: reject patches with too little structure
    if (factor < nWindowSize * nWindowSize * 100)
    {
        delete [] vector1;
        delete [] vector2;
        return -1;
    }

    factor = 1 / sqrtf(factor);
    for (i = 0; i < nVectorLength; i++)
        vector1[i] *= factor;

    float best_value = -9999999;
    int d, best_d = 0;
    const int max_d = d2 < x ? d2 : x;

    double m, c;
    pStereoCalibration->CalculateEpipolarLineInRightImage(camera_point, m, c);

    // Determine the correspondence along the epipolar line
    for (d = d1; d <= max_d; d++)
    {
        const int yy = int(m * (x - d) + c + 0.5) - nWindowSize / 2;

        if (yy < 0 || yy >= height)
            continue;

        const int offset_right = yy * width + (x - d - nWindowSize / 2);

        // Calculate the mean value of the right patch
        float mean = 0;
        for (i = 0, offset2 = offset_right, offset3 = 0; i < nWindowSize; i++, offset2 += diff)
            for (j = 0; j < nWindowSize; j++, offset2++, offset3++)
            {
                vector2[offset3] = input_right[offset2];
                mean += vector2[offset3];
            }
        mean /= nVectorLength;

        // Subtract the mean value and apply multiplicative normalization
        float factor = 0;
        for (i = 0; i < nVectorLength; i++)
        {
            vector2[i] -= mean;
            factor += vector2[i] * vector2[i];
        }

        factor = 1 / sqrtf(factor);
        for (i = 0; i < nVectorLength; i++)
            vector2[i] *= factor;

        // the ZNCC is the dot product of the two normalized patches
        float value = 0;
        for (i = 0; i < nVectorLength; i++)
            value += vector1[i] * vector2[i];

        // Save correlation result for subpixel calculation
        values[d] = value;

        // Determine the maximum correlation value
        if (value > best_value)
        {
            best_value = value;
            best_d = d;
        }
    }

    // Visualization
    if (bDrawLine)
    {
        for (d = d1; d <= max_d; d++)
            input_right[int(m * (x - d) + c + 0.5) * width + (x - d)] = 255;
    }

    result.x = x - best_d;
    result.y = m * (x - best_d) + c;

    delete [] vector1;
    delete [] vector2;

    return best_d;
}


bool ZNCC(CByteImage *pLeftImage, CByteImage *pRightImage, CFloatMatrix *pDisparityMap,
          int nWindowSize, int d1, int d2, float threshold, int step,
          int x0, int y0, int x1, int y1)
{
    float *output = pDisparityMap->data;

    int i;

    const int width = pLeftImage->width;
    const int height = pLeftImage->height;
    const int nPixels = width * height;
    const int nVectorLength = nWindowSize * nWindowSize;

    for (i = 0; i < nPixels; i++)
        output[i] = 0;

    if (x0 < nWindowSize / 2) x0 = nWindowSize / 2;
    if (y0 < nWindowSize / 2) y0 = nWindowSize / 2;
    if (x1 > width - nWindowSize / 2) x1 = width - nWindowSize / 2;
    if (y1 > height - nWindowSize / 2) y1 = height - nWindowSize / 2;

    float *values = new float[width];
    float *vector1 = new float[nVectorLength];
    float *vector2 = new float[nVectorLength];

    for (i = y0; i < y1; i += step)
    {
        for (int j = x0; j < x1; j += step)
        {
            Vec2d result;
            const int best_d = SingleZNCC(pLeftImage, pRightImage, j, i,
                                          nWindowSize, d1, d2, values, result);

            if (best_d != -1 && values[best_d] > threshold)
            {
                // sub-pixel refinement of the disparity by a parabola fit
                const double y0 = values[best_d - 1];
                const double y1 = values[best_d];
                const double y2 = values[best_d + 1];
                const double xmin = (y0 - y2) / (2 * (y0 - 2 * y1 + y2));

                output[(i + nWindowSize / 2) * width + j + nWindowSize / 2] = best_d + xmin;
            }
        }

        printf("i = %i\n", i);
    }

    delete [] vector1;
    delete [] vector2;
    delete [] values;

    return true;
}


void Filter(CFloatMatrix *pDisparityMap, int step)
{
    CFloatMatrix result(pDisparityMap);

    const int width = pDisparityMap->columns;
    const int height = pDisparityMap->rows;
    const int stepw = step * width;
    const float *data = pDisparityMap->data;
    const float max = 5;

    ImageProcessor::Zero(&result);

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            const int offset = y * width + x;

            if (data[offset] != 0)
            {
                // Determine the number of similar neighbors
                int n = 0;

                n += fabs(data[offset] - data[offset - step]) < max;
                n += fabs(data[offset] - data[offset + step]) < max;
                n += fabs(data[offset] - data[offset - stepw - step]) < max;
                n += fabs(data[offset] - data[offset - stepw]) < max;
                n += fabs(data[offset] - data[offset - stepw + step]) < max;
                n += fabs(data[offset] - data[offset + stepw - step]) < max;
                n += fabs(data[offset] - data[offset + stepw]) < max;
                n += fabs(data[offset] - data[offset + stepw + step]) < max;

                if (n >= 5)
                    result.data[offset] = data[offset];
            }
        }
    }

    ImageProcessor::CopyMatrix(&result, pDisparityMap);
}


int main(int argc, char **args)
{
    CBitmapCapture capture("test_left.bmp", "test_right.bmp");

    if (!capture.OpenCamera())
    {
        printf("Error: Could not open camera.\n");
        return 1;
    }

    const int width = capture.GetWidth();
    const int height = capture.GetHeight();
    const CByteImage::ImageType type = capture.GetType();

    CStereoCalibration stereo_calibration;
    if (!stereo_calibration.LoadCameraParameters("cameras.txt"))
    {
        printf("Error: Could not load file with camera parameters.\n");
        return 1;
    }

    pStereoCalibration = &stereo_calibration;

    CByteImage *ppImages[] = { new CByteImage(width, height, type),
                               new CByteImage(width, height, type) };

    CFloatMatrix disparity_map(width, height);
    CByteImage depth_image(width, height, CByteImage::eGrayScale);
    CByteImage image_left(&depth_image), image_right(&depth_image);

    // Initialize Qt
    CQTApplicationHandler qtApplicationHandler(argc, args);
    qtApplicationHandler.Reset();

    // Create window
    CMessageReceiver receiver;
    CQTWindow window(2 * width, height + 180, &receiver);

    // LCD numbers
    QLCDNumber *pLCD_WindowSize = new QLCDNumber(3, &window);
    pLCD_WindowSize->setFixedWidth(80);
    pLCD_WindowSize->setFixedHeight(40);
    pLCD_WindowSize->move(20, height + 20);

    QLCDNumber *pLCD_MinDisparity = new QLCDNumber(3, &window);
    pLCD_MinDisparity->setFixedWidth(80);
    pLCD_MinDisparity->setFixedHeight(40);
    pLCD_MinDisparity->move(20, height + 70);

    QLCDNumber *pLCD_MaxDisparity = new QLCDNumber(3, &window);
    pLCD_MaxDisparity->setFixedWidth(80);
    pLCD_MaxDisparity->setFixedHeight(40);
    pLCD_MaxDisparity->move(20, height + 120);

    // Sliders
    QSlider *pSliderWindowSize = new QSlider(1, 49, 2, 21, Qt::Horizontal, &window);
    pSliderWindowSize->setFixedWidth(400);
    pSliderWindowSize->setFixedHeight(20);
    pSliderWindowSize->move(120, height + 30);

    QSlider *pSliderMinDisparity = new QSlider(0, 500, 1, 150, Qt::Horizontal, &window);
    pSliderMinDisparity->setFixedWidth(400);
    pSliderMinDisparity->setFixedHeight(20);
    pSliderMinDisparity->move(120, height + 80);

    QSlider *pSliderMaxDisparity = new QSlider(0, 500, 1, 220, Qt::Horizontal, &window);
    pSliderMaxDisparity->setFixedWidth(400);
    pSliderMaxDisparity->setFixedHeight(20);
    pSliderMaxDisparity->move(120, height + 130);

    // Labels
    QLabel *pLabelWindowSize = new QLabel(&window);
    pLabelWindowSize->setText("Window Size");
    pLabelWindowSize->setFixedWidth(200);
    pLabelWindowSize->setFixedHeight(20);
    pLabelWindowSize->move(540, height + 30);

    QLabel *pLabelMinDisparity = new QLabel(&window);
    pLabelMinDisparity->setText("Minimum Disparity");
    pLabelMinDisparity->setFixedWidth(200);
    pLabelMinDisparity->setFixedHeight(20);
    pLabelMinDisparity->move(540, height + 80);

    QLabel *pLabelMaxDisparity = new QLabel(&window);
    pLabelMaxDisparity->setText("Maximum Disparity");
    pLabelMaxDisparity->setFixedWidth(200);
    pLabelMaxDisparity->setFixedHeight(20);
    pLabelMaxDisparity->move(540, height + 130);

    // Show window
    window.Show();

    while (!qtApplicationHandler.ProcessEventsAndGetExit())
    {
        const int nWindowSize = pSliderWindowSize->value();
        const int d1 = pSliderMinDisparity->value();
        const int d2 = pSliderMaxDisparity->value();

        pLCD_WindowSize->display(nWindowSize);
        pLCD_MinDisparity->display(d1);
        pLCD_MaxDisparity->display(d2);

        if (!capture.CaptureImage(ppImages))
            break;

        if (type == CByteImage::eGrayScale)
        {
            ImageProcessor::CopyImage(ppImages[0], &image_left);
            ImageProcessor::CopyImage(ppImages[1], &image_right);
        }
        else
        {
            ImageProcessor::ConvertImage(ppImages[0], &image_left);
            ImageProcessor::ConvertImage(ppImages[1], &image_right);
        }

        // single correspondence for a clicked point (visualization)
        if (receiver.ok_point)
        {
            Vec2d result;
            float *values = new float[width];

            const int best_d = SingleZNCC(&image_left, &image_right,
                                          receiver.x, receiver.y,
                                          nWindowSize, 0, 400, values, result, true);

            if (best_d != -1)
            {
                printf("best_d = %i: %f %f %f\n", best_d,
                       values[best_d - 2], values[best_d], values[best_d + 2]);

                MyRegion region;
                region.min_x = receiver.x - nWindowSize / 2;
                region.max_x = receiver.x + nWindowSize / 2;
                region.min_y = receiver.y - nWindowSize / 2;
                region.max_y = receiver.y + nWindowSize / 2;
                PrimitivesDrawer::DrawRegion(&image_left, region);

                region.min_x = int(result.x + 0.5) - nWindowSize / 2;
                region.max_x = int(result.x + 0.5) + nWindowSize / 2;
                region.min_y = int(result.y + 0.5) - nWindowSize / 2;
                region.max_y = int(result.y + 0.5) + nWindowSize / 2;
                PrimitivesDrawer::DrawRegion(&image_right, region);
            }

            delete [] values;
        }

        // full disparity calculation for a selected rectangle
        if (receiver.ok_rect)
        {
            ZNCC(&image_left, &image_right, &disparity_map, nWindowSize,
                 d1, d2, 0.4f, 4, receiver.x0, receiver.y0, receiver.x1, receiver.y1);

            Filter(&disparity_map, 4);

            // Calculate point cloud
            FILE *f = fopen("scan.txt", "w");
            const float *disparity = disparity_map.data;

            for (int y = 0, offset = 0; y < height; y++)
            {
                for (int x = 0; x < width; x++, offset++)
                {
                    if (disparity[offset] != 0)
                    {
                        Vec3d world_point;
                        Vec2d point_left = { x, y };

                        double m, c;
                        stereo_calibration.CalculateEpipolarLineInRightImage(point_left, m, c);

                        Vec2d point_right = { x - disparity[offset],
                                              m * (x - disparity[offset]) + c };

                        stereo_calibration.Calculate3DPoint(point_left, point_right,
                                                            world_point, false);

                        fprintf(f, "%f %f %f\n",
                                world_point.x, world_point.y, world_point.z);
                    }
                }
            }

            fclose(f);

            // Calculate depth map
            for (int i = 0; i < width * height; i++)
                if (disparity_map.data[i] == 0)
                    disparity_map.data[i] = d1;

            ImageProcessor::ConvertImage(&disparity_map, &depth_image);
            depth_image.SaveToFile("depth_map.bmp");
        }

        window.DrawImage(&image_left);
        window.DrawImage(&image_right, width, 0);
    }

    delete ppImages[0];
    delete ppImages[1];

    return 0;
}

Excerpt chapter truncated . . .
