
INTELLIGENT SIGNAL PROCESSING

Edited by

Simon Haykin

McMaster University Hamilton, Ontario, Canada

Bart Kosko

University of Southern California Los Angeles, CA, USA

A Selected Reprint Volume

IEEE

PRESS

The Institute of Electrical and Electronics Engineers, Inc., New York

This book and other books may be purchased at a discount from the publisher when ordered in bulk quantities. Contact:

IEEE Press Marketing
Attn: Special Sales
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Fax: +1 732 981 9334

For more information about IEEE Press products, visit the IEEE Online Catalog and Store: http://www.ieee.org/store.

© 2001 by the Institute of Electrical and Electronics Engineers, Inc. 3 Park Avenue, 17th Floor, New York, NY 10016-5997.

All rights reserved. No part of this book may be reproduced in any form, nor may it be stored in a retrieval system or transmitted in any form, without written permission from the publisher.

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

ISBN 0-7803-6010-9 IEEE Order No. PC5860

Library of Congress Cataloging-in-Publication Data

Intelligent signal processing / edited by Simon Haykin, Bart Kosko.
p. cm.
"A selected reprint volume."
Includes index.
ISBN 0-7803-6010-9

1. Signal processing--Digital techniques. 2. Intelligent control systems. 3. Adaptive signal processing. I. Haykin, Simon S., 1931- II. Kosko, Bart.

TK5102.9.I5455 2000
621.382'2--dc21

This book is dedicated to Bernard Widrow for laying the foundations of adaptive filters

Preface

This expanded reprint volume is the first book devoted to the new field of intelligent signal processing (ISP). It grew out of the November 1998 ISP special issue of the Proceedings of the IEEE that the two of us coedited. This book contains new ISP material and a fuller treatment of the articles that appeared in the ISP special issue.

WHAT IS ISP?

ISP uses learning and other "smart" techniques to extract as much information as possible from incoming signal and noise data. It makes few if any assumptions about the statistical structure of signals and their environment. ISP seeks to let the data set tell its own story rather than to impose a story on the data in the form of a simple mathematical model.

Classical signal processing has largely worked with mathematical models that are linear, local, stationary, and Gaussian. These assumptions stem from the precomputer age. They have always favored closed-form tractability over real-world accuracy, and they are no less extreme because they are so familiar. But real systems are nonlinear except for a vanishingly small set of linear systems. Almost all bell-curve probability densities have infinite variance and infinite higher-order moments. The set of bell-curve densities itself is a vanishingly small set in the space of all probability densities. Real-world systems are often highly nonlinear and can depend on many partially correlated variables. The systems can have an erratic or impulsive statistical structure that varies in time in equally erratic ways. Small changes in the signal or noise structure can lead to qualitative global changes in how the system filters noise or maintains stability.
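The infinite-variance point admits a minimal numerical sketch. The example below (our illustrative choice, using the standard Cauchy density as the thick-tailed bell curve) compares sample variances of Gaussian and Cauchy draws: the Gaussian estimate settles near its true value of 1, while the Cauchy estimate is dominated by its largest outliers and never settles, because the population variance does not exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

gauss = rng.standard_normal(n)   # population variance 1: the estimate settles
cauchy = rng.standard_cauchy(n)  # population variance infinite: no settling

# The Gaussian sample variance stays near 1; the Cauchy sample variance is
# driven by its largest outliers and keeps growing as n grows.
print("Gaussian sample variance:", float(gauss.var()))
print("Cauchy sample variance:  ", float(cauchy.var()))
```

Rerunning with larger n makes the contrast starker: the Gaussian estimate tightens while the Cauchy estimate wanders by orders of magnitude.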

ISP has emerged recently in signal processing in much the same way that intelligent control has emerged from standard linear control theory. Researchers have guessed less at equations to model a complex system's throughput and have instead let so-called "intelligent" or "model-free" techniques guess more for them.

Adaptive neural networks have been the most popular black-box tools in ISP. Multilayer perceptrons and radial-basis function networks extend adaptive linear combiners to the nonlinear domain but require vastly more computation. Other ISP techniques include fuzzy rule-based systems, genetic algorithms, and the symbolic expert systems of artificial intelligence. Both neural and fuzzy systems can learn with supervised and unsupervised techniques. Both are (like polynomials) universal function approximators:
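The adaptive linear combiner that these networks extend is itself only a few lines of code. A minimal LMS (Widrow-Hoff) sketch, with a hypothetical 3-tap FIR filter standing in for the unknown system and all parameter values chosen here for illustration:

```python
import numpy as np

# Hypothetical unknown system: a 3-tap FIR filter observed in small noise.
rng = np.random.default_rng(1)
true_w = np.array([0.5, -0.3, 0.2])

x = rng.standard_normal(2000)                  # input signal
d = np.convolve(x, true_w)[: len(x)] \
    + 0.01 * rng.standard_normal(len(x))       # desired (noisy) output

w = np.zeros(3)   # adaptive tap weights
mu = 0.05         # step size (must be small enough for stability)

for n in range(2, len(x)):
    u = x[n - 2 : n + 1][::-1]   # tap-input vector [x[n], x[n-1], x[n-2]]
    e = d[n] - w @ u             # a priori estimation error
    w += mu * e * u              # LMS weight update

print("estimated taps:", w)      # close to [0.5, -0.3, 0.2]
```

The same error-driven update, pushed through one or more nonlinear hidden layers, is the backpropagation training that multilayer perceptrons use, at far greater computational cost.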

They can uniformly approximate any continuous function on a compact domain, but this may not be practical in many real-world cases. The property of universal approximation justifies the term "model free" to describe neural and fuzzy systems even though equations describe their own throughput structure. They are one-size-fits-all approximators that can model any process if they have access to enough training data.

But ISP tools face new problems when we apply them to more real-world problems that are nonlinear, nonlocal, nonstationary, non-Gaussian, and of high dimension. Practical neural systems may require prohibitive computation to tune the values of their synaptic weights for large sets of high-dimensional data. New signal data may require total retraining or may force the neural network's vast and unfathomable set of synapses to forget some of the signal structure it has learned. Blind fuzzy approximators need a number of if-then rules that grows exponentially with the dimension of the training data. This volume explores how the ISP tools can address these problems.
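The exponential rule growth is simple counting: a complete rule base with k fuzzy sets per input variable and d input variables needs one if-then rule per combination, or k**d rules in all. A quick sketch (k = 5 is an arbitrary illustrative choice):

```python
k = 5  # fuzzy sets per input variable (illustrative choice, e.g. NL, NS, Z, PS, PL)

# A complete (blind) rule base covers every combination of the k sets
# across the d input variables: k**d if-then rules.
for d in (2, 4, 8, 16):
    print(f"d = {d:2d} inputs -> {k**d:,} rules")
```

Even at this modest k, the rule count passes a hundred billion by d = 16, which is why blind fuzzy approximators break down on high-dimensional data.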


ORGANIZATION OF THE VOLUME

The 15 chapters in this book give a representative sample of current research in ISP and each has helped extend the ISP frontier. Each chapter passed through a full peer-review filter:

1. Steve Mann describes a novel technique that lets one include human intelligence in the operation of a wearable computer.

2. Sanya Mitaim and Bart Kosko present the noise processing technique of stochastic resonance in a signal processing framework and then show how neural or fuzzy or other model-free systems can adaptively add many types of noise to nonlinear dynamical systems to improve their signal-to-noise ratios.

3. Malik Magdon-Ismail, Alexander Nicholson, and Yaser S. Abu-Mostafa explore how additive noise affects information processing in problems of financial engineering.

4. Partha Niyogi, Federico Girosi, and Tomaso Poggio show how prior knowledge and virtual sampling can expand the size of a data set that trains a generalized supervised learning system.

5. Kenneth Rose reviews how the search technique of deterministic annealing can optimize the design of unsupervised and supervised learning systems.

6. Jose C. Principe, Ludong Wang, and Mark A. Motter use the neural self-organizing map as a tool for the local modeling of a nonlinear dynamical system.

7. Lee A. Feldkamp and Gintaras V. Puskorius describe how time-lagged recurrent neural networks can perform difficult tasks of nonlinear signal processing.

8. Davide Mattera, Francesco Palmieri, and Simon Haykin describe a semiparametric form of support vector machine for nonlinear model estimation that uses prior knowledge that comes from a rough parametric model of the system under study.

9. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner review ways that gradient-descent learning can train a multilayer perceptron for handwritten character recognition.

10. Shigeru Katagiri, Biing-Hwang Juang, and Chin-Hui Lee show how to use the new technique of generalized probabilistic gradients to solve problems in pattern recognition.

11. Lee A. Feldkamp, Timothy M. Feldkamp, and Danil V. Prokhorov present an adaptive classification scheme that combines both supervised and unsupervised learning.

12. J. Scott Goldstein, J. R. Guerci, and I. S. Reed describe an algebraic procedure based on reduced-rank modeling as a basis for intelligent signal processing.

13. Simon Haykin and David J. Thomson discuss an adaptive procedure for the difficult task of detecting a nonstationary target signal in a nonstationary background with unknown statistics.

14. Robert D. Dony and Simon Haykin describe an image segmentation system based on a mixture of principal components.

15. Aapo Hyvärinen, Patrik Hoyer, and Erkki Oja discuss how sparse coding can denoise images.

These chapters show how adaptive systems can solve a wide range of difficult tasks in signal processing that arise in highly diverse fields. They are a humble but important first step on the road to truly intelligent signal processing.

Simon Haykin

McMaster University Hamilton, Ontario, Canada

Bart Kosko

University of Southern California Los Angeles, CA


List of Contributors

Chapter 1

Steve Mann University of Toronto Department of Electrical and Computer Engineering 10 King's College Road, S.F. 2001 Toronto, Ontario, M5S 3G4 CANADA

Chapter 2

Bart Kosko Signal and Image Processing Institute Department of Electrical Engineering-Systems

University of Southern California Los Angeles, California 90089-2564

Sanya Mitaim Signal and Image Processing Institute Department of Electrical Engineering-Systems University of Southern California Los Angeles, California 90089-2564

Chapter 3

Malik Magdon-Ismail Department of Electrical Engineering California Institute of Technology 136-93 Pasadena, CA 91125

Alexander Nicholson Department of Electrical Engineering California Institute of Technology 136-93 Pasadena, CA 91125

Yaser S. Abu-Mostafa Department of Electrical Engineering California Institute of Technology 136-93 Pasadena, CA 91125

Chapter 4

Partha Niyogi Massachusetts Institute of Technology Center for Biological and Computational Learning Cambridge, MA 02139

Federico Girosi Massachusetts Institute of Technology Center for Biological and Computational Learning Cambridge, MA 02139

Tomaso Poggio Massachusetts Institute of Technology Center for Biological and Computational Learning Cambridge, MA 02139

Chapter 5

Kenneth Rose Department of Electrical and Computer Engineering University of California Santa Barbara, CA 93106

Chapter 6

Jose C. Principe Computational NeuroEngineering Laboratory University of Florida Gainesville, FL 32611

Ludong Wang Computational NeuroEngineering Laboratory University of Florida Gainesville, FL 32611

Mark A. Motter Computational NeuroEngineering Laboratory University of Florida Gainesville, FL 32611



Contents

Preface xvii
List of Contributors xix

CHAPTER 1
Humanistic Intelligence: "WearComp" As a New Framework and Application for Intelligent Signal Processing 1
INTRODUCTION 1
Why Humanistic Intelligence? 2
Humanistic Intelligence Does Not Necessarily Mean "user-friendly" 2
"WEARCOMP" AS MEANS OF REALIZING HUMANISTIC INTELLIGENCE
Basic Principles of WearComp 3
Operational modes of WearComp 3
The Six Basic Signal Flow Paths of WearComp 5
PHILOSOPHICAL ISSUES 6
Fundamental Issues of WearComp 6
Historical context 6
The shift from guns to cameras and computers 6
The shift from draconian punishment to micromanagement 7
Fundamental issues of WearComp 8
Aspects of WearComp and Personal Empowerment 9
PRACTICAL EMBODIMENT OF WEARCOMP 10
Building Signal-Processing Devices Directly into Fabric 12
Remaining issue of underwearable signal processing hardware 13
Multidimensional Signal Input for Humanistic Intelligence 14
Safety first 15
More than just a health status monitor 16
PERSONAL IMAGING APPLICATION OF HUMANISTIC INTELLIGENCE 16
Some Simple Illustrative Examples 16
Always ready: From point and click to "look and think" 16
Personal safety device for reducing crime 17
The retro-autofocus example: Human in the signal processing loop 17
Mathematical Framework for Personal Imaging 18
Quantigraphic imaging and the Wyckoff Principle 18
Video orbits 21
Dynamic range and "dynamic domain" 24
Bi-foveated WearCam 30
Lightspace modeling for HI 31
BEYOND VIDEO: SYNTHETIC SYNESTHESIA AND PERSONAL IMAGING 34
Synthetic Synesthesia: Adding New Sensory Capabilities to the Body 34
Safety first 35
A true extension of the mind and body 36

CONCLUSIONS 36
ACKNOWLEDGMENTS 37
REFERENCES 37
CHAPTER 2
Adaptive Stochastic Resonance 40
ABSTRACT 40
STOCHASTIC RESONANCE AND ADAPTIVE FUNCTION APPROXIMATION 41
SR DYNAMICAL SYSTEMS 46
SR PERFORMANCE MEASURES 51
ADDITIVE FUZZY SYSTEMS AND FUNCTION APPROXIMATION 56
SR LEARNING AND EQUILIBRIUM 58
The Signal-to-Noise Ratio in Nonlinear Systems 58
Supervised Gradient Learning and SR Optimality 63
SR LEARNING: SIMULATION RESULTS 68
SR Test Case: The Quartic Bistable System 70
Other SR Test Cases 77
Fuzzy SR Learning: The Quartic Bistable System 81
CONCLUSIONS 86
REFERENCES 89
APPENDIX A. THE STANDARD ADDITIVE MODEL (SAM) THEOREM 104
APPENDIX B. SAM GRADIENT LEARNING 106
CHAPTER 3
Learning in the Presence of Noise 108
ABSTRACT 108
INTRODUCTION 108
FINANCIAL TIME SERIES PREDICTION 110
IMPACT OF NOISE ON LEARNING 111
The Learning Problem 111
Performance of a Learning System 113
Estimating the Model Limitation 116
APPLICATION TO FINANCIAL MARKET FORECASTING 116
CONCLUSION 117
ACKNOWLEDGMENTS 118
APPENDIX 119
REFERENCES 125
CHAPTER 4
Incorporating Prior Information in Machine Learning by Creating Virtual Examples 127
ABSTRACT 127
LEARNING FROM EXAMPLES 128
Background: Learning as Function Approximation 128
PRIOR INFORMATION AND THE PROBLEM OF SAMPLE COMPLEXITY 129


VIRTUAL EXAMPLES: A FRAMEWORK FOR PRIOR INFORMATION 133
The General Framework 134
Techniques for Prior Information and Related Research 135
Prior Knowledge in the Choice of Variables or Features 135
Prior Knowledge in the Learning Technique 135
Generating New Examples with Prior Knowledge 136
Incorporating Prior Knowledge as Hints 137
VIRTUAL EXAMPLES AND REGULARIZATION 138
Regularization Theory and RBF 138
Regularization Theory in Presence of Radial Symmetry 140
Radial Symmetry and "Virtual" Examples 141
VIRTUAL EXAMPLES IN VISION AND SPEECH 143
Virtual Views for Object Recognition 144
Symmetry as Prior Information 146
More General Transformations: Linear Object Classes 147
3D Objects, 2D Projections, and Linear Classes 148
Implications 149
Learning the Transformation 150
Virtual Examples in Speech Recognition 152
CONCLUSIONS 158
REFERENCES 158
CHAPTER 5
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Speech Recognition 163

ABSTRACT 163
INTRODUCTION 163
DETERMINISTIC ANNEALING FOR UNSUPERVISED LEARNING 166
Clustering 166
Principled Derivation of Deterministic Annealing 167
Statistical Physics Analogy 169
Mass-Constrained Clustering 172
Preferred Implementation of the DA Clustering Algorithm 177
Illustrative Examples 178
Extensions and Applications 180
Vector Quantization for Noisy Channels 180
Entropy-Constrained Vector Quantizer Design 182
Structurally Constrained Vector Quantizer Design 183
Graph-Theoretic and Other Optimization Problems 185
DETERMINISTIC ANNEALING FOR SUPERVISED LEARNING 188
Problem Formulation 188
Basic Derivation 190
Generality and Wide Applicability of the DA Solution 192
Regression, Classification, and Clustering 192
Structures 193
Experimental Results 197
VQ Classifier Design 197
RBF Classifier Design 201
MLP Classifier Design 201
Piecewise Regression 202
Mixture of Experts 206
SPEECH RECOGNITION 207
Problem Formulation 212
Design by Deterministic Annealing 213
Experimental Results 215
THE RATE-DISTORTION CONNECTION 217
RECENT DA EXTENSIONS 219
SUMMARY 220
REFERENCES 221

CHAPTER 6
Local Dynamic Modeling with Self-Organizing Maps and Applications to Nonlinear System Identification and Control 230
ABSTRACT 230
INTRODUCTION 230
DYNAMIC MODELING 232
GLOBAL AND LOCAL MODELS 235
Global Dynamic Models 236
Local Dynamic Models 237
State Dependent Prediction of Nonlinear AR Processes 239
KOHONEN'S SELF-ORGANIZING MAP (SOM) 240
SOM Networks and Kohonen Learning 240
Codebook in Reconstruction Space 243
SOM-BASED LOCAL LINEAR MODELS 245
Dynamic Learning in the SOM 248
EXPERIMENTAL RESULTS 251
DESIGN OF SWITCHING CONTROLLERS BASED ON SOM 255
Problem Specification 255
The Control Strategy 255
Design of the PMMSC 258
EXPERIMENTAL CONTROL OF THE WIND TUNNEL 260
Comparison of PMMSC Control to Existing Automatic Controller and Expert Operator 260
CONCLUSIONS 264
REFERENCES 267
APPENDIX 270
CHAPTER 7
A Signal Processing Framework Based on Dynamic Neural Networks with Application to Problems in Adaptation, Filtering and Classification 272

ABSTRACT 272
INTRODUCTION 272
NETWORK ARCHITECTURE AND EXECUTION 273
GRADIENT CALCULATION 275
EKF MULTI-STREAM TRAINING 275
The Kalman Recursion 275
Multi-Stream Training 277
Some Insight into the Multi-Stream Technique 278
Advantages and Extensions of Multi-Stream Training 279
SYNTHETIC EXAMPLES 280
Multiple Series Prediction and Dropout Rejection 280
Training 280
Testing 280
Discussion 281
Modeling with Stability Enforcement 282
Training without Constraints 283
Training with Constraints 284
Discussion 284
AUTOMOTIVE EXAMPLES 285
Sensor-Catalyst Modeling 285
Experimental Data 287
Training and Testing 287
Engine Misfire Detection 289
Applying Recurrent Networks 290
SUMMARY AND CONCLUSIONS 292
ACKNOWLEDGMENTS 292
REFERENCES 292

CHAPTER 8
Semiparametric Support Vector Machines for Nonlinear Model Estimation 295
ABSTRACT 295
INTRODUCTION 295
PROBLEM SETTING 296
OPTIMUM LINE SEARCH FOR A GENERIC DIRECTION 297
SIMPLE ALGORITHMS FOR PURELY NONPARAMETRIC APPROACH 298
A Gradient-Based Approach 299
A Coordinate Descent Approach 299
THE CLASSIC SVM PROBLEM 300
GENERAL SEMIPARAMETRIC SVM 301
The Augmented Lagrangian Approach 301
Training Semiparametric SVM 301
A Possible Simplification 302
REGULARIZED PARAMETRIC SVM APPROACH 302
EXPERIMENTAL RESULTS 302
CONCLUSIONS 305
ACKNOWLEDGMENTS 305
REFERENCES 305
CHAPTER 9
Gradient-Based Learning Applied to Document Recognition 306

INTRODUCTION 306
Learning from Data 307
Gradient-Based Learning 308
Gradient Back-Propagation 308
Learning in Real Handwriting Recognition Systems 309
Globally Trainable Systems 309
CONVOLUTIONAL NEURAL NETWORKS FOR ISOLATED CHARACTER RECOGNITION 310
Convolutional Networks 311
LeNet-5 312
Loss Function 314
RESULTS AND COMPARISON WITH OTHER METHODS 314
Database: the Modified NIST Set 314
Results 315
Comparison with Other Classifiers 316
Linear Classifier, and Pairwise Linear Classifier 316
Baseline Nearest Neighbor Classifier 317
Principal Component Analysis (PCA) and Polynomial Classifier 317
Radial Basis Function Network 318
One-Hidden Layer Fully Connected Multilayer Neural Network 318
Two-Hidden Layer Fully Connected Multilayer Neural Network 318
A Small Convolutional Network: LeNet-1 318
LeNet-4 318
Boosted LeNet-4 318
Tangent Distance Classifier (TDC) 319
Support Vector Machine (SVM) 319
Discussion 319
Invariance and Noise Resistance 320
MULTI-MODULE SYSTEMS AND GRAPH TRANSFORMER NETWORKS 321
An Object-Oriented Approach 321
Special Modules 322
Graph Transformer Networks 323
MULTIPLE OBJECT RECOGNITION: HEURISTIC OVER-SEGMENTATION 324
Segmentation Graph 324
Recognition Transformer and Viterbi Transformer 324
GLOBAL TRAINING FOR GRAPH TRANSFORMER NETWORKS 326
Viterbi Training 326
Discriminative Viterbi Training 327
Forward Scoring, and Forward Training 329
Discriminative Forward Training 330
Remarks on Discriminative Training 331
MULTIPLE OBJECT RECOGNITION: SPACE DISPLACEMENT NEURAL NETWORK 332
Interpreting the Output of an SDNN with a GTN 333
Experiments with SDNN 333
Global Training of SDNN 334
Object Detection and Spotting with SDNN 335
GRAPH TRANSFORMER NETWORKS AND TRANSDUCERS 336
Previous Work 336
Standard Transduction 336
Generalized Transduction 336
Notes on the Graph Structures 338
GTN and Hidden Markov Models 339
AN ON-LINE HANDWRITING RECOGNITION SYSTEM 339
Preprocessing 339
Network Architecture 340
Network Training 341
Experimental Results 341
A CHECK READING SYSTEM 342
A GTN for Check Amount Recognition 342
Gradient-Based Learning 344
Rejecting Low Confidence Checks 344
Results 344
CONCLUSIONS 345
APPENDICES 345
Pre-conditions for Faster Convergence 345
Stochastic Gradient vs. Batch Gradient 346
Stochastic Diagonal Levenberg-Marquardt 346
ACKNOWLEDGMENTS 348
REFERENCES 348
CHAPTER 10
Pattern Recognition Using A Family of Design Algorithms Based Upon Generalized Probabilistic Descent Method 352

ABSTRACT 352
INTRODUCTION 353
Speech Pattern Recognition Using Modular Systems 354
Classifier Design Based on Maximum-Likelihood Method 357
Classifier Design Based on Discriminant Function Approach 358
Motivation of GPD Development and Paper Organization 365
DISCRIMINATIVE PATTERN CLASSIFICATION 366
Bayes Decision Theory 366
Minimum Error Rate Classification 368
Discriminant Function Approach 370
Probabilistic Descent Method 375
Formalization 375
Probabilistic descent theorem 378
Problems 381
GENERALIZED PROBABILISTIC DESCENT METHOD 386
Formulation Concept 386
GPD Embodiment for Distance Classifier 386
Discriminant function for dynamic patterns 387
Formulation 387
Design Optimality 390
Minimum Classification Error Learning 392
Speech Recognition Using MCE/GPD-trained Distance Classifiers 396
Remarks 399
RELATIONSHIPS BETWEEN MCE/GPD AND OTHERS 400
Relation with LVQ 400
Relation with Maximization of Mutual Information 402
Relation with Minimization of Squared Error Loss 403
DERIVATIVES OF GPD 405
Overview 405
Segmental-GPD for Continuous Speech Recognition 406
Minimum Error Training for Open-Vocabulary Recognition 410
Open-vocabulary speech recognition 410
Minimum spotting error learning 411
Discriminative utterance verification 415
Discriminative Feature Extraction 417
Fundamentals 417
An example implementation for spectrum-based speech recognition 420
Discriminative metric design 421
Minimum error learning subspace method 422
Speaker Recognition Using GPD 424
CONCLUDING REMARKS 426
Recent Progress of DFA-Based Speech Recognition 426
Advantages of GPD Formalization 428
Future Issues in GPD-based Pattern Recognition 430
ACKNOWLEDGMENTS 432
REFERENCES 433
APPENDIX 447

CHAPTER 11

An Approach to Adaptive Classification

 

455

ABSTRACT 455 INTRODUCTION 455 THE PROPOSED LEARNING METHOD Problem Statement 456

456

Model Inference and Updates

 

457

Restarting

458

The Viterbi Algorithm 459 Synopsis of Our Learning Method

 

459

Relation to Other Learning Method

459

EXAMPLES 460 Fixed Linear and Nonlinear Generating Function

 

460

Time-Varying Functions 461 CONCLUSIONS AND FUTURE DIRECTIONS REFERENCES 463

463

CHAPTER 12

Reduced-Rank Intelligent Signal Processing with Application

to Radar

465

INTRODUCTION 465 BACKGROUND 466 KARHUNEN-LOEVE ANALYSIS 467 The Karhunen-L6eve Transformation The Karhunen-L6eve Expansion 467

467

Implementing the KLT 468 THE MULTIPLE SIGNAL MODEL AND WIENER FILTERING 469 THE SIGNAL-DEPENDENT KLT FOR STATISTICAL SIGNAL PROCESSING The KLT and Principal-Components 470

470

The Cross-Spectral Metic: An Intelligent and Signal-Dependent KLT

471

INTELLIGENT SIGNAL REPRESENTATION FOR STATISTICAL SIGNAL

 

PROCESSING 472 A New Criterion for Signal Representation and Its Implementation

473

A Generalized Joint-Process KLT: Nonunitary Diagonalization of the Covariance Analysis of the JKLT 475 RADAR EXAMPLE 478 Radar Signal Processing 478

474

Estimation of the Statistics and Sample Support

Simulation

480

479

xiv

Contents

 

CONCLUSIONS

482

ACKNOWLEDGMENTS

482

REFERENCES

482

CHAPTER 13

Signal Detection in a Nonstationary Environment Reformulated as an Adaptive Pattern Classification Problem 484

 

ABSTRACT 484 INTRODUCTION 485

 

AN OVERVIEW OF NONSTATIONARY BEHAVIOR AND TIME-FREQUENCY

 

ANALYSIS

488

THEORETICAL BACKGROUND 492 Multiple Window Estimates 495

 

Spectrum Estimation

as an Inverse Problem

497

HIGH-RESOLUTION MULTIPLE-WINDOW SPECTROGRAMS 499

 

Non-Stationary Quadratic-Inverse Theory

501

 

Multiple Window Estimates

of the Loeve Spectrum

505

SPECTRUM ANALYSIS OF RADAR SIGNALS 507 MODULAR LEARNING MACHINE FOR ADAPTIVE SIGNAL DETECTION

517

How Does the Adaptive Receiver of Fig. 4 Respond to Nonstationary Environment CASE STUDY: RADAR TARGET DETECTION OF A SMALL TARGET IN SEA

521

CLUqTER

523

Details of the

Receiver

525

Detection Results

529

Robustness of the Detector 532

 

COST FUNCTIONS FOR SUPERVISED TRAINING OF THE PATTERN

 

CLASSIFIERS

532

SUMMARY AND DISCUSSION 535 ACKNOWLEDGMENTS 536 REFERENCES 536

 

CHAPTER 14

Data Representation Using Mixtures of Principal Components

541

ABSTRACT 541 INTRODUCTION 541 A SPECTRUM OF REPRESENTATIONS

Principal Components Vector Quantization

542

542

A Mixture of Principal Components

SUBSPACE PATTERN RECOGNITION

Similarity Measure Class Prototype Norm Invariance TRAINING 544

543

544

544

542

542

543

Topological Organization

546

GREY-SCALE FEATURE EXTRACTION

Training Data

547

Network Basis Vectors

Segmentation Results Illumination Variations

547

547

548

546

XV

Contents

COLOR IMAGE ANALYSIS Color Representation

548

548

Vector Angle versus Euclidean Distance

Edge Detection

551

Segmentation

551

CONCLUSION 551 REFERENCES 552

550

CHAPTER 15

Image Denoising by Sparse Code Shrinkage

554

 

ABSTRACT 554 INTRODUCTION 554

 

MAXIMUM LIKELIHOOD DENOISING OF NONGAUSSIAN RANDOM VARIABLES 555

Maximum Likelihood Denoising Modeling Sparse Densities

555

555

Laplace Density

555

Mildly Sparse Densities

556

Strongly Sparse Densities

557

Choice of Model

557

Some Other Models

557

FINDING THE SPARSE CODING TRANSFORMATION 558 MEAN-SQUARE ERROR APPROACH 558

Minimum Mean-Square Estimator in Scalar Case

558

Analysis of Mean-Square Error

 

559

Minimum Mean Squares Approach to Basis Estimation

 

559

SPARSE CODE SHRINKAGE 560 COMPARISON WITH WAVELET AND CORING METHODS 560 EXTENSIONS OFTHE BASIC THEORY 561

Nongaussian Noise

561

Estimation of Parameters from Noisy Data

561

Non-orthogonal Bases EXPERIMENTS

561

561

Generation of Image Data

561

Remarks on Image Data

562

Windowing The Local Mean

562

562

Normalizing the Local Variance

562

Transform Estimation

563

Methods

563

Results

563

Component Statistics

564

Denoising Results

565

CONCLUSION

566

REFERENCES

567

APPENDIX

567

INDEX

569

ABOUT THE EDITORS

573

xvi


Chapter 1

Humanistic Intelligence: 'WearComp' as a New Framework and Application for Intelligent Signal Processing

Steve Mann

Abstract

Humanistic Intelligence (HI) is proposed as a new signal processing framework in which the processing apparatus is inextricably intertwined with the natural capabilities of our human body and mind. Rather than trying to emulate human intelligence, HI recognizes that the human brain is perhaps the best neural network of its kind, and that there are many new signal processing applications, within the domain of personal cybernetics, that can make use of this excellent but often overlooked processor. The emphasis of this chapter is on personal imaging applications of HI, to take a first step toward an intelligent wearable camera system that can allow us to effortlessly capture our day-to-day experiences, help us remember and see better, provide us with personal safety through crime reduction, and facilitate new forms of communication through collective connected HI. The wearable signal processing hardware, which began as a cumbersome backpack-based photographic apparatus of the 1970s and evolved into a clothing-based apparatus in the early 1980s, currently provides the computational power of a UNIX workstation concealed within ordinary-looking eyeglasses and clothing. Thus it may be worn continuously during all facets of ordinary day-to-day living; so that, through long-term adaptation, it begins to function as a true extension of the mind and body.

Keywords

Signals, Image processing, Human factors, Mobile communication, Machine vision, Photoquantigraphic imaging, Cybernetic sciences, Humanistic property protection, Consumer electronics

I. INTRODUCTION

What is now proposed is a new form of "intelligence" whose goal is to not only work in extremely close synergy with the human user, rather than as a separate entity, but more importantly, to arise, in part, because of the very existence of the human user. This close synergy is achieved through a user-interface to signal processing hardware that is both in close physical proximity to the user, and is constant.

The constancy of user-interface (interactional constancy) is what separates this signal processing architecture from other related devices such as pocket calculators and Personal Digital Assistants (PDAs).

Not only is the apparatus operationally constant, in the sense that although it may have power-saving (sleep) modes, it is never completely shut down (dead as is typically a calculator worn in a shirt pocket but turned off most of the time). More important is the fact that it is also interactionally constant. By interactionally constant, what is meant is that the inputs and outputs of the device are always potentially active. Interactionally constant implies operationally constant, but operationally constant does not necessarily imply interactionally constant. Thus, for example, a pocket calculator, worn in a shirt pocket, and left on all the time is still not interactionally constant, because it cannot be used in this state (e.g. one still has to pull it out of the pocket to see the display or enter numbers). A wristwatch is a borderline case; although it operates constantly in order to continue to keep proper time, and it is conveniently worn on the body, one must make a conscious effort to orient it within one's field of vision in order to interact with it.
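The distinction just drawn is a one-way implication: interactional constancy implies operational constancy, but not the converse. As a minimal sketch of this relation (my own illustration, not from the chapter; the device profiles are hypothetical restatements of the examples above):

```python
from dataclasses import dataclass

@dataclass
class Device:
    """Toy model of the two constancy properties discussed above."""
    name: str
    operationally_constant: bool    # never completely shut down
    interactionally_constant: bool  # inputs and outputs always potentially active

    def __post_init__(self):
        # Interactionally constant implies operationally constant:
        # a device cannot keep its I/O active while shut down.
        if self.interactionally_constant and not self.operationally_constant:
            raise ValueError(f"{self.name}: interactional constancy "
                             "requires operational constancy")

# The chapter's examples, restated in this toy model:
calculator = Device("pocket calculator, left on in a shirt pocket", True, False)
wristwatch = Device("wristwatch", True, False)  # borderline: needs a glance
wearcomp   = Device("WearComp", True, True)
```

Note that the converse direction is left unconstrained: the calculator and wristwatch profiles are valid even though only WearComp has both properties.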

S. Mann is with the Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, S.F. 2001, Canada, M5S 3G4. E-mail: mann@eecg.toronto.edu, http://wearcomp.org. Special thanks to Kodak, Digital Equipment Corporation, Xybernaut, and ViA.

A. Why Humanistic Intelligence

It is not, at first, obvious why one might want devices such as pocket calculators to be operationally constant. However, we will later see why it is desirable to have certain personal electronics devices, such as cameras and signal processing hardware, be on constantly, for example, to facilitate new forms of intelligence that assist the user in new ways. Devices embodying HI are not merely intelligent signal processors that a user might wear or carry in close proximity to the body, but instead, are devices that turn the user into part of an intelligent control system where the user becomes an integral part of the feedback loop.

B. Humanistic Intelligence does not necessarily mean "user-friendly"

Devices embodying HI often require that the user learn a new skill set, and are therefore not necessarily easy to adapt to. Just as it takes a young child many years to become proficient at using his or her hands, some of the devices that implement HI have taken years of use before they began to truly behave as if they were natural extensions of the mind and body. Thus, in terms of Human-Computer Interaction [1], the goal is not just to construct a device that can model (and learn from) the user, but, more importantly, to construct a device in which the user also must learn from the device. Therefore, in order to facilitate the latter, devices embodying HI should provide a constant user-interface: one that is not so sophisticated and intelligent that it confuses the user.

Although the device may implement very sophisticated signal processing algorithms, the cause and effect relationship of this processing to its input (typically from the environment or the user's actions) should be clearly and continuously visible to the user, even when the user is not directly and intentionally interacting with the apparatus. Accordingly, the most successful examples of HI afford the user a very tight feedback loop of system observability (ability to perceive how the signal processing hardware is responding to the environment and the user), even when the controllability of the device is not engaged (e.g. at times when the user is not issuing direct commands to the apparatus). A simple example is the viewfinder of a wearable camera system, which provides framing, a photographic point of view, and facilitates the provision to the user of a general awareness of the visual effects of the camera's own image processing algorithms, even when pictures are not being taken. Thus a camera embodying HI puts the human operator in the feedback loop of the imaging process, even when the operator only wishes to take pictures occasionally. A more sophisticated example of HI is a biofeedback-controlled wearable camera system, in which the biofeedback process happens continuously, whether or not a picture is actually being taken. In this sense, the user becomes one with the machine, over a long period of time, even if the machine is only directly used (e.g. to actually take a picture) occasionally.

Humanistic Intelligence attempts to both build upon, as well as re-contextualize, concepts in intelligent signal processing [2][3], and related concepts such as neural networks [2][4][5], fuzzy logic [6][7], and artificial intelligence [8]. Humanistic Intelligence also suggests a new goal for signal processing hardware, that is, in a truly personal way, to directly assist, rather than replace or emulate, human intelligence. What is needed to facilitate this vision is a simple and truly personal computational signal processing framework that empowers the human intellect. It should be noted that this framework, which arose in the 1970s and early 1980s, is in many ways similar to Engelbart's vision that arose in the 1940s while he was a radar engineer, but that there are also some important differences. Engelbart, while seeing images on a radar screen, envisioned that the cathode ray screen could also display letters of the alphabet, as well as computer generated pictures and graphical content, and thus envisioned computing as an interactive experience for manipulating words and pictures. Engelbart envisioned the mainframe computer as a tool for augmented intelligence and augmented communication, in which a number of people in a large amphitheatre could interact with one another using a large mainframe computer [9][10]. While Engelbart himself did not realize the significance of the personal computer, his ideas are certainly embodied in modern personal computing. What is now described is a means of realizing a similar


Fig. 1. The three basic operational modes of WearComp. (a) Signal flow paths for a computer system that runs continuously, constantly attentive to the user's input, and constantly providing information to the user. Over time, constancy leads to a symbiosis in which the user and computer become part of each other's feedback loops. (b) Signal flow path for augmented intelligence and augmented reality. Interaction with the computer is secondary to another primary activity, such as walking, attending a meeting, or perhaps doing something that requires full hand-to-eye coordination, like running down stairs or playing volleyball. Because the other primary activity is often one that requires the human to be attentive to the environment as well as unencumbered, the computer must be able to operate in the background to augment the primary experience, for example, by providing a map of a building interior, or providing other information, through the use of computer graphics overlays superimposed on top of the real world. (c) WearComp can be used like clothing, to encapsulate the user and function as a protective shell, whether to protect us from cold, protect us from physical attack (as traditionally facilitated by armour), or to provide privacy (by concealing personal information and personal attributes from others). In terms of signal flow, this encapsulation facilitates the possible mediation of incoming information to permit solitude, and the possible mediation of outgoing information to permit privacy. It is not so much the absolute blocking of these information channels that is important; it is the fact that the wearer can control to what extent, and when, these channels are blocked, modified, attenuated, or amplified, in various degrees, that makes WearComp much more empowering to the user than other similar forms of portable computing. (d) An equivalent depiction of encapsulation (mediation) redrawn to give it a similar form to that of (a) and (b), where the encapsulation is understood to comprise a separate protective shell.

vision, but with the computing re-situated in a different context, namely the truly personal space of the user. The idea here is to move the tools of augmented intelligence and augmented communication directly onto the body, giving rise to not only a new genre of truly personal computing, but to some new capabilities and affordances arising from direct physical contact between the computational apparatus and the human body. Moreover, a new family of applications arises, such as "personal imaging", in which the body-worn apparatus facilitates an augmenting of the human sensory capabilities, namely vision. Thus the augmenting of human memory translates directly to a visual associative memory in which the apparatus might, for example, play previously recorded video back into the wearer's eyeglass mounted display, in the manner of a so-called visual thesaurus [11].

 

II. 'WEARCOMP' AS MEANS OF REALIZING HUMANISTIC INTELLIGENCE

WearComp [12] is now proposed as an apparatus upon which a practical realization of HI can be built, as well as a research tool for new studies in intelligent signal processing.
 

A. Basic principles of WearComp

WearComp will now be defined in terms of its three basic modes of operation.

A.1 Operational modes of WearComp

The three operational modes in this new interaction between human and computer, as illustrated in Fig 1, are:
 

• Constancy: The computer runs continuously, and is "always ready" to interact with the user. Unlike a hand-held device, laptop computer, or PDA, it does not need to be opened up and turned on prior to use. The signal flow from human to computer, and computer to human, depicted in Fig 1(a), runs continuously to provide a constant user-interface.
 

• Augmentation: Traditional computing paradigms are based on the notion that computing is the primary task. WearComp, however, is based on the notion that computing is NOT the primary task. The assumption of WearComp is that the user will be doing something else at the same time as doing the computing. Thus the computer should serve to augment the intellect, or augment the senses. The signal flow between human and computer, in the augmentational mode of operation, is depicted in Fig 1(b).

• Mediation: Unlike hand-held devices, laptop computers, and PDAs, WearComp can encapsulate the user (Fig 1(c)). It doesn't necessarily need to completely enclose us, but the basic concept of mediation allows for whatever degree of encapsulation might be desired, since it affords us the possibility of a greater degree of encapsulation than traditional portable computers. Moreover, there are two aspects to this encapsulation, one or both of which may be implemented in varying degrees, as desired:

- Solitude: The ability of WearComp to mediate our perception can allow it to function as an information filter, and allow us to block out material we might not wish to experience, whether it be offensive advertising, or simply a desire to replace existing media with different media. In less extreme manifestations, it may simply allow us to alter aspects of our perception of reality in a moderate way, rather than completely blocking out certain material. Moreover, in addition to providing means for blocking or attenuation of undesired input, there is a facility to amplify or enhance desired inputs. This control over the input space is one of the important contributors to the most fundamental issue in this new framework, namely that of user empowerment.

- Privacy: Mediation allows us to block or modify information leaving our encapsulated space. In the same way that ordinary clothing prevents others from seeing our naked bodies, WearComp may, for example, serve as an intermediary for interacting with untrusted systems, such as third party implementations of digital anonymous cash, or other electronic transactions with untrusted parties. In the same way that martial artists, especially stick fighters, wear a long black robe that comes right down to the ground, in order to hide the placement of their feet from their opponent, WearComp can also be used to clothe our otherwise transparent movements in cyberspace. Although other technologies, like desktop computers, can, to a limited degree, help us protect our privacy with programs like Pretty Good Privacy (PGP), the primary weakness of these systems is the space between them and their user. It is generally far easier for an attacker to compromise the link between the human and the computer (perhaps through a so-called Trojan horse or other planted virus) when they are separate entities. Thus a personal information system owned, operated, and controlled by the wearer can be used to create a new level of personal privacy because it can be made much more personal, e.g. so that it is always worn, except perhaps during showering, and therefore less likely to fall prey to attacks upon the hardware itself. Moreover, the close synergy between the human and computers makes it harder to attack directly, e.g. as one might look over a person's shoulder while they are typing, or hide a video camera in the ceiling above their keyboard.¹

Because of its ability to encapsulate us, e.g. in embodiments of WearComp that are actually articles of clothing in direct contact with our flesh, it may also be able to make measurements of various physiological quantities. Thus the signal flow depicted in Fig 1(a) is also enhanced by the encapsulation as depicted in Fig 1(c).
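The mediation idea described above, in which each incoming channel (for solitude) and each outgoing channel (for privacy) can be blocked, attenuated, passed through, or amplified, can be sketched as a per-channel gain stage. This is only an illustrative sketch; the channel names and gain values are my own hypothetical choices, not an implementation from the chapter:

```python
def mediate(signal, gains):
    """Apply per-channel mediation: gain 0.0 blocks a channel,
    0 < gain < 1 attenuates it, 1.0 passes it through, > 1.0 amplifies it.
    Channels without an explicit gain pass through unchanged."""
    return {channel: level * gains.get(channel, 1.0)
            for channel, level in signal.items()}

# Incoming channels (solitude): block advertising, amplify a safety alarm.
incoming = {"advertising": 0.9, "conversation": 0.5, "alarm": 0.2}
solitude_gains = {"advertising": 0.0, "alarm": 2.0}

# Outgoing channels (privacy): conceal location, pass deliberate speech.
outgoing = {"location": 1.0, "speech": 0.7}
privacy_gains = {"location": 0.0, "speech": 1.0}

filtered_in = mediate(incoming, solitude_gains)
filtered_out = mediate(outgoing, privacy_gains)
```

The point of the sketch is that the wearer, not a third party, chooses the gain table, which is the essence of the empowerment argument made in the text.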
To make this signal flow more explicit, Fig 1(c) has been redrawn, in Fig 1(d), where the computer and human are depicted as two separate entities within an optional protective shell, which may be opened or partially opened if a mixture of augmented and mediated interaction is desired.

Note that these three basic modes of operation are not mutually exclusive in the sense that the first is embodied in both of the other two. These other two are also not necessarily meant to be implemented in isolation. Actual embodiments of WearComp typically incorporate aspects of both augmented and

 

¹For the purposes of this paper, privacy is not so much the absolute blocking or concealment of personal information, but it is the ability to control or modulate this outbound information channel. Thus, for example, one may wish certain people, such as members of one's immediate family, to have greater access to personal information than the general public. Such a family-area-network may be implemented with an appropriate access control list and a cryptographic communications protocol.


Fig. 2. The six signal flow paths for the new mode of human-computer interaction provided by WearComp. These six signal flow paths each define one of the six attributes of WearComp.

mediated modes of operation. Thus WearComp is a framework for enabling and combining various aspects of each of these three basic modes of operation. Collectively, the space of possible signal flows giving rise to this entire space of possibilities is depicted in Fig 2. The signal paths typically comprise vector quantities. Thus multiple parallel signal paths are depicted in this figure to remind the reader of this vector nature of the signals.

B. The six basic signal flow paths of WearComp

There are six informational flow paths associated with this new human-machine symbiosis. These signal flow paths each define one of the basic underlying principles of WearComp, and are each described, in what follows, from the human's point of view. Implicit in these six properties is that the computer system is also operationally constant and personal (inextricably intertwined with the user). The six basic properties are:

1. UNMONOPOLIZING of the user's attention: it does not necessarily cut one off from the outside world like a virtual reality game or the like does. One can attend to other matters while using the apparatus. It is built with the assumption that computing will be a secondary activity, rather than a primary focus of attention. In fact, ideally, it will provide enhanced sensory capabilities. It may, however, facilitate mediation (augmenting, altering, or deliberately diminishing) of these sensory capabilities.

2. UNRESTRICTIVE to the user: ambulatory, mobile, roving, one can do other things while using it, e.g. one can type while jogging, running down stairs, etc.

3. OBSERVABLE by the user: It can get the user's attention continuously if the user wants it to. The output medium is constantly perceptible by the wearer. It is sufficient that it be almost-always-observable, within reasonable limitations such as the fact that a camera viewfinder or computer screen is not visible during the blinking of the eyes.

4. CONTROLLABLE by the user: Responsive. The user can take control of it at any time the user wishes. Even in automated processes the user should be able to manually override the automation to break open the control loop and become part of the loop at any time the user wants to. Examples of this controllability might include a "Halt" button the user can invoke as an application mindlessly opens all 50 documents that were highlighted when the user accidentally pressed "Enter".

5. ATTENTIVE to the environment: Environmentally aware, multimodal, multisensory. (As a result this ultimately gives the user increased situational awareness.)

6. COMMUNICATIVE to others: WearComp can be used as a communications medium when the user wishes. Expressive: WearComp allows the wearer to be expressive through the medium, whether as a direct communications medium to others, or as means of assisting the user in the production of expressive or communicative media.
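The six attributes can be read as a checklist against which a candidate device is evaluated. The sketch below is my own schematic summary of that reading; the device profiles are hypothetical illustrations, not assessments from the chapter:

```python
ATTRIBUTES = ("unmonopolizing", "unrestrictive", "observable",
              "controllable", "attentive", "communicative")

def missing_attributes(profile):
    """Return the WearComp attributes that a device profile lacks."""
    return [a for a in ATTRIBUTES if not profile.get(a, False)]

# Hypothetical profiles for illustration only.
wearcomp = {a: True for a in ATTRIBUTES}
# A PDA must be taken out and attended to, so (in this toy reading) it
# monopolizes attention and restricts movement, and it neither watches
# the environment nor stays constantly perceptible.
pda = {"controllable": True, "communicative": True}
```

Under these assumed profiles, only WearComp satisfies the full checklist, which mirrors the distinction the text draws between WearComp and prior portable devices.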

III. PHILOSOPHICAL ISSUES

There are many open questions in this new area of research. Some of these include the following:

• Is WearComp good?
• Do we need it?
• Could it be harmful to the user?
• Could it be harmful to society?
• Are humans prepared for such a close synergy with machines, or will nature "strike back"?
• Will the apparatus modify the behaviour of the wearer in an undesirable way?
• Will it become a cause of irritation to others?

As with many new inventions, such as clothing, the bicycle, hot air balloons, etc., there has been an initial rejection, followed by scientific study and experimentation, followed by either acceptance or modification of the invention to address specific problems.

 

For example, the adverse effects of the bright screen constantly projected onto the eye were addressed by building the apparatus into very dark sunglasses so that a much lower brightness could be used.

More important, perhaps, is the danger of becoming dependent on the technology in the same way that we become dependent on shoes, clothing, the automobile, etc. For example, the fact that we cannot survive naked in the wilderness, or that we have become sedentary because of the automobile, must have its equivalent problems within the context of the proposed invention. Many of these issues are open philosophical questions that will only be answered by further research. However, the specific framework and some of the various ideas surrounding it will hopefully form the basis for a further investigation of some of these questions.

A. Fundamental issues of WearComp

The most fundamental paradigm shift that WearComp has to offer is that of personal empowerment. In order to fully appreciate the magnitude of this paradigm shift, some historical examples of tools of empowerment will now be described to place WearComp in this historical context.

A.1 Historical context

In early civilization, individuals were all roughly equal, militarily. Wealth was generally determined by how many head of cattle, or how many "mounts" (horses), a person owned. In hand-to-hand combat, fighting with swords, each individual was roughly an equal. Since it was impossible to stay on a horse while fighting, horses provided little in the way of military power, so that even those too poor to afford to keep a horse were not at a tremendous disadvantage to others from a fighting standpoint. It was the invention of the stirrup, however, that radically changed this balance. With the stirrup, it became possible to stay on a horse while fighting. Horses and heavy armour could only be afforded by the wealthy, and even a large group of unruly peasants was no match for a much smaller group of mounted cavalry. However, toward the middle ages, more and more ordinary individuals mastered the art of fighting on horseback, and eventually the playing field leveled out. Then, with the invention of gunpowder, the ordinary civilian was powerless against soldiers or bandits armed with guns. It was not until guns became cheap enough that everyone could own one, as in the "old west". The Colt .45, for example, was known as the "equalizer" because it made everyone roughly equal. Even if one person was much more skilled in its use, there would still be some risk involved in robbing other civilians or looting someone's home.

A.2 The shift from guns to cameras and computers

In today's world, the hand gun has a lesser role to play. Wars are fought with information, and we live in a world in which the appearance of thugs and bandits is not ubiquitous. While there is some crime, we spend most of our lives living in relative peace. However, surveillance and mass media have become the new instruments of social control. Department stores are protected with security cameras rather than by owners keeping a shotgun under the counter or hiring armed guards to provide a visible deterrent. While some department stores in rough neighbourhoods may have armed guards, there has been a paradigm shift where we see less guns and more surveillance cameras.

A.3 The shift from draconian punishment to micro management

There has also been a paradigm shift, throughout the ages, characterized by a move toward less severe punishments, inflicted with greater certainty. In the middle ages, the lack of sophisticated surveillance and communications networks meant that criminals often escaped detection or capture, but when they were captured, punishments were extremely severe. Gruesome corporeal punishments, where criminals might be crucified, or whipped, branded, drawn and quartered, and then burned at the stake, were quite common in these times.

The evolution from punishment as a spectacle in which people were tortured to death in the village square, toward incarceration in which people were locked in a cell, and forced to attend church sermons, prison lectures, etc., marked the first step in a paradigm shift toward less severe punishments [13]. Combined with improved forensic technologies like fingerprinting, this reduction in the severity of punishment came together with a greater chance of getting caught.

More recently, with the advent of so-called "boot camp", where delinquent youths are sent off for mandatory military-style training, the trend continues by addressing social problems earlier, before they become large problems. This requires greater surveillance and monitoring, but at the same time is characterized by less severe actions taken against those who are deemed to require these actions. Thus there is, again, still greater chance of being affected by smaller punishments.

If we extrapolate this trend, what we arrive at is a system of social control characterized by total surveillance and micro-punishments. At some point, the forces applied to the subjects of the social control are too weak to even justify the use of the word "punishment", and perhaps it might be better referred to as "micro management". This "micro management" of society may be effected by subjecting the population to mass media, advertising, and calming music played in department stores, elevators, and subway stations.

Surveillance is also spreading into areas that were generally private in earlier times. The surveillance cameras that were placed in banks have moved to department stores. They first appeared above cash registers to deal with major crimes like holdups. But then they moved into the aisles and spread throughout the store to deal with petty theft. Again, more surveillance for dealing with lesser crimes. In the U.K., cameras installed for controlling crime in rough areas of town spread to low crime areas as well, in order to deal with problems like youths stealing apples from street markets, or patrons of pubs urinating on the street. The cameras have even spread into restaurants and pubs, not just above the cash register, but throughout the pub, so that going out for pints, one may no longer have privacy.

Recently, electronic plumbing technology, originally developed for use in prisons, for example, to prevent all inmates from flushing the toilets simultaneously, has started to be used in public buildings. The arguments in favor of it go beyond human hygiene and water conservation, as proponents of the technology argue that it also reduces vandalism. Their definition of vandalism has been broadened to include deliberately flooding a plumbing fixture, and deliberately leaving faucets running. Thus, again, what we see is greater certainty of catching or preventing people from committing lesser transgressions of the social order.
One particularly subtle form of social control using this technology is the new hands-free electronic showers developed for use in prisons where inmates would otherwise break off knobs, levers, and pushbuttons. These showers are just beginning to appear in government buildings, stadiums, health clubs, and schools. The machine watches the user, from behind a tiled wall, through a small dark glass window. When the user steps toward the shower, the water comes on, but only for a certain time, and then it shuts off. Obviously the user can step away from the viewing window, and then return, to receive more water, and thus defeat the timeout feature of the system, but this need to step away and move back into view is enough of an irritant as to effect a slight behavioural modification of the user.

Thus what we see is that surveillance has swept across all facets of society, but is being used to deal with smaller and smaller problems. From dealing with mass murderers and bank robbers, to people who threaten the environment by taking long showers, the long arm of surveillance has reached into even the most private of places, where we might once have been alone. The peace and solitude of the shower, where our greatest inspirations might come to us, has been intruded upon with not a major punishment, but a very minor form of social control, too small in fact to even be called a punishment. These surveillance and social control systems are linked together, often to central computer systems. Everything from surveillance cameras in the bank, to electronic plumbing networks, is being equipped with fiber optic communications networks. Together with the vast array of medical records, credit card purchases, buying preferences, etc., we are affected in more ways, but with lesser influence. We are no longer held at bay by mounted cavalry. More often than being influenced by weapons, we are influenced in very slight, almost imperceptible ways, for example, through a deluge of junk mail, marketing, advertising, or a shower that shuts off after it sees that we've been standing under it for too long.

While there are some (the most notable being Jeremy Bentham [13]) who have put forth an argument that a carefully managed society results in maximization of happiness, there are others who argue that the homogenization of society is unhealthy, and reduces humans to cogs in a larger piece of machinery, or, at the very least, results in a certain loss of human dignity. Moreover, just as nature provides biodiversity, many believe that society should also be diverse, and people should try to resist ubiquitous centralized surveillance and control, particularly to the extent where it homogenizes society excessively. Some argue that micro-management and utilitarianism, in which a person's value may often be measured in terms of usefulness to society, is what led to eugenics, and eventually to the fascism of Nazi Germany. Many people also agree that, even without any sort of social control mechanism, surveillance, in and of itself, still violates their privacy, and is fundamentally wrong.

As with other technologies, like the stirrup and gunpowder, the electronic surveillance playing field is also being leveled. The advent of the low-cost personal computer has allowed individuals to communicate freely and easily among themselves. No longer are the major media conglomerates the sole voice heard in our homes. The World Wide Web has ushered in a new era of underground news and alternative content. Thus centralized computing facilities, the very technology that many perceived as a threat to human individuality and freedom, have given way to low-cost personal computers that many people can afford. This is not to say that home computers will be as big or powerful as the larger computers used by large corporations or governments, but simply that if a large number of people have a moderate degree of computational resources, there is a sense of balance in which people are roughly equal, in the same sense that two people, face to face, one with a .22 calibre handgun and the other with a Colt .45, are roughly equal. A large bullet hole or a small one, both provide a tangible and real risk of death or injury. It is perhaps modern cryptography that makes this balance even more pronounced, for it is so many orders of magnitude easier to encrypt a message than it is to decrypt it. Accordingly, many governments have defined cryptography as a munition and attempted, with only limited success, to restrict its use.

A.4 Fundamental issues of WearComp

The most fundamental issue in WearComp is no doubt that of personal empowerment, through its ability to equip the individual with a personalized, customizable information space, owned, operated, and controlled by the wearer. While home computers have gone a long way to empowering the individual, they only do so when the user is at home. As the home is perhaps the last bastion of space not yet touched by the long arm of surveillance (space that one can call one's own), the home computer, while it does provide an increase in personal empowerment, is not nearly so profound in its effect as the WearComp, which brings this personal space (space one can call one's own) out into the world.

Although WearComp, in the most common form we know it today (miniature video screen over one or both eyes, body-worn processor, and input devices such as a collection of pushbutton switches or joystick held in one hand, and a microphone) was invented in the 1970s for personal imaging applications, it has more recently been adopted by the military in the context of large government-funded projects. However, as with the stirrup, gunpowder, and other similar inventions, it is already making its way out into the mainstream consumer electronics arena.

An important observation to make, with regards to the continued innovation, early adopters (military, government, large multinational corporations), and finally ubiquity, is the time scale. While it took a relatively longer time for the masses to adopt the use of horses for fighting, and hence level the playing field, later, the use of gunpowder became ubiquitous in a much shorter time period. Then, sometime after guns had been adopted by the masses, the spread of computer technology, which in some situations even replaced guns, was so much faster still. As the technology diffuses into society more quickly, the military is losing its advantage over ordinary civilians. We are entering a pivotal era in which consumer electronics is surpassing the technological sophistication of some military electronics. Personal audio systems like the SONY Walkman are just one example of how the ubiquity and sophistication of technology feed upon each other, to the extent that the technology begins to rival, and in some ways exceed, the technical sophistication of the limited-production military counterparts such as two-way radios used in the battlefield.
Consumer technology has already brought about a certain degree of personal empowerment, from the portable cassette player that lets us replace the music piped into department stores with whatever we would rather hear, to small hand-held cameras that capture police brutality and human rights violations. However, WearComp is just beginning to bring about a much greater paradigm shift, which may well be equivalent in its impact to the invention of the stirrup, or that of gunpowder. Moreover, this leveling of the playing field may, for the first time in history, happen almost instantaneously, should the major consumer electronics manufacturers beat the military to raising this invention to a level of perfection similar to that of the stirrup or modern handguns. If this were to happen, the time scale over which technology diffuses through society will have decreased to zero, resulting in a new kind of paradigm shift that society has not yet experienced. Evidence of this pivotal shift is already visible, for example, in the joint effort of Xybernaut Corp. (a major manufacturer of wearable computers) and SONY Corp. (a manufacturer of personal electronics) to create a new personal electronics computational device.

B. Aspects of WearComp and personal empowerment

There are several aspects and affordances of WearComp. These are:

• Photographic/videographic memory: Perfect recall of previously collected information, especially visual information (visual memory [4]).
• Shared memory: In a collective sense, two or more individuals may share in their collective consciousness, so that one may have a recall of information that one need not have experienced personally.
• Connected collective humanistic intelligence: In a collective sense, two or more individuals may collaborate while one or more of them is doing another primary task.
• Personal safety: In contrast to a centralized surveillance network built into the architecture of the city, a personal safety system is built into the architecture (clothing) of the individual. This framework has the potential to lead to a distributed "intelligence" system of sorts, as opposed to the centralized "intelligence" gathering efforts of traditional video surveillance networks.
• Tetherless operation: WearComp affords and requires mobility, and the freedom from the need to be connected by wire to an electrical outlet or communications line.
• Synergy: Rather than attempting to emulate human intelligence in the computer, as is a common goal of research in Artificial Intelligence (AI), the goal of WearComp is to produce a synergistic combination of human and machine, in which the human performs tasks that it is better at, while the computer performs tasks that it is better at. Over an extended period of time, WearComp begins to function as a true extension of the mind and body, and no longer feels as if it is a separate entity. In fact, the user will often adapt to the apparatus to such a degree that, when taking it off, its absence will feel uncomfortable, in the same way that we adapt to shoes and clothing to such a degree that being without them, most of us would feel extremely uncomfortable (whether in a public setting, or in an environment in which we have come to be accustomed to the protection that shoes and clothing provide). This intimate and constant bonding is such that the combined capability, resulting in a synergistic whole, far exceeds the sum of its components.
• Quality of life: WearComp is capable of enhancing day-to-day experiences, not just in the workplace, but in all facets of daily life. It has the capability to enhance the overall quality of life for many people.

IV. PRACTICAL EMBODIMENTS OF WEARCOMP

The WearComp apparatus consists of a battery-powered wearable Internet-connected [15] computer system with miniature eyeglass-mounted screen and appropriate optics to form the virtual image equivalent to an ordinary desktop multimedia computer. However, because the apparatus is tetherless, it travels with the user, presenting a computer screen that either appears superimposed on top of the real world, or represents the real world as a video image [16]. Due to advances in low-power microelectronics [17], we are entering a pivotal era in which it will become possible for us to be inextricably intertwined with computational technology that will become part of our everyday lives in a much more immediate and intimate way than in the past.

Physical proximity and constancy were simultaneously realized by the 'WearComp' project² of the 1970s and early 1980s (Fig 3), which was a first attempt at building an intelligent "photographer's assistant" around the body, and comprised a computer system attached to the body, a display means constantly visible to one or both eyes, and means of signal input including a series of pushbutton switches and a pointing device (Fig 4) that the wearer could hold in one hand to function as a keyboard and mouse do, but still be able to operate the device while walking around. In this way, the apparatus re-situated the functionality of a desktop multimedia computer, with mouse, keyboard, and video screen, as a physical extension of the user's body. While the size and weight reductions of WearComp over the last 20 years have been quite dramatic, the basic qualitative elements and functionality have remained essentially the same, apart from the obvious increase in computational power. However, what makes WearComp particularly useful in new and interesting ways, and what makes it particularly suitable as a basis for HI, is the collection of other input devices, not all of which are found on a desktop multimedia computer.
In typical embodiments of 'WearComp' these measurement (input) devices include the following:

• ultra-miniature cameras concealed inside eyeglasses and oriented to have the same field of view as the wearer, thus providing the computer with the wearer's "first-person" perspective.
• one or more additional cameras that afford alternate points of view (e.g. a rear-looking camera with a view of what is directly behind the wearer).
• sets of microphones, typically comprising one set to capture the sounds of someone talking to the wearer (typically a linear array across the top of the wearer's eyeglasses), and a second set to capture the wearer's own speech.

²For a detailed historical account of the WearComp project, and other related projects, see [18][19].


Fig. 3. Early embodiments of the author's original "photographer's assistant" application of Personal Imaging. (a) Author wearing WearComp2, an early 1980s backpack-based signal processing and personal imaging system with right-eye display. Two antennas operating at different frequencies facilitated wireless communications over a full-duplex radio link. (b) WearComp4, a late 1980s clothing-based signal processing and personal imaging system with left-eye display and beam splitter. Separate antennas facilitated simultaneous voice, video, and data communication.


Fig. 4. Author using some early input devices ("keyboards" and "mice") for WearComp: (a) 1970s: input device comprising pushbutton switches mounted to a wooden hand-grip. (b) 1980s: input device comprising microswitches mounted to the handle of an electronic flash. These devices also incorporated a detachable joystick (controlling two potentiometers), designed as a pointing device for use in conjunction with the WearComp project.


• biosensors, comprising not just heart rate but full ECG waveform, as well as respiration, skin conductivity, sweat level, and other quantities [20], each available as a continuous (sufficiently sampled) time-varying voltage. Typically these are connected to the wearable central processing unit through an eight-channel analog to digital converter.
• footstep sensors, typically comprising an array of transducers inside each shoe.
• wearable radar systems in the form of antenna arrays sewn into clothing. These typically operate in the 24.36GHz range.

The last three, in particular, are not found on standard desktop computers, and even the first three, which often are found on standard desktop computers, appear in a different context here than they do on a desktop computer. For example, in WearComp, the camera does not show an image of the user, as it does typically on a desktop computer, but, rather, it provides information about the user's environment. Furthermore, the general philosophy, as will be described in Sections V and VI, will be to regard all of the input devices as measurement devices. Even something as simple as a camera will be regarded as a measuring instrument, within the signal processing framework.
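This philosophy of regarding every input device as a measurement instrument can be sketched as a small uniform interface. The class names, frame size, and polling scheme below are illustrative assumptions for this sketch (written in Python purely for exposition; they are not the apparatus's actual software):

```python
from abc import ABC, abstractmethod

class MeasurementDevice(ABC):
    """Uniform interface: every input device yields calibrated samples."""
    @abstractmethod
    def read(self) -> list:
        """Return the latest measurement as a flat list of floats."""

class EyeglassCamera(MeasurementDevice):
    """The camera treated as a photometric measuring instrument:
    each pixel is a quantized measurement of scene radiance."""
    def read(self) -> list:
        return [0.0] * (320 * 240)   # placeholder frame (assumed size)

class BiosensorADC(MeasurementDevice):
    """Eight-channel analog to digital converter carrying ECG,
    respiration, skin conductivity, etc., one voltage per channel."""
    def read(self) -> list:
        return [0.0] * 8             # placeholder: one voltage per channel

def gather(devices):
    """Poll every device through the same measurement interface."""
    return {type(d).__name__: d.read() for d in devices}

samples = gather([EyeglassCamera(), BiosensorADC()])
print(len(samples["BiosensorADC"]))   # → 8
```

The point of the sketch is architectural: downstream signal processing need not care whether a sample vector came from a camera or a chest electrode, which is what allows all the inputs to be fused into one multidimensional measurement space.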

Certain applications use only a subset of these devices, but including all of them in the design facilitates rapid prototyping and experimentation with new applications. Most embodiments of WearComp are modular, so that devices can be removed when they are not being used.

A side-effect of this 'WearComp' apparatus is that it replaces much of the personal electronics that we carry in our day-to-day living. It enables us to interact with others through its wireless data communications link, and therefore replaces the pager and cellular telephone. It allows us to perform basic computations, and thus replaces the pocket calculator, laptop computer, and personal data assistant (PDA). It can record data from its many inputs, and therefore it replaces and subsumes the portable dictating machine, camcorder, and the photographic camera. And it can reproduce ("play back") audiovisual data, so that it subsumes the portable audio cassette player. It keeps time, as any computer does, and this may be displayed when desired, rendering a wristwatch obsolete. (A calendar program which produces audible, vibrotactile, or other output also renders the alarm clock obsolete.)

However, it goes beyond replacing all of these items, because not only is it currently far smaller and far less obtrusive than the sum of what it replaces, but these functions are interwoven seamlessly, so that they work together in a mutually assistive fashion. Furthermore, entirely new functionalities, and new forms of interaction, arise, such as enhanced sensory capabilities, as will be discussed in Sections V and VI.

A. Building signal-processing devices directly into fabric

The wearable signal processing apparatus of the 1970s and early 1980s was cumbersome at best, so an effort was directed toward not only reducing its size and weight, but, more importantly, reducing its undesirable and somewhat obtrusive appearance. Moreover, an effort was also directed at making an apparatus of a given size and weight more comfortable to wear and bearable to the user [12], through bringing components into closer proximity to the body, thereby reducing torques and moments of inertia. Starting in 1982, Eleveld and Mann [19] began an effort to build circuitry directly into clothing. The term 'smart clothing' refers to variations of WearComp that are built directly into clothing, and are characterized by (or at least an attempt at) making components distributed rather than lumped, whenever possible or practical.

It was found [19] that the same apparatus could be made much more comfortable by bringing the components closer to the body, which had the effect of reducing both the torque felt bearing the load, as well as the moment of inertia felt in moving around. This effort resulted in a version of WearComp called the 'Underwearable Computer' [19], shown in Fig 5.

Typical embodiments of the underwearable resemble an athletic undershirt (tank top) made of durable mesh fabric, upon which a lattice of webbing is sewn. This facilitates quick reconfiguration in the layout of components, and re-routing of cabling. Note that wire ties are not needed to fix cabling, as it is simply run through the webbing, which holds it in place. All power and signal connections are standardized, so that devices may be installed or removed without the use of any tools (such as a soldering iron), by simply removing the garment and spreading it out on a flat surface.

Fig. 5. The 'underwearable' signal processing hardware: (a) as worn by author (stripped to the undershirt, which is normally covered by a sweater or jacket); (b) close-up of the underwearable signal processor, showing webbing for routing of cabling.

Some more recent related work by others [21] also involves building circuits into clothing, in which a garment is constructed as a monitoring device to determine the location of a bullet entry. The underwearable differs from this monitoring apparatus in the sense that the underwearable is totally reconfigurable in the field, and also in the sense that it embodies HI (the apparatus reported in [21] performs a monitoring function but does not facilitate human interaction).

In summary, there were three reasons for the signal processing hardware being "underwearable":

• By both distributing the components throughout the garment, and by keeping the components in close physical proximity to the body, it was found that the same total weight and bulk could be worn much more comfortably.
• Wearing the apparatus underneath ordinary clothing gave rise to a version of the WearComp apparatus which had a normal appearance. Although many of the prototypes were undetectable (covert) to visual inspection, early prototypes could of course be detected upon physical contact with the body by others. However, it was found that, by virtue of social norms, the touching of the body by others (and therefore discovery of the apparatus) was seldom a problem. Thus making certain that the apparatus did not have an unsightly or unusual appearance was found to be sufficient for integrating into society in a normal way. Unobtrusiveness is essential so that the apparatus does not interfere with normal social situations, for one cannot truly benefit from the long-term adaptation process of HI unless it is worn nearly constantly for a period of many years. Two examples of underwearables, as they normally appear when worn under clothing, are depicted in Fig 6, where the normal appearance is quite evident.
• The close proximity of certain components to the body provided additional benefits, such as the ability to easily integrate measuring apparatus for quantities such as respiration, heart rate, full ECG waveform, galvanic skin resistance, etc., of the wearer. The fact that the apparatus is worn underneath clothing facilitated direct contact with the body, providing a much richer measurement space and facilitating new forms of intelligent signal processing.

A.1 Remaining issues of underwearable signal processing hardware

This work on "underwearable signal processing hardware" has taken an important first step toward solving issues of comfort and detectability (detectability upon physical contact remains a greater technological challenge). Issues such as shielding of electromagnetic radiation have already been largely solved through the use of conductive undergarments that protect the body from radiation (especially from the transmitting antennas that establish the connection to the Internet). In addition to protecting the wearer, who would otherwise be in very close proximity to the transmitting antennas, such shielding also improves system operation, for example, by keeping RF from the transmitter, as well as ambient environmental RF, out of the biosensors and measurement instrumentation that is in direct contact with the body. Many of these practical issues will be dealt with in future designs, in moving from early prototypes of the "underwearable signal processing system" into production systems.

Fig. 6. Covert embodiments of WearComp suitable for use in ordinary day-to-day situations. Both incorporate fully functional UNIX-based computers concealed in the small of the back, with the rest of the peripherals, analog to digital converters, etc., also concealed under ordinary clothing. Both incorporate camera-based imaging systems concealed within the eyeglasses, the importance of which will become evident in Section V, in the context of Personal Imaging. While these prototype units are detectable by physical contact with the body, detection of the unusual apparatus was not found to be a problem, since normal social conventions are such that touching of the body is normally only the domain of those known well to the wearer. As with any prosthetic device, first impressions are important to normal integration into society, and discovery by those who already know the wearer well (e.g. to the extent that close physical contact may occur) typically happens after an initial acceptance is already established. Other prototypes have been integrated into the clothing in a manner that feels natural to the wearer and to others who might come into physical contact with the wearer. (a) lightweight black-and-white version completed in 1995. (b) full-color version completed in 1996; included special-purpose digital signal processing hardware based on an array of TMS 320 series processors connected to a UNIX-based host processor, concealed in the small of the back.

B. Multidimensional signal input for Humanistic Intelligence

The close physical proximity of WearComp to the body, as described earlier, facilitates a new form of signal processing³. Because the apparatus is in direct contact with the body, it may be equipped with various sensory devices. For example, a tension transducer (pictured leftmost, running the height of the picture from top to bottom, in Fig 7) is typically threaded through and around the underwearable, at stomach height, so that it measures respiration. Electrodes are also installed in such a manner that they are in contact with the wearer's heart. Various other sensors, such as an array of transducers in each shoe [23], and a wearable radar system (described in Section VI), are also included as sensory inputs to the processor. The ProComp eight-channel analog to digital converter, with some of the input devices that are sold with it, is pictured in Fig 7, together with the CPU from WearComp6.

³The first wearable computers equipped with multichannel biosensors were built by the author during the 1980s, inspired by a collaboration with Dr. Ghista of McMaster University. More recently, in 1995, the author put together an improved apparatus based on a Compaq Contura Aero 486/33 with a ProComp eight-channel analog to digital converter, worn in a Mountainsmith waist bag, and sensors from Thought Technologies Limited. The author subsequently assisted Healey in duplicating this system for use in trying to understand human emotions [22].

Fig. 7. Author's Personal Imaging system equipped with sensors for measuring biological signals. The sunglasses in the upper right are equipped with built-in video cameras and display system. These look like ordinary sunglasses when worn (wires are concealed inside the eyeglass holder). At the left side of the picture is an 8-channel analog to digital converter together with a collection of biological sensors, both manufactured by Thought Technologies Limited, of Canada. At the lower right is an input device called the "twiddler", manufactured by HandyKey, and to the left of that is a Sony Lithium Ion camcorder battery with custom-made battery holder. In the lower central area of the image is the computer, equipped with special-purpose video processing/video capture hardware (visible as the top stack on this stack of PC104 boards). This computer, although somewhat bulky, may be concealed in the small of the back, underneath an ordinary sweater. To the left of the computer is a serial to fiber-optic converter that provides communications to the 8-channel analog to digital converter over a fiber-optic link. Its purpose is primarily one of safety, to isolate high voltages used in the computer and peripherals (e.g. the 500 volts or so present in the sunglasses) from the biological sensors, which are in close proximity, typically with very good connection, to the body of the wearer.
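As an illustration of how one such continuously sampled channel might be processed, the following sketch estimates a respiration rate from the chest-band tension-transducer trace by counting rising zero crossings of the mean-removed signal. The sampling rate, channel assignment, and crossing rule are assumptions made for this sketch, not the actual parameters of the ProComp hardware:

```python
import math

def respiration_rate(samples, fs):
    """Estimate breaths per minute from the chest-band (tension
    transducer) channel of the eight-channel A/D converter.

    samples: voltages from that channel
    fs: sampling rate in Hz (an assumed value for illustration)
    """
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]
    # One rising zero crossing of the mean-removed trace per breath.
    rising = sum(1 for a, b in zip(x, x[1:]) if a < 0.0 <= b)
    duration_min = len(samples) / fs / 60.0
    return rising / duration_min

# Synthetic chest-band trace: 15 breaths/min sampled at 32 Hz for 60 s.
fs = 32.0
trace = [math.sin(2 * math.pi * 0.25 * n / fs + 0.1)
         for n in range(int(fs * 60))]
print(respiration_rate(trace, fs))   # → 15.0
```

A real trace would of course be noisier than this synthetic sinusoid, and would call for band-limiting before the crossing count; the sketch only shows how a continuous time-varying voltage becomes a physiological feature usable by the rest of the signal processing framework.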

 

B.1 Safety first!

The importance of personal safety, in the context of this new paradigm in computing, must be emphasized. Since electrical connections are often in close proximity to the body, and are, in fact, often made directly to the body (such as when fitting the undergarment with electrodes that connect to the bare flesh of the user, in the chest area, for ECG waveform monitoring), the inputs to which these connections are made must be fully isolated. In the present prototypes, this isolation is typically accomplished by using a length of fiber optic cable between the measurement hardware and the host computer. This is particularly essential in view of the fact that the host computer may be connected to other devices such as head-mounted displays containing voltages from 500 volts to as high as 9kV (e.g. in the case of head-mounted displays containing cathode ray tubes).


Moreover, the presence of high voltages in the eyeglasses themselves, as well as in other parts of the system (as when it is interfaced to an electronic flash system, which typically uses voltages ranging from 480V to 30kV), requires extra attention to insulation, since it is harder to free oneself from the apparatus when it is worn than when it is merely carried. For example, a hand-held electronic flash can be dropped to the ground more easily should it become unstable, while a wearable system embodies the danger of entrapment in a failing system.

Furthermore, since batteries may deliver high currents, there would be the risk of fire, and the prospect of being trapped in burning clothing, were it not for precautions taken in limiting current flow. Thus, in addition to improved high-voltage insulation, there is also the need to install current-limiting fuses and the like throughout the garment. As an additional precaution, all garments are made from flame-proof material. In this regard, especially with the development of early prototypes over the last 20 years, it was felt that a healthy sense of paranoia was preferable to carelessness that might give rise to a dangerous situation.

B.2 More than just a health status monitor

It is important to realize that this apparatus is not merely a biological signal logging device, as is often used in the medical community, but, rather, enables new forms of real-time signal processing for HI. A simple example might include a biofeedback-driven video camera. Picard also suggests its possible use to estimate human emotion [24].

The emphasis of this chapter will be on visual image processing with the WearComp apparatus. The author's dream of the 1970s, that of an intelligent wearable image processing apparatus, is just beginning to come to fruition.

  • V. THE PERSONAL IMAGING APPLICATION OF HUMANISTIC INTELLIGENCE

  • A. Some simple illustrativc examples

A.1 Always Ready: From point and click to 'look and think'

Current day commercial personal electronics devices we often carry are just useful enough for us to tolerate, but not good enough to significantly simplify our lives. For example, when we are on vacation, our camcorder and photographic camera require enough attention that we often either miss the pictures we want, or we become so involved in the process of video or photography that we fail to really experience the immediate present environment [25]. One ultimate goal of the proposed apparatus and methodology is to "learn" what is visually important to the wearer, and function as a fully automatic camera that takes pictures without the need for conscious thought or effort from the wearer. In this way, it might summarize a day's activities, and then automatically generate a gallery exhibition by transmitting desired images to the World Wide Web, or to specific friends and relatives who might be interested in the highlights of one's travel. The proposed apparatus, a miniature eyeglass-based imaging system, does not encumber the wearer with equipment to carry, or with the need to remember to use it; yet because it is recording all the time into a circular buffer [19], merely overwriting that which is unimportant, it is always ready. Although some have noted that the current embodiment of the invention, still in prototype stage, is cumbersome enough that one might not wear it constantly, it is easy to imagine how, with mass production and miniaturization, smaller and lighter units could be built, perhaps with the computational hardware built directly into ordinary glasses. Making the apparatus small enough to comfortably wear at all times will lead to a truly constant user-interface.

In the context of the always ready framework, when the signal processing hardware detects something that might be of interest, recording can begin in a retroactive sense (e.g. a command may be issued to start recording from thirty seconds ago), and the decision can later be confirmed with human input. Of course this apparatus raises some important privacy questions discussed previously, and also addressed elsewhere in the literature [26][27]. The system might use the inputs from the biosensors on the body as a multidimensional feature vector with which to classify content as important or unimportant. For example, it might automatically record a baby's first steps, as the parent's eyeglasses and clothing-based intelligent signal processor make an inference based on the thrill of the experience. It is often moments like these that we fail to capture on film: by the time we find the camera and load it with film, the moment has passed us by.
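The retroactive recording scheme described above can be sketched in a few lines of code. The following is only a minimal illustration, not the actual WearComp implementation; the buffer length, the integer "frames," and the trigger logic are all invented for this sketch.

```python
from collections import deque

class RetroactiveRecorder:
    """Continuously overwrites a circular buffer; on a trigger,
    the most recent history (e.g. the last thirty seconds of
    frames) is promoted to permanent storage, where a human can
    later confirm or discard it."""

    def __init__(self, capacity_frames):
        self.buffer = deque(maxlen=capacity_frames)  # oldest frames drop off
        self.saved_clips = []

    def push(self, frame):
        self.buffer.append(frame)  # always recording

    def trigger(self):
        # "Start recording from thirty seconds ago": snapshot the buffer.
        self.saved_clips.append(list(self.buffer))

# Simulate 100 frames with a 30-frame buffer; trigger after the last frame.
rec = RetroactiveRecorder(capacity_frames=30)
for t in range(100):
    rec.push(t)
rec.trigger()
print(rec.saved_clips[0][0], rec.saved_clips[0][-1])  # only frames 70..99 survive
```

The `deque` with `maxlen` does the "merely overwriting that which is unimportant" part automatically: pushing to a full buffer silently discards the oldest frame.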

A.2 Personal safety device for reducing crime

A simple example of where it would be desirable that the device operate by itself, without conscious thought or effort, is in an extreme situation such as might happen if the wearer were attacked by a robber wielding a shotgun, and demanding cash. In this kind of situation, it is desirable that the apparatus would function autonomously, without conscious effort from the wearer, even though the wearer might be aware of the signal processing activities of the measuring (sensory) apparatus he or she is wearing.

As a simplified example of how the processing might be done, we know that the wearer's heart rate, averaged over a sufficient time window, would likely increase dramatically⁴ with no corresponding increase in footstep rate (in fact footsteps would probably slow at the request of the gunman). The computer would then make an inference from the data, and predict a high visual saliency. (If we simply take heart rate divided by footstep rate, we can get a first-order approximation of the visual saliency index.) A high visual saliency would trigger recording from the wearer's camera at maximal frame rate, and also send these images together with appropriate messages to friends and relatives who would look at the images to determine whether it was a false alarm or real danger⁵. Such a system is, in effect, using the wearer's brain as part of its processing pipeline, because it is the wearer who sees the shotgun, and not the WearComp apparatus (e.g. a much harder problem would have been to build an intelligent machine vision system to process the video from the camera and determine that a crime was being committed). Thus HI (intelligent signal processing arising, in part, because of the very existence of the human user) has solved a problem that would not be possible using machine-only intelligence. Furthermore, this example introduces the concept of 'collective connected HI', because the signal processing systems also rely on those friends and relatives to look at the imagery that is wirelessly sent from the eyeglass-mounted video camera and make a decision as to whether it is a false alarm or real attack. Thus the concept of HI has become blurred across geographical boundaries, and between more than one human and more than one computer.
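The first-order saliency measure just described (heart rate divided by footstep rate) can be sketched as a toy program. The sensor values, the threshold, and the frame rates below are invented purely for illustration.

```python
def visual_saliency(heart_rate_bpm, footstep_rate_spm):
    """First-order visual saliency index: heart rate divided by
    footstep rate (guarded so that a wearer standing perfectly
    still does not cause division by zero)."""
    return heart_rate_bpm / max(footstep_rate_spm, 1e-6)

def camera_frame_rate(saliency, threshold=2.0, idle_fps=0.25, max_fps=30.0):
    # High visual saliency triggers recording at maximal frame rate.
    return max_fps if saliency > threshold else idle_fps

calm    = visual_saliency(70.0, 110.0)   # walking normally
alarmed = visual_saliency(150.0, 10.0)   # heart racing, feet nearly still
print(camera_frame_rate(calm), camera_frame_rate(alarmed))  # 0.25 30.0
```

Note how the inference exploits the ratio rather than the heart rate alone: vigorous exercise raises both numerator and denominator, while fright raises only the numerator.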

A.3 The retro-autofocus example: Human in the signal processing loop

The above two examples dealt with systems which use the human brain, with its unique processing capability, as one of their components, in a manner in which the overall system operates without conscious thought or effort. The effect is to provide a feedback loop of which subconscious or involuntary processes become an integral part.

An important aspect of HI is that the conscious will of the user may be inserted into or removed from the feedback loop of the entire process at any time. A very simple example, taken from everyday experience, rather than another new invention, is now presented.

⁴Perhaps it may stop, or "skip a beat" at first, but over time, on average, in the time following the event, experience tells us that our hearts beat faster when frightened.

⁵It has been suggested that the robber might become aware that his or her victim is wearing a personal safety device and try to eliminate it or perhaps even target it for theft. In anticipation of these possible problems, personal safety devices operate by continuous transmission of images, so that the assailant cannot erase or destroy the images depicting the crime. Moreover, the device itself, owing to its customized nature, would be unattractive and of little value to others, much as are undergarments, a mouthguard, or prescription eyeglasses. Furthermore, devices may be protected by a password embedded into a CPLD that functions as a finite state machine, making them inoperable by anyone but the owner. To protect against passwords being extracted through torture, a personal distress password may be provided to the assailant by the wearer. The personal distress password unlocks the system but puts it into a special tracking and distress notification mode.

One of the simplest examples of HI is that which happens with some of the early autofocus Single Lens Reflex (SLR) cameras in which autofocus was a retrofit feature. The autofocus motor would typically turn the lens barrel, but the operator could also grab onto the lens barrel while the autofocus mechanism was making it turn. Typically the operator could "fight" with the motor, and easily overpower it, since the motor was of sufficiently low torque. This kind of interaction is particularly useful, for example, when shooting through a glass window at a distant object, where there are two or three local minima of the autofocus error function (e.g. focus on particles of dust on the glass itself, focus on a reflection in the glass, and focus on the distant object). Thus when the operator wishes to focus on the distant object and the camera system is caught in one of the other local minima (for example, focused on the glass), the user merely grasps the lens barrel, swings it around to the approximate desired location (as though focusing crudely by hand, on the desired object of interest), and lets go, so that the camera will then take over and bring the desired object into sharp focus.

This very simple example illustrates a sort of humanistic intelligent signal processing in which the intelligent autofocus electronics of the camera work in close synergy with the intellectual capabilities of the camera operator.

It is this aspect of HI, that allows the human to step into and out of the loop at any time, that makes it a powerful paradigm for intelligent signal processing.

B. Mathematical framework for personal imaging

The theoretical framework for HI is based on processing a series of inputs from various wearable sensory apparatus, in a manner that regards each one of these as belonging to a measurement space; each of the inputs (except for the computer's keyboard-like input device comprised of binary pushbutton switches) is regarded as a measurement instrument to be linearized in some meaningful continuous underlying physical quantity.

Since the emphasis of this chapter is on personal imaging, the treatment here will focus on the wearable camera (discussed here in Section V) and the wearable radar (discussed later in Section VI). The other measurement instruments are important, but their role is primarily to facilitate harnessing and amplification of the human intellect for purposes of processing data from the imaging apparatus. The theoretical framework for processing video is based on regarding the camera as an array of light measuring instruments capable of measuring how the scene or objects in view of the camera respond to light⁶. This framework has two important special cases, the first of which is based on a quantifiable self-calibration procedure and the second of which is based on algebraic projective geometry as a means of combining information from images related to one another by a projective coordinate transformation. These two special cases of the theory are now presented in Sections V.B.1 and V.B.2 respectively, followed by bringing both together in Section V.B.3. The theory is applicable to standard photographic or video cameras, as well as to the wearable camera and personal imaging system.

B.1 Quantigraphic imaging and the Wyckoff principle

Quantigraphic imaging is now described.

It should be noted that the goal of quantigraphic imaging, to regard the camera as an array of light measuring instruments, is quite different from the goals of other related research [29] in which there is an attempt to separate the reflectance of objects from their illumination. Indeed, while Stockham's effort was focused on separately processing the effects due to reflectance and scene illumination [29], quantigraphic imaging takes a camera-centric viewpoint, and does not attempt to model the cause of the light entering the camera, but merely determines, to within a single unknown scalar constant, the quantity of light entering the camera from each direction in space. Quantigraphic imaging measures neither radiometric irradiance nor photometric illuminance (since the camera will not necessarily have the same spectral response as the human eye, or, in particular, that of the photopic spectral luminous efficiency function as determined by the CIE and standardized in 1924). Instead, quantigraphic imaging measures the quantity of light integrated over the particular spectral response of the camera system, in units that are quantifiable (e.g. linearized), in much the same way that a photographic light meter measures in quantifiable (linear or logarithmic) units. However, just as the photographic light meter imparts to the measurement its own spectral response (e.g. a light meter using a selenium cell will impart the spectral response of selenium cells to the measurement), quantigraphic imaging accepts that there will be a particular spectral response of the camera, which will define the quantigraphic unit of measure.

⁶This 'lightspace' theory was first written up in detail in 1992 [28].

A field of research closely related to Stockham's approach is that of colorimetry, in the context of

the so-called color constancy problem [30][31] in which there is an attempt made to determine the true color of an object irrespective of the color of the illuminant. Thus solving the color constancy problem, for example, might amount to being able to recognize the color of an object and ignore the orange color cast of indoor tungsten lamps or ignore the relatively bluish cast arising from viewing the same object under the illumination of the sky outdoors. Quantigraphic imaging, on the other hand, makes no such attempt, and provides a true measure of the light arriving at the camera, without any attempt to determine whether color effects are owing to the natural color of an object, or the color of the illumination.

Quantigraphic imaging may, however, be an important first step to solving the color constancy problem: once we know how much light is arriving from each direction in space, in each of the spectral bands of the sensor, and we have a measure of these quantities that is linearized, we can then apply to these quantigraphic images any of the traditional mathematical image processing frameworks such as those of Stockham [29], those of Venetsanopoulos [32], or those from color theory. Quantigraphic imaging may also be a first step to other uses of the image data, whether they be for machine vision or simply for the production of a visually pleasing picture.

The special case of quantigraphic imaging presented here in Section V.B.1 pertains to a fixed camera (e.g. as one would encounter in mounting the camera on a tripod). Clearly this is not directly applicable to the wearable camera system, except perhaps in the case of images acquired in very rapid succession. However, this theory, when combined with the Video Orbits theory of Section V.B.2, is found to be useful in the context of the personal imaging system, as will be described in Section V.B.3.

Fully automatic methods of seamlessly combining differently exposed pictures to extend dynamic range have been proposed [33][34], and are summarized here.

Most everyday scenes have a far greater dynamic range than can be recorded on a photographic film or electronic imaging apparatus (whether it be a digital still camera, consumer video camera, or eyeglass-based personal imaging apparatus as described in this paper). However, a set of pictures that are identical except for their exposure collectively show us much more dynamic range than any single picture from that set, and also allow the camera's response function to be estimated, to within a single constant scalar unknown.

A set of functions,

E_n(x) = f(k_n q(x)),    (1)

where k_n are scalar constants, is known as a Wyckoff set [35][19], and describes a set of images, E_n, when x = (x, y) is the spatial coordinate of a piece of film or the continuous spatial coordinates of the focal plane of an electronic imaging array, q is the quantity of light falling on the sensor array, and f is the unknown nonlinearity of the camera's response function (assumed to be invariant to x ∈ ℝ²). Because of the effects of noise (quantization noise, sensor noise, etc.), in practical imaging situations, the dark (often underexposed) pictures show us highlight details of the scene that might have been overcome by noise (e.g. washed out) had the picture been properly exposed. Similarly, the light pictures show us some shadow detail that might not have appeared above the noise threshold had the picture been properly exposed.
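A Wyckoff set of the form (1) is easy to simulate numerically. In the sketch below the response function f and the exposure ratios k_n are chosen arbitrarily for illustration (a clipped gamma-type nonlinearity with gamma = 2.2, and exposures 1, 4, 16); any semimonotonic f would serve.

```python
import numpy as np

def f(q):
    """Hypothetical camera response: a gamma-type nonlinearity,
    with the quantity of light clipped to the recordable range."""
    return np.clip(q, 0.0, 1.0) ** (1.0 / 2.2)

# A scene whose quantities of light q(x) span a large dynamic range.
q = np.logspace(-3, 0, 256)            # deep shadows ... bright highlights
k = [1.0, 4.0, 16.0]                   # relative exposures
wyckoff_set = [f(kn * q) for kn in k]  # E_n(x) = f(k_n q(x))

# The darkest picture keeps highlight detail that the lightest picture
# clips away, while the lightest picture lifts shadow detail.
for kn in k:
    print(f"k = {kn:5.1f}  clipped highlight fraction: {np.mean(kn * q > 1.0):.2f}")
```

No single E_n records the whole scene, but collectively the set does, which is exactly the principle the text exploits.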

19

A means of simultaneously estimating f and k_n, given a Wyckoff set E_n, has been proposed [33][35][19]. A brief outline of this method follows. For simplicity of illustration (without loss of generality), suppose that the Wyckoff set contains two pictures, E_1 = f(q) and E_2 = f(kq), differing only in exposure (e.g. where the second image received k times as much light as the first). Photographic film is traditionally characterized by the so-called "D log E" (Density versus log Exposure) characteristic curve [36][37]. Similarly, in the case of electronic imaging, we may also use logarithmic exposure units, Q = log(q), so that one image will be K = log(k) units darker than the other:

log(f⁻¹(E_1)) = log(f⁻¹(E_2)) − K.    (2)

The existence of an inverse for f follows from the semimonotonicity assumption [35][19]. (We expect any reasonable camera to provide a semimonotonic relation between quantity of light received, q, and the pixel value reported.) Since the logarithm function is also monotonic, the problem is reduced to that of estimating the semimonotonic function F() = log(f⁻¹()) and the scalar constant K, given two pictures E_1 and E_2:

F(E_2) = F(E_1) + K.    (3)

Thus:

E_2 = F⁻¹(F(E_1) + K)    (4)

provides a recipe for "registering" (appropriately lightening or darkening) the second image to match the first. This registration procedure differs from the image registration procedure commonly used in image resolution enhancement (to be described in Section V.B.2) because it operates on the range (tonal range) of the image E(x) as opposed to its domain (spatial coordinates) x = (x, y). (In Section V.B.3, registration in both domain and range will be addressed.) Once f is determined, each picture becomes a different estimate of the same true quantity of light falling on each pixel of the image sensor:

q̂_n = (1/k_n) f⁻¹(E_n).    (5)

Thus one may regard each of these measurements (pixels) as a light meter (sensor element) that has some nonlinearity followed by a quantization to a measurement having typically 8-bit precision. It should be emphasized that most image processing algorithms incorrectly assume that the camera response function is linear (e.g. almost all current image processing, such as blurring, sharpening, unsharp masking, etc., operates linearly on the image) while in fact it is seldom linear. Even Stockham's homomorphic filtering [29], which advocates taking the log, applying linear filtering, and then taking the antilog, fails to capture the correct nonlinearity [19][35], as it ignores the true nonlinearity of the sensor array. It has recently been shown [35][19] that, in the absence of any knowledge of the camera's nonlinearity, an approximate one-parameter parametric model of the camera's nonlinear response is far better than assuming it is linear or logarithmic. Of course, finding the true response function of the camera allows one to do even better, as one may then apply linear signal processing methodology to the original light falling on the image sensor.
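Equations (4) and (5) can be exercised on synthetic data. In the sketch below, f is again a hypothetical gamma-type response, here assumed known and unclipped so that f⁻¹ exists in closed form; each differently exposed picture then yields the same estimate q̂_n of the underlying quantity of light, and (4) tonally "registers" one image against the other.

```python
import numpy as np

gamma = 2.2
f     = lambda q: q ** (1.0 / gamma)   # assumed (invented) camera response
f_inv = lambda E: E ** gamma           # its inverse
F     = lambda E: np.log(f_inv(E))     # F() = log(f^{-1}())
F_inv = lambda y: f(np.exp(y))

q  = np.linspace(0.01, 0.5, 64)        # true quantity of light (no clipping)
k  = 4.0                               # exposure ratio, so K = log(k)
E1 = f(q)
E2 = f(k * q)

# Eq. (5): two differently exposed "light meters", one estimate of q each.
q_hat_1 = f_inv(E1) / 1.0
q_hat_2 = f_inv(E2) / k
print(np.allclose(q_hat_1, q_hat_2))   # True: same light, two exposures

# Eq. (4): register the tonal range of E1 so that it matches E2.
E2_predicted = F_inv(F(E1) + np.log(k))
print(np.allclose(E2_predicted, E2))   # True
```

In practice f is not known in advance; the point of the Wyckoff method summarized above is that f and k can be estimated jointly from the images themselves, up to the single scalar constant noted in the text.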

B.2 Video Orbits

A useful assumption in the domain of 'personal imaging' is that of zero parallax, whether this be for obtaining a first-order estimate of the yaw, pitch, and roll of the wearer's head [20], or making an important first step in the more difficult problem of estimating depth and structure from a scene⁷. Thus, in this section, the assumption is that most of the image motion arises from that of generating an environment map; zero parallax is assumed.

⁷First modeling the motion as a projective coordinate transformation, and then estimating the residual epipolar structure or the like [38][39][40][41][42][43].


The problem of assembling multiple pictures of the same scene into a single image commonly arises in mapmaking (with the use of aerial photography) and photogrammetry [44], where zero parallax is also generally assumed. Many of these methods require human interaction (e.g. selection of features), and it is desired to have a fully automated system that can assemble images from the eyeglass-based camera. Fully automatic featureless methods of combining multiple pictures have been previously proposed [45][46], but with an emphasis on subpixel image shifts; the underlying assumptions and models (affine, and pure translation, respectively) were not capable of accurately describing more macroscopic image motion. A characteristic of video captured from a head-mounted camera is that it tends to have a great deal more macroscopic image motion, and a great deal more perspective 'cross-chirping' between adjacent frames of video, while the assumptions of static scene content and minimal parallax are still somewhat valid. This assumption arises for the following reasons:

• Unlike the heavy hand-held cameras of the past, the personal imaging apparatus is very lightweight.
• Unlike the hand-held camera which extends outward from the body, the personal imaging apparatus is mounted close to the face. This results in a much lower moment of inertia, so that the head can be rotated quickly. Although the center of projection of the wearable camera is not located at the center of rotation of the neck, it is much closer than with a hand-held camera.

It was found that the typical video generated from the personal imaging apparatus was characterized by rapid sweeps or pans (rapid turning of the head), which tended to happen over much shorter time intervals and therefore dominate over second-order effects such as parallax and scene motion [19]. The proposed method also provides an indication of its own failure, and this can be used as a feature, rather than a "bug" (e.g. so that the WearComp system is aware of scene motion, scene changes, etc., by virtue of its ability to note when the algorithm fails). Thus the projective group of coordinate transformations captures the essence of video from the WearComp apparatus⁸.

Accordingly, two featureless methods of estimating the parameters of a projective group of coordinate transformations were first proposed in [33], and in more detail in [35]: one direct, and one based on optimization (minimization of an objective function). Although both of these methods are multiscale (e.g. use a coarse-to-fine pyramid scheme), and both repeat the parameter estimation at each level (to compute the residual errors), and thus one might be tempted to call both iterative, it is preferable to refer to the direct method as repetitive to emphasize that it does not require a nonlinear optimization procedure such as Levenberg-Marquardt [48][49], or the like. Instead, it uses repetition with the correct law of composition on the projective group, going from one pyramid level to the next by application of the group's law of composition. A method similar to the optimization method was later proposed in [35][50]. The direct method has also been subsequently described in more detail [51].

The direct featureless method for estimating the 8 scalar parameters⁹ of an exact projective (homographic) coordinate transformation is now described. In the context of personal imaging, this result is used to seamlessly combine multiple images of the same scene or object, resulting in a single image (or new image sequence) of greater resolution or spatial extent. Many papers have been published on the problems of motion estimation and frame alignment. (For review and comparison, see [52].) In this Section the emphasis is on the importance of using the "exact" 8-parameter projective coordinate transformation [51], particularly in the context of the head-worn miniature camera.

The most common assumption (especially in motion estimation for coding, and optical flow for computer vision) is that the coordinate transformation between frames is translation. Tekalp, Ozkan, and Sezan [46] have applied this assumption to high-resolution image reconstruction. Although translation is the simplest of these coordinate transformations to implement, it is poor at handling large changes due to camera zoom, rotation, pan and tilt. Zheng and Chellappa [53] considered the image registration problem using a subset of the affine model: translation, rotation and scale. Other researchers [45][54] have assumed affine motion (six parameters) between frames.

The only model that properly captures the "keystoning" and "chirping" effects of projective geometry is the projective coordinate transformation. However, because the parameters of the projective coordinate transformation had traditionally been thought to be mathematically and computationally too difficult to solve, most researchers have used the simpler affine model or other approximations to the projective model. The 8-parameter pseudo-perspective model [39] does, in fact, capture both the converging lines and the chirping of a projective coordinate transformation, but not the true essence of projective geometry.

⁸The additional one-time download of a lens distortion map, into WearComp's coordinate transformation hardware, eliminates its lens distortion, which would otherwise be very large, owing to the covert, and therefore small, size of the lens, and to the engineering compromises necessary to its design. The Campbell method [47] is used to estimate the lens distortion for this one-time coordinate transformation map.
⁹Published in detail in [51].

Of course, the desired "exact" eight parameters come from the projective group of coordinate transformations, but they have been perceived as being notoriously difficult to estimate. The parameters for this model have been solved by Tsai and Huang [55], but their solution assumed that features had been identified in the two frames, along with their correspondences. The main contribution of the result summarized in this Section is a simple featureless means of automatically solving for these 8 parameters.

A group is a set upon which there is defined an associative law of composition (closure, associativity), which contains at least one element (identity) whose composition with another element leaves it unchanged, and for which every element of the set has an inverse.

A group of operators together with a set of operands form a so-called group operation¹⁰. In the context of a Lie group of spatial coordinate transformation operators acting on a set of visual images as operands, such a group is also known as a Lie group of transformations [57].

Note that Hoffman's use of the term "transformation" is not synonymous with "homomorphism" [56], as is often the case in group theory, such as when a transformation T acts on a law of composition between elements g, h ∈ G, such that T(gh) = T(g)T(h). Instead, what is meant by "transformation", in the context of this paper, is a change in coordinates of a picture (image). Thus, in the context of this paper, transformations act on images, not on elements of the group (which happens to be a group of transformation operators).

As in Hoffman's work [57], coordinate transformations are operators selected from a group, and the set of images are the operands. This group of operators and set of images thus form the group action in the sense defined in [56]. When the coordinate transformations form a group, then two such coordinate transformations, p1 and p2, acting in succession on an image (e.g. p1 acting on the image by doing a coordinate transformation, followed by a further coordinate transformation corresponding to p2, acting on that result) can be replaced by a single coordinate transformation. That single coordinate transformation is given by the law of composition in the group. The orbit of a particular element of the set, under the group operation [56], is the new set formed by applying to it all possible operators from the group. Thus the orbit is a collection of pictures formed from one picture through applying all possible projective coordinate transformations to that picture. This set is referred to as the 'video orbit' of the picture in question [51]. Equivalently, we may imagine a static scene, in which the wearer of the personal imaging system is standing at a single fixed location. He or she generates a family of images in the same orbit of the projective group of transformations by looking around (rotation of the head)¹¹.

The coordinate transformations of interest, in the context of this paper,

x' = [x', y']ᵀ = (A[x, y]ᵀ + b)/(cᵀ[x, y]ᵀ + d) = (Ax + b)/(cᵀx + d),    (6)

¹⁰Also known as a group action or G-set [56].
¹¹The resulting collection of images may be characterized by fewer parameters, through application of the Hartley constraint [58][35] to an estimate of a projective coordinate transformation.


define an operator P_1 that acts on the images as follows:

P_{A_1,b_1,c_1,d_1} ∘ E(x) = E(x') = E((A_1x + b_1)/(c_1ᵀx + d_1)),    (7)

which is parameterized by A_1 ∈ ℝ²ˣ², b_1 ∈ ℝ²ˣ¹, c_1 ∈ ℝ²ˣ¹, and d_1 ∈ ℝ.

These operators may be applied to an image in succession, this succession being defined, for example, with two such operators, P_1 and P_2, as:

P_2 ∘ P_1 ∘ E(x) = P_{A_2,b_2,c_2,d_2} ∘ P_{A_1,b_1,c_1,d_1} ∘ E(x)    (8)

P_2 ∘ P_1 ∘ E(x) = E((A_2(A_1x + b_1)/(c_1ᵀx + d_1) + b_2)/(c_2ᵀ(A_1x + b_1)/(c_1ᵀx + d_1) + d_2))    (9)

P_2 ∘ P_1 ∘ E(x) = E(((A_2A_1 + b_2c_1ᵀ)x + (A_2b_1 + b_2d_1))/((c_2ᵀA_1 + d_2c_1ᵀ)x + (c_2ᵀb_1 + d_2d_1)))    (10)

from which we can see that the operators can be represented as 3 × 3 matrices of the form [A, b; cᵀ, d], and that the law of composition defined on these operators can be represented by matrix multiplication. Associativity follows from matrix multiplication, and the identity element of the group is given by the matrix

[I_2, 0; 0, 1]    (11)

where I_2 is the 2 × 2 identity matrix [1, 0; 0, 1].
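The correspondence between composition of projective coordinate transformations and multiplication of their 3 × 3 matrices is easy to verify numerically. The sketch below applies two arbitrary (invented) operators of the form [A, b; cᵀ, d] to a point, first in succession and then as the single composed operator given by the matrix product.

```python
import numpy as np

def apply_projective(M, x):
    """Act on a 2-D point x with the operator whose 3x3 matrix is
    M = [A, b; c^T, d], i.e. x' = (A x + b) / (c^T x + d)."""
    A, b = M[:2, :2], M[:2, 2]
    c, d = M[2, :2], M[2, 2]
    return (A @ x + b) / (c @ x + d)

# Two arbitrary elements of the projective group (values invented).
P1 = np.array([[ 1.1,  0.2,  3.0],
               [-0.1,  0.9, -2.0],
               [1e-3, 2e-3,  1.0]])
P2 = np.array([[ 0.8, -0.3,  1.0],
               [ 0.4,  1.2,  0.5],
               [-1e-3, 1e-3, 1.0]])

x = np.array([10.0, 20.0])
two_steps = apply_projective(P2, apply_projective(P1, x))
one_step  = apply_projective(P2 @ P1, x)   # law of composition: matrix product
print(np.allclose(two_steps, one_step))    # True
```

The agreement is exact (not approximate) because the denominators from the first operator cancel when the second is applied, which is precisely the content of the law of composition stated above.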

 

The 'video orbit'

of

a

given

2-D

frame

is defined

to

be

the

set

of all images

that

can be produced

by applying operators from the 2-D projective group of coordinate transformations

(6)

to

the

given

image.

Hence, the problem may be restated:

Given

a

set

of images

that

lie

in

the

same orbit

of" the

group, find for each image pair, that

operator

in the group

which takes one image to

the other

image.

If two frames of the video image sequence, say f_1 and f_2, are in the same orbit, then there is a group operation p such that the mean-squared error (MSE) between f_1 and f̂_2 = p ∘ f_1 is zero. In practice, however, the element of the group that takes one image "nearest" the other is found (e.g. there will be a certain amount of error due to violations of the assumptions, due to noise such as parallax, interpolation error, edge effects, changes in lighting, depth of focus, etc.). The brightness constancy constraint equation [59], which gives the flow velocity components, is:

 

u_fᵀE_x + E_t ≈ 0.    (12)

As is well known [59], the optical flow field in 2-D is underconstrained. The model of pure translation at every point has two parameters, but there is only one equation (12) to solve. Thus it is common practice to compute the optical flow over some neighborhood, which must be at least two pixels, but is generally taken over a small block, 3 × 3, 5 × 5, or sometimes larger (e.g. the entire image, as in the Video Orbits algorithm described here). However, rather than estimating the 2-parameter translational flow, the task here is to estimate the eight-parameter projective flow (6) by minimizing:

ε_flow = Σ (u_mᵀE_x + E_t)².    (13)

Although a sophisticated nonlinear optimization procedure, such as Levenberg-Marquardt, may be applied to solve (13), it has been found that solving a slightly different but much easier problem allows us to estimate the parameters more directly and accurately for a given amount of computation [51]:

ε_w = Σ ((Ax + b − (cᵀx + d)x)ᵀE_x + (cᵀx + d)E_t)².    (14)

(This amounts to weighting the sum differently.)
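A minimal numerical sketch of this featureless estimation is given below for the special case of a small pure translation, where the true parameters are A = I, b equal to the shift, and c = 0. It fixes d = 1, builds one linear equation per pixel from the approximate (weighted) functional above, and solves the resulting least-squares system. The synthetic image and shift are invented for illustration, and the coarse-to-fine pyramid and repetition of the full algorithm are deliberately omitted.

```python
import numpy as np

# Synthetic frame pair: E2 is E1 translated by u_true (a small shift).
def scene(x, y):
    return np.sin(0.11 * x) * np.cos(0.13 * y) + 0.5 * np.sin(0.05 * x + 0.07 * y)

h = w = 100
yy, xx = np.mgrid[0:h, 0:w].astype(float)
u_true = np.array([0.6, 0.4])                 # (x, y) shift in pixels
E1 = scene(xx, yy)
E2 = scene(xx - u_true[0], yy - u_true[1])

Ey, Ex = np.gradient(E1)                      # spatial derivatives (d/dy, d/dx)
Et = E2 - E1                                  # temporal derivative

# One equation per pixel, linear in (a11, a12, b1, a21, a22, b2, c1, c2)
# once d is fixed at 1: expanding the weighted residual and moving the
# parameter-free terms to the right-hand side gives phi^T p = x^T Ex - Et.
x, y = xx.ravel(), yy.ravel()
ex, ey, et = Ex.ravel(), Ey.ravel(), Et.ravel()
phi = np.stack([x * ex, y * ex, ex,
                x * ey, y * ey, ey,
                x * et - x * x * ex - x * y * ey,
                y * et - x * y * ex - y * y * ey], axis=1)
rhs = x * ex + y * ey - et
p, *_ = np.linalg.lstsq(phi, rhs, rcond=None)

A = p[[0, 1, 3, 4]].reshape(2, 2)
b = p[[2, 5]]
print(np.round(A, 2), np.round(b, 2))  # expect A close to I, b close to (0.6, 0.4)
```

For this pure-translation case the recovered b approximates the shift and A stays near the identity; larger motions are what the pyramid scheme and the repetition with the group's law of composition are for.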

 

Differentia.ting (1:3) with respect gives a, linear solution:

to the free parameters

A, b, and

 

c,

and

setting

the result

 

(E

¢¢T)

[all,

a l2, bl, a21,

a22, b2, el,

c2] r

 

-

E:

(x

Ex

-

Z,)¢

where

cT

_

[E,:(x, y, 1), Ey(x,

y,

1), xEt

-

x2Ex