
LEARNING GESTALT OF SURFACES IN

NATURAL SCENES

Amir Assadi,* Stephen Palmer,† Hamid Eghbalnia‡

INTRODUCTION.

Most surfaces in the world around us are generally not ideal smooth mathe-
matical surfaces. Many rough and non-smooth surfaces, such as rolling hills
covered by grass, star fish, sea urchins, hedgehogs, and many other living
beings or natural scenes are perceived to have a piecewise smooth “global”
geometric form that is quite distinct from the small-scale “local” geomet-
ric properties. How do we associate a large-scale piecewise smooth surface
geometry to such objects or scenes (what we may call an imaginary model
surface), and how do we use the small-scale information to add “texture”
to the imaginary surfaces in order to complete our visual understanding?
Questions of similar nature have attracted vision scientists for some time.
There are a number of different formulations and even proposed solutions to
such local-global problems. The most relevant theories come under the names
Multi-scale Representations, Scale Space Theory and their variations (based
on the idea that different characteristics of an image reveal themselves at
different levels of resolution). Our theory complements such existing theories
in several ways.
We develop a computational model for scenes with surfaces that have
rough and non-smooth small-scale structure but with a perceived global
(larger-scale) geometric form. Examples include grass and meadow, surfaces
textured with sand-paper, natural scenes having rough texture such as the
skin of a crocodile, pine cones, a field of sea urchins, forests, ripples and
waves on water surfaces, etc. Another domain of examples arises in the
scientific exploration of microscopic images, such as atomic force microscopy
(AFM) images of alloys in materials science, molecular beam epitaxy (MBE),
rough surfaces due to ballistic deposition (BD surfaces) and random deposition
surfaces (RD). As a last example, one may translate some outstanding image
processing problems of infra-red astronomy into the problem of understanding
the random texture of clouds combined with noise, e.g. to describe algorithms
that detect stars within noisy data provided by infra-red imaging devices.

*Center for the Mathematical Sciences and Department of Medical Physics,
UW-Madison, WI 53706
†Center of Cognitive Studies and Department of Psychology, UC-Berkeley, CA 94720
‡Center for the Mathematical Sciences and Department of Mathematics,
UW-Madison, WI 53706

0-7803-5673-X/99/$10.00 © 1999 IEEE

APPROACH.

We propose several complementary paradigms (each leading to a computational
model for a 3D-textured surface) and their experimental implementation
in order to compare their outcomes from several points of view: model
precision and effectiveness in predicting human visual perception under
various circumstances, computational performance, robustness and algorithmic
efficiency. One reason for multiple paradigms is the diversity of circumstances
under which visual perception occurs. Another reason is the range of
applications to different subjects, as briefly mentioned above. It is often the
case that each scenario calls for a different paradigm. Finally, comparison of
different paradigms leads to better evaluation of the algorithms.
Our main contributions are: (a) the development of the general geometric
theories that associate to a given 3D-textured surface S, a family of piecewise
smooth surfaces S(t) that are “perceived-forms” of S from the large-scale view
(in the world.) (b) Development of the learning algorithms to recover shape
from texture for large families of such surfaces.
We will endow members of S(t) with 2D-texture via an appropriately
defined “transfer” (pull-back of the texture mapping, in the jargon of
differential topology) of the actual texture of S from the scene. Next, we will
adapt an appropriate scheme to define objective functions on these families,
whose minima distinguish the 2D-surfaces that best describe the large-scale
form of the original surface S in early vision, under the given circumstances
(e.g. a particular observer in the scene). This step is accomplished by means
of a neural network trained with backpropagation. For simplicity at this stage,
we assume that the figure-ground separation has been achieved, e.g. via a
suitable segmentation algorithm. We show later how to modify such methods to
obtain segmentation algorithms based on Gestalt. The “shape” of S(t) and its
other mathematical invariants are studied via standard tools from differential
geometry and topology. Our work in progress includes validation of the
perceived form of natural surfaces by comparing the outcome of our
computational theory with the perceived form of S obtained through
psychophysical measurements. An important step (in the algorithms and their
experimental implementations) is the mathematical problem of constructing the
family S(t), briefly discussed below.

The Rolling Spheres Paradigm.

Consider a sphere of radius R = t rolling on the given 3D-textured surface S,
and let the locus of centers of these spheres be called S(t). As the radius
R approaches zero, S(t) approaches S. Through suitable reflection of the
spheres, a family S(t) is defined. Here, t is the absolute value of a parameter
that lies in an appropriate interval (-a, a), and S(0) = S. Thus, the family
approximates S, and for larger values of the parameter, interprets the surface
with coarser resolution. The mathematical consideration of this model leads
to very interesting and natural problems in studying geometric invariants of
surfaces with Lipschitz structure on the one-hand, and controlled topology
on the other. From the view-point of pure mathematics, these problems are
basic and quite interesting in their own right.
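
As an illustration only, the one-dimensional analogue of the rolling sphere (a circle of radius t rolling on top of a sampled height profile) can be sketched as follows. The sketch assumes the curve is a graph sampled on a uniform grid and that the circle rolls on the upper side; the function name, the edge padding at the boundary, and the grid spacing parameter are assumptions of this example, not part of the theory above. The center height at x is the maximum over |u| <= t of f(x+u) + sqrt(t^2 - u^2), which is a grayscale dilation of f by a circular cap.

```python
import numpy as np

def rolling_circle_centers(f, dx, t):
    """Heights of the centers of a circle of radius t rolling on top of
    the sampled profile f (uniform spacing dx). Hypothetical helper."""
    f = np.asarray(f, dtype=float)
    k = int(np.floor(t / dx))                 # half-width of the circle in samples
    offsets = np.arange(-k, k + 1)
    # Height of the circle center above a contact point at horizontal offset u.
    cap = np.sqrt(np.maximum(t**2 - (offsets * dx) ** 2, 0.0))
    # Assumption: the profile is extended by its end values at the boundary.
    padded = np.pad(f, k, mode="edge")
    # windows[i, j] holds f at sample i + j - k.
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * k + 1)
    # The circle rests on the sample that forces its center highest.
    return (windows + cap).max(axis=1)
```

Subtracting t from the returned heights gives a coarse profile that can be compared directly with f. For a flat profile the centers lie exactly t above it, and when t is smaller than dx the result is simply f shifted up by t, consistent with S(t) approaching S as the radius shrinks.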

The Foliation Paradigm.

The family S(t) can also be considered as the leaves of a foliation (in the
sense of differential topology). To obtain such a foliation, we must describe
an integrable distribution (as in the Frobenius Theorem). Since we are in 3D,
every 2D-distribution is determined by its associated unit normal field. This
normal field is obtained by an averaging procedure based on a family of meshes
that approximate the surface (e.g. as in a one-dimensional analogue of the
rolling ball paradigm applied to “noisy coordinate curves” of S cut by the
planes passing through the mesh and parallel to the z-axis, if we locally
describe the surface as a graph of a real-valued function of two variables).
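
For reference, the integrability condition invoked here can be written out explicitly. The following is the standard statement of the Frobenius condition for a plane field in 3D given by a unit normal field N; the symbols D and ω are notation introduced for this note only.

```latex
\[
  D_p = \{\, v \in T_p\mathbb{R}^3 : \langle N(p), v \rangle = 0 \,\}, \qquad
  \omega = N_1\,dx + N_2\,dy + N_3\,dz ,
\]
\[
  D \text{ is integrable}
  \iff \omega \wedge d\omega = 0
  \iff N \cdot (\nabla \times N) = 0 .
\]
```

In particular, a normal field obtained by averaging need not satisfy this condition automatically, which is why integrability has to be checked for the averaged field.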

Statistical Averaging.

This approach lends itself to discrete models, where averages of normals to the
discretized surface are taken over multi-step neighborhoods. These define
the statistical normal vector fields, hence the 2D-distribution, which must be
verified to be integrable and is then integrated to form the family of
surfaces S(t).
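
A minimal sketch of such an averaging step is given below, under the assumption that the discretized surface is a height field z sampled on a regular grid. The function name, the square (2*steps+1)-wide neighborhood, and the edge padding are choices made for this example; the text above does not fix them.

```python
import numpy as np

def averaged_normals(z, spacing=1.0, steps=1):
    """Unit normals of the height field z, averaged over a square
    neighborhood reaching `steps` grid cells in each direction.
    Hypothetical helper; returns an array of shape (H, W, 3)."""
    z = np.asarray(z, dtype=float)
    dz_dy, dz_dx = np.gradient(z, spacing)    # axis 0 is y, axis 1 is x
    # Upward normal of the graph z = f(x, y), then normalized.
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(z)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    # Assumption: normals are repeated past the border of the grid.
    padded = np.pad(n, ((steps, steps), (steps, steps), (0, 0)), mode="edge")
    size = 2 * steps + 1
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (size, size), axis=(0, 1))    # shape (H, W, 3, size, size)
    mean = windows.mean(axis=(-2, -1))
    # Renormalize so the result is again a unit vector field.
    return mean / np.linalg.norm(mean, axis=-1, keepdims=True)
```

Larger values of `steps` play the role of a coarser scale: on a plane the averaged normal equals the plane normal at every scale, while on a rough patch it tends toward the normal of the large-scale form.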

The Jacobi-Hamilton Equations and Level Surface Theory.

When S is the graph of a function with bounded variation (here, our func-
tions are assumed continuous with realistic variations), we can apply the
Jacobi-Hamilton equations, and the family provided by Hopf’s solution. The
level-surfaces of this family yield S(t). The crux of the matter here is to
reduce the problem for a reasonably large collection of surfaces to the case
of a graph. The graph case can be also generalized, but here we leave out
the technical discussion. This approach may be compared with the Heat
Equation approach, in which heat diffuses according to a Dirichlet boundary
problem (e.g. Hummel 1987 [1]: The scale space formulation of pyramid data
structures). Discrete versions corresponding to the heat equation are
simulated and solved, e.g. as in Hummel 1987 [1]. Similarly, the discrete
versions of our models can be simulated and solved. We may remark that high
performance computing techniques are available for faster and more extensive
numerical experiments (e.g. using the WARP parallel systolic array).
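
For comparison, a discrete version of the heat equation approach can be sketched in a few lines. This is a generic explicit finite-difference scheme for a 1D profile with fixed (Dirichlet) end values, not the scheme of [1]; the function name and the `safety` factor used to keep the time step stable are assumptions of this example.

```python
import numpy as np

def heat_smooth(f, dx, t_final, safety=0.4):
    """Evolve u_t = u_xx from the initial profile f up to time t_final,
    holding both end values fixed. Hypothetical helper."""
    u = np.asarray(f, dtype=float).copy()
    dt = safety * dx**2                       # explicit scheme needs dt <= dx^2 / 2
    n_steps = int(np.ceil(t_final / dt))
    if n_steps == 0:
        return u
    dt = t_final / n_steps                    # land exactly on t_final
    r = dt / dx**2
    for _ in range(n_steps):
        # Second difference on interior points; end points stay fixed.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u
```

The diffusion time t_final acts as the scale parameter: a linear profile is left unchanged, while fine oscillations are damped faster than slow ones.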

Convolution and Filtering.

Finally, we consider a family of kernels K(t), and perform the convolutions
that average the local variation of the surface and give rise to the family
S(t). This approach is the closest in spirit to multi-resolution or multi-scale
theories of vision. On the other hand, unlike the existing theories that lead
to blurring the image in order to extract coarser information, we propose
to create surfaces embedded in the 3D-space with coarser details but not
necessarily blurred images. Thus, in our approach the image processing step
will be done at a later stage.
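
As a hedged illustration of this paradigm, the sketch below takes K(t) to be Gaussian kernels and the surface to be a height field on a regular grid; both are assumptions of the example, as are the function names, the truncation of the kernel at three standard deviations, and the edge padding. The smoothing is applied to the heights themselves, i.e. to the surface in 3D rather than to an image of it.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma (sigma > 0)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_height_field(z, sigma):
    """Convolve the height field z with a separable Gaussian of width sigma
    (in grid cells). Hypothetical helper; output has the shape of z."""
    z = np.asarray(z, dtype=float)
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    # Assumption: heights are repeated past the border of the patch.
    padded = np.pad(z, r, mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)   # smooth along x
    return np.apply_along_axis(conv, 0, rows)     # then along y
```

Calling `smooth_height_field(z, sigma)` for an increasing sequence of `sigma` values yields a family of progressively coarser surfaces, in the spirit of S(t).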
We have also made progress on applying the above-mentioned theory of
vision to initiate the mathematical theory of textured surfaces (including
natural scenes) that has led us to non-trivial mathematical constructs and
more subtle and advanced geometric theories.

DISCUSSION.

It is worthwhile to mention the novel features of our theory, and compare it
to the research of others.
(a) We propose to use the theory of foliations from differential topology to
create a geometric model for the world with sufficient flexibility that at
least one locally smooth mathematical surface (i.e. one leaf of the foliation)
can be singled out to characterize the perceived geometry of the scene. As a
result, most of these computations happen “in the world.” Only at a later stage,
the image of the artificial 2D-textured mathematical surface in the 3D-space
will be compared with the image of the real scene. In the work of others,
all the manipulations (e.g. blurring and other approximations) are applied
directly to the 2D-image that represents the real world. Our method has
several advantages: for instance, as progress is made on finding more efficient
algorithms to describe shape from texture, shape from shading, shape from
motion etc., we can apply them to the 2D-images obtained from the leaves
of the foliation, and update the output of the theory. Thus, this theory
complements the present and future image-based research on perception of
shape.
(b) The mathematical theory in the research has potential for experimen-
tal verification via virtual reality environments, although at this point we
plan to use standard computer graphics to generate the textured surfaces in
the family that will be used in the psychophysical measurements. The VR
will open a new research direction: merging of the tactile and visual senses
to describe the perceived shape (and eventually, merging of auditory, tactile
and visual senses.) The image-based algorithms do not seem to have this
flexibility.
(c) The theory of foliations will provide a powerful interface between modern
mathematical research and biology and a new set of tools and geometric
concepts (to our knowledge, foliation theory has not been used in
neuroscience in the past). On the mathematical side, one expects that a
powerful and rigorous variational method will eventually emerge, leading to a
PDE whose solution describes the perceived shape. As usual, such PDEs have the
advantage of bringing well-understood and standard tools (e.g. numerical
solutions) to bear on the related problems.

COMPUTATION.

In this paper, we describe the results of computations with the Foliation
Method, and comment on the Rolling Ball Method, which has a similar flavor.
To avoid excessive technicalities, we consider an approximation to both
methods that works well in most cases where the variation in the curvature of
the Gestalt surface is not too high, corresponding to pieces of surfaces with
Gestalt curvatures bounded between two numbers that are bounded in absolute
value by a constant multiple of the measure of the solid viewing angle. The
cases where the principal curvatures of the Gestalt surface vary frequently
and rapidly (within the viewing angle) pose technical problems related to
emerging cusp and fold singularities for the leaves of the foliation formed by
the loci of centers of rolling balls. Such singularities need to be smoothed
out in order to restore the smoothness percept of the Gestalt of the surface,
and are generally artifacts of the computational method, rather than a natural
part of the foliated structure that our theory predicts abstractly. We hope
to address in the near future a solution for such technical difficulties of our
methods.
An intuitively appealing and computationally accessible method for the
construction of the leaves of the foliation is as follows.
First, we solve the corresponding problem in the Euclidean plane. Consider
a curve C that is the graph of a piecewise smooth function f on an
interval [a,b]. Assume that the variation of the differences of values of
consecutive maxima (correspondingly, consecutive minima) is bounded by a
small integer constant multiple of b-a. This condition is similar to the
condition regarding the roughness of the surface relative to the viewing angle.
We can select an average smoothing of C in a variety of ways. For example,
we select a uniformly distributed sample of points within [a,b], take the
DFT of f using the sample, and filter out the high frequencies. Analogously, we
could take a convolution of f with a Gaussian smoothing kernel. In the latter
approach, different variances give rise to different smoothings of C,
analogous to the Convolution and Filtering Method that we described above. The
smaller the variance, the closer the smoothing is to C. As the variance grows,
the smoothing loses more local features of C and approaches a flat curve.
The Rolling Ball Method becomes the rolling circle method, and intuitively
corresponds to the smoothing kernel above. To use these observations for the
case of vision, we assume that the observer’s eye is located at a fixed position
above the plane, say on the z-axis looking down at the xy-plane containing
C. In any of the above cases, the family of curves in the plane corresponds to
the family of smoothing curves after the scaling and shifts that bring the
viewing angle for the graph to be the same as the viewing angle for C.
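
The DFT variant of the curve smoothing described above can be sketched as follows. The sketch assumes uniformly spaced samples of f and treats them as periodic, which is what the DFT implies; the function name and the `keep` parameter (the number of lowest nonzero frequencies retained) are assumptions of this example.

```python
import numpy as np

def dft_lowpass(samples, keep):
    """Smooth uniformly sampled values of f by zeroing every DFT
    coefficient above the `keep` lowest nonzero frequencies.
    Hypothetical helper; returns an array of the same length."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples)
    spectrum[keep + 1:] = 0.0                 # index 0 is the mean, always kept
    return np.fft.irfft(spectrum, n=len(samples))
```

Here `keep` plays the role of the scale: a large value reproduces C closely, and `keep = 0` returns the flat curve at the mean height of the samples, matching the limiting behavior described for the Gaussian kernel with growing variance.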
Next, we use the case of the curve above to simplify the problem of
constructing the family of smoothings for a surface. Consider a point P on
a (natural) surface S. First, construct a piecewise linear neighborhood of
the point P, with vertices at points that are local maxima, approximating the
given surface patch. Next, we select a randomly and uniformly distributed set
of points on S, and construct a piecewise linear surface L consisting of
simplices (flat triangular surfaces) that contain P at their center, and so
that L is topologically a disk. Next, we take the average of the unit normal
vectors of the piecewise linear surface L, and normalize it for the sake of
convenience. Call this
vector N. For a generic choice of random points and a generic choice of L,
the vector N has the following property: Among the perpendicular pairs
of planes passing through N, intersecting the surface S in a pair of curves
C and C', almost all pairs C and C' have (infinitely many) smoothings at
different scales (as described above) that are graphs of functions that are,
in particular, at least twice differentiable at P, and in general, piecewise
differentiable. (The proofs of these assertions follow from general position
and transversality arguments in differential topology.) Any such pair C and
C' is then smoothed with a choice of scale, and gives rise to candidates for
the coordinate curves for a smoothing of the surface at P. That is, these
curves interpolate to define a surface patch for a leaf of the foliation at that
scale around the point P. In practice, one selects a set of points along C
and C' and constructs the system of curves that we may call a system of
"rough coordinate curves" near P. Then one proceeds with the choice of
"rough normals", and cuts a system of rough coordinate curves on the natural
surface. Next, we continue to smooth the system of curves at each scale, and
interpolate them to form the surface elements. The surface elements come
together to form the leaves of the foliation, as proposed above. For the sake
of brevity, we leave out the discussion of smoothing textures. Suffice it to say
that texture is constructed as a section of a vector bundle on S, or locally, a
set of vector-valued functions. Kernel smoothings of a section representing
an original texture are constructed in the same spirit.
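
The computation of the vector N in the procedure above can be sketched as follows, assuming that L is a fan of triangles sharing the vertex P and that the surrounding sample points are listed in counterclockwise order as seen from the side of the desired normal. The function name, the ordering convention, and the unweighted average are assumptions of this example; degenerate (zero-area) triangles are not handled.

```python
import numpy as np

def fan_average_normal(p, ring):
    """Normalized average of the unit normals of the triangles
    (p, ring[i], ring[i+1]) forming a disk around p. Hypothetical helper."""
    p = np.asarray(p, dtype=float)
    ring = np.asarray(ring, dtype=float)
    a = ring - p                              # edge from p to each ring point
    b = np.roll(ring, -1, axis=0) - p         # edge from p to the next ring point
    normals = np.cross(a, b)                  # one normal per triangle
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    n = unit.sum(axis=0)
    return n / np.linalg.norm(n)              # normalized "for convenience"
```

Planes containing the line through P in the direction of the returned vector can then be used to cut the curves C and C' on the surface.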
Figures 1, 2, 3 and 4 below illustrate the procedure. More details and
discussion of the code are available on the web site
http://www.cms.edu/-cvg.

RELATED RESEARCH.

Among other related literature, we mention a sample from Scale Space Theory
and related approaches. Rosenfeld and Thurston [2] take advantage of the
simultaneous use of operator masks of different sizes (corresponding to the
spatial-frequency channels, as was shown later) to increase sensitivity to
edges of variable resolution, and also to withstand noise. Burt and Adelson [3]
considered the pyramidal structure as the Laplacian of successively blurred
images (the successively blurred images themselves being known as the Gaussian
pyramid). Other related literature includes Crowley and Parker [4], Meer,
Baugher and Rosenfeld [5], and Mallat [6]. There is also a body of literature
based on applications of fractals, e.g. Pentland [7] (fractal-based
descriptions) proposes fractal-like models for all texture, and Peleg et al. [8]
refute Pentland's hypothesis and suggest how to use changes in measurements
at different scales to characterize texture. A later suggestion along these
lines is due to Keller et al. [9], who introduce the concept of lacunarity
(which captures the second-order statistics of fractal surfaces).

Figure 1: "Rough" surfaces before foliation
On the perception side, Gagalowicz [10], Julesz and Bergen [11] and
Caelli [12], among others, have offered evidence for visual perception as a
local process. Wilson and Bergen [13] and DeValois et al. [14] establish the
relevance of multi-channel frequency analysis systems in human vision, and
explain/measure quantitatively the relationship to Hubel and Wiesel's
classical discovery of receptive fields (RF) of simple cells. Hubel and Wiesel's
theory of receptive fields is, on the other hand, interpreted by Koenderink
and van Doorn [15] in terms of the filtering theory of images (i.e. convolutions
with the Laplacian of Gaussians); thus we may regard RFs as multi-channel
filters in this context. This is potentially a starting point in the search for
the physiological basis of our theory, and for further relationships to other
related biologically plausible vision research. Another related direction is
Porat and Zeevi [16] and Clark and Bovik [17] on the Gabor representation for
early vision. A distributed architecture, made up of multiple spatially and
spectrally localized RFs defined as Gabor filters, yields an early low-level
representation of the visual input. Watson’s [18] theory of the cortex transform
is conceptually modeled after Gabor-type RFs, and provides a distributed
representation in terms of both spatial and spectral localization; it can
potentially be integrated into a computational model of our proposed theory.

Figure 2: Surfaces after foliation

Figure 3: Some coordinate curves of "rough" surfaces
Applications to Symmetry and Structural Regularity. Symmetry is a
much-encountered theme in science. Here, we use the term symmetry in
reference to all kinds of transformations that leave invariant some form of
geometry, together with related concepts such as harmony. Therefore,
similarity in Euclidean plane geometry is a form of symmetry (in the so-called
conformal geometry, where angles are preserved) although it is not necessarily
a rigid motion. The term quasi-symmetry can be used for perceived regularity
of structure that is compelling in its organization, but fails to be a strict
symmetry in the mathematical sense above. This aspect of our research has
a long history in psychology as the investigation of cognitive processes
that underlie human perception of geometric forms, much the same way that
Henri Poincaré posed it in his treatise Science and Hypothesis (1902),
which led to his discovery of non-Euclidean worlds long before Einstein’s
theory of gravity. We believe that our theory can address such issues in the
realm of cognitive neuroscience, with computational models to support the
cognitive theories, and investigation of physiological evidence to establish the
low-level computations of visual, tactile, motor and auditory processes that
contribute to our perception of symmetry/regularity in structure. As an
example, our theory predicts that the human perception of symmetry could
be modeled via the mathematical symmetry of the Gestalt of the surface S
(bounding the object) as discussed above. Thus, the rotational symmetry of a
pine cone or a sea urchin does not have a counterpart in its original physical
shape, but it is measurable when we apply the Gestalt theory of surfaces.
Such observations agree with our daily experience, and can be measured in
psychophysics.

Figure 4: Coordinate curves after foliation

REFERENCES

1. Hummel, R., The Scale-Space Formulation of Pyramid Data Structures,
PCV88(107-123).
2. Thurston, M., and Rosenfeld, A., Edge and Curve Detection for Visual
Scene Analysis, TC(20), No. 5, May 1971, pp. 562-569.
3. Burt, P.J., and Adelson, E.H., The Laplacian Pyramid as a Compact
Image Code, Commun(31), No. 4, April 1983, pp. 532-540.
4. Crowley, J.L., and Parker, A.C., A Representation for Shape Based on
Peaks and Ridges in the Difference of Low-Pass Transform, PAMI(6), No. 2,
March 1984, pp. 156-169.
5. Meer, P., Baugher, E.S., and Rosenfeld, A., Frequency Domain Analysis
and Synthesis of Image Pyramid Generating Kernels, PAMI(9), No. 4,
July 1987, pp. 512-522.
6. Mallat, S.G., A Theory for Multiresolution Signal Decomposition: The
Wavelet Representation, PAMI(11), No. 7, July 1989, pp. 674-693.
7. Pentland, A.P., Fractal-Based Description of Natural Scenes, PAMI(6),
No. 6, November 1984, pp. 661-674.
8. Peleg, S., Naor, J., Hartley, R.L., and Avnir, D., Multiple Resolution
Texture Analysis and Classification, PAMI(6), No. 4, July 1984, pp. 518-523.
9. Keller, J.M., Chen, S.S., and Crownover, R.M., Texture Description
and Segmentation through Fractal Geometry, CVGIP(45), No. 2, February
1989, pp. 150-166.
10. Gagalowicz, A., A New Method for Texture Field Synthesis: Some
Applications to the Study of Human Vision, PAMI(3), No. 5, September
1981, pp. 520-533.
11. Julesz, B., and Bergen, R., Textons, The Fundamental Elements in
Preattentive Vision and Perception of Textures, Bell System Tech.(62), No.
6, 1983, Part 11, pp. 1619-1645.
12. Caelli, T., Three Processing Characteristics of Visual Texture
Segmentation, SV(1), No. 1, 1985, pp. 19-30.
13. Wilson, R., Bergen, A four mechanism model for spatial vision, Vision
Research, 1979.
14. DeValois et al., Responses of the striate cortex cells to gratings and
checkerboard patterns, J. Physiology, 1979.
15. Koenderink and van Doorn, Representation of local geometry in the
visual system, Biological Cybernetics, 1987.
16. Porat and Zeevi, The generalized Gabor scheme of image representation
in biological and machine vision, 1988.
17. Clark, M., Bovik, A.C., Experiments in Segmenting Texton Patterns
Using Localized Spatial Filters, PR(22), 1989, pp. 707-717.
18. Watson, A.B., The Cortex Transform: Rapid Computation of Simulated
Neural Images, CVGIP(39), No. 3, September 1987, pp. 311-327.
