
Submitted to Nature Neuroscience as a review. The uncrowded window. Dec. 12, 2007.

The uncrowded window for object recognition
Denis G. Pelli and Katharine A. Tillman
Psychology & Neural Science, NYU

It is now emerging that vision is usually limited not by object size, but by spacing. The visual system recognizes an object by detecting and then combining its features. When objects are closer together than the critical spacing, the visual system combines features from them all, producing a jumbled percept. We review the explosion of studies of this ‘crowding’ phenomenon (in grating discrimination, letter and face recognition, visual search, and reading) to reveal a universal principle: the ‘Bouma law’. Critical spacing is equal for all objects. Furthermore, critical spacing at the cortex is independent of position, and critical spacing at the visual field is proportional to distance from fixation. The region where object spacing exceeds critical spacing is the uncrowded window. Observers cannot recognize objects that are outside of this window. The uncrowded window limits how quickly people can read text and find an object in clutter.

Object recognition means calling a chair a chair, despite variations in style, viewpoint, rendering, and surrounding clutter. The first step in object recognition is feature detection1-3. Features are image components that are detected independently4, 5. All objects consist of simple features, such as oriented lines and edges. Cells in the primary visual cortex respond when such features match their receptive fields, and the features that drive cells hard enough are detected6. Then the brain combines the detected features to recognize the object. This combining step (including “integration”, “binding”, “segmentation”, “pooling”, “grouping”, “contour integration”, and “selective attention”) is still mysterious3, 7-12. In particular, it seems that some objects have only one part and others have many13-15. An object with one part is recognized in a single integration of its constituent features. In an object with multiple parts, each part must be recognized before they are all combined. This review surveys the evidence for a universal law that describes the limits of object recognition.

The bars in the “A in chaff” demonstration (Fig. 1) represent elementary features. When you look at the demo, your brain detects the features and combines them to categorize the letter as “A”. We cannot yet explain how this process works, but we can break it, easily. Fix your eyes on the red minus, far from the A, and the extra features (chaff) make it impossible to recognize the A. When you fixate so far from the A, your brain integrates over too large an area around the A, failing to isolate the relevant features of the A from the nearby junk, and comes up with a jumbled percept instead of a letter. This is crowding (reviewed in refs. 16 and 17). Some well-known illusions are delicate, strongly affected by expectation, and only work once. Unlike them, crowding is robust. No matter how many times you move your eyes back and forth from plus to minus, the A comes and goes every time.

Fig. 1. An A in chaff. The bars represent elementary visual features. Fixating close to the bars, at the green plus, it is easy to recognize the letter A. If you fixate far away, on the red minus, you can still see the features, but you cannot identify the letter. Your visual system is integrating over too large an area, including all the features from both the A and the surrounding chaff, resulting in a jumbled percept. This is crowding. You can rule out acuity (letter size) as an explanation (for your inability to identify the A) by confirming that you can see the A while fixating the minus if your fingers hide the chaff. For a review of the evidence that crowding is feature integration over an inappropriately large area, see ref. 16.

Fig. 1 shows that, amid clutter, it is fruitless to integrate features over too large an area. In principle, it would be similarly fruitless to integrate over too small an area, getting only a fraction of the object. This matches some clinical descriptions of simultanagnosia: “It often appeared as if he were looking through a peephole which was too narrow to include the entire stimulus”18. While we have no evidence that this latter problem ever arises in normal observers, crowding affects everyone.

Crowding, unlike ordinary masking, does not make the target disappear16. Crowding impairs our ability to identify, count, and locate objects, but does not affect detection (Fig. 2). The notion that crowding is a breakdown of the second stage of object recognition, after feature detection, is supported by experiments showing that crowding can greatly impair the observer’s ability to judge target orientation without affecting the orientation-specific aftereffect of adapting to that target19. The sparing of adaptation suggests that the impairment of identification occurs after the detection stage.

Fig. 2. Effects of crowding. While fixating the red minus, can you tell that the clusters differ in letter identity, number, and position? Crowding impairs your ability to judge these object properties20, 21. Using your finger to cover all but the leftmost letter, you can confirm that even this most distant letter is well within your acuity. Reprinted from ref. 21.

Crowding is specified by the observer’s critical spacing, which is how far apart (measured center to center) elements in a scene must be in order to escape integration. Critical spacing grows in proportion to eccentricity, the distance of the target object from fixation20.

The Bouma law


Fig. 3. Crowding in a word. While fixating the red minus, it is easy to identify the isolated letter on the left, but try to identify the middle letter on the right. It is hard. Fixate the green plus and try again. Now it is easy23.

Our story, here, is the great generality of a law based on the
simple observation that Herman Bouma reported in 1970: The
critical spacing for identification of small letters is roughly
half the eccentricity20. We take this observation to its most
general form, which we call the Bouma law: Our ability to
identify an object among similar objects depends solely on the
ratio of the object spacing to the observer’s critical spacing at
the object’s location. The object is crowded whenever the ratio
is less than one. The critical spacing is independent of what
the object is, and depends only on where the object is in the
visual field and the direction from target object to flanker
object. The broad empirical conformance to this law is
unexpected because object recognition is usually assumed to
be limited by size, not spacing, as discussed below.
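
As a minimal sketch of the law just stated, the following Python treats Bouma's constant as a single number, b = 0.5 (the "roughly half" reported above), and ignores the dependence on the direction from target to flanker. The function names are hypothetical, chosen only for illustration.

```python
def critical_spacing(eccentricity_deg, b=0.5):
    """Critical spacing (deg) at a given eccentricity, assuming simple proportionality."""
    # Assumption: b is a scalar; the law also allows it to vary with flanker direction.
    return b * abs(eccentricity_deg)


def is_crowded(object_spacing_deg, eccentricity_deg, b=0.5):
    """True when the ratio of object spacing to critical spacing is less than one."""
    # Comparing the two spacings directly avoids dividing by zero at fixation.
    return object_spacing_deg < critical_spacing(eccentricity_deg, b)


if __name__ == "__main__":
    # Objects 1 deg apart, viewed 10 deg from fixation.
    print(is_crowded(1.0, 10.0))
    # The same objects, now 6 deg apart.
    print(is_crowded(6.0, 10.0))
```

Note that object identity and size never enter the computation, which is the point of the law: only spacing and location matter.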
Most studies of crowding have used letters and words as stimuli. [A current special issue of the Journal of Vision includes more than twenty articles on crowding, using a wide variety of stimuli22.] Fig. 3 demonstrates the critical spacing of the letters in a word23. If you try to identify the middle letter in the word “are”, it is easy when you fixate near the letter and hard when you fixate far away. This is because, when fixation is far enough away, the whole word falls within one critical spacing, and features from all the letters are jumbled together. The parts of an object crowd each other when they fall within a critical spacing. Faces, like words, are recognized only if the visual system can isolate their parts: eyes, nose, mouth15. Therefore, we cannot recognize a face unless we look directly at it (Fig. 4).

Fig. 4. Faces are like words. Arnold Schwarzenegger and Elvis Presley are famous, so their faces may be familiar. Fixate on the red minus between them. Ignoring their hair, can you still tell who is the governor and who is the King? How close to each face do you have to fixate to identify it? As you fixate closer and closer to the face, you will find that you remain unable to recognize it until you reach the cheek. Faces are like words; the parts (eyes, nose, mouth) must be isolated (separated by the observer’s critical spacing) in order for the whole to be recognized15.

Fig. 5 demonstrates that the critical spacing is universal, independent of object and size. The threshold eccentricity for recognition is the same for all objects with the same spacing, even when the objects are as diverse as gratings, words, faces, and furniture. Similarly, the critical spacing of crowding is unaffected by equal motion of target and flankers24. Across different tasks, including discrimination of size, hue, saturation, and orientation, the amplitude (maximum threshold elevation) of crowding varies, but the spatial extent of crowding is practically the same25. “Second-order” letters (painted with texture) are more susceptible to crowding than “first-order” letters (painted with homogeneous ink), but the spatial extent of crowding is the same26.

The generality of the Bouma law suggests that critical spacing is a fundamental parameter of human vision. It depends solely on position and direction in the visual field16, 20. It is proportional to the distance from fixation (Fig. 6). This proportionality is due to the organization of the visual cortex. The known eccentricity-dependence of the cortical magnification factor (mm on the cortex per deg of visual angle) results in a logarithmic map of the visual field on the primary visual cortex (V1). The logarithmic transformation of the proportional critical spacing at the visual field results in a fixed critical spacing at the cortex (6 mm at V1), independent of eccentricity.

Let us go through this, step by step. By the Bouma law, critical spacing is bφ, where b is Bouma’s proportionality constant between critical spacing and eccentricity φ (ref. 27). In V1 and many other areas in the visual cortex, eccentricity φ in the visual field is an exponential function of position d (in mm) on the cortex, φ = exp(α + βd), where α and β are empirical constants, unique to each cortical area28. So d(φ) = (log(φ) - α)/β, a logarithmic map. If the target is at eccentricity φ, then a flanker one critical spacing farther from fixation will be at eccentricity φ + bφ. b and β are fixed constants, so the cortical separation Δd = d(φ + bφ) - d(φ) = [log((1 + b)φ) - log(φ)]/β = log(1 + b)/β is a fixed number of mm, independent of target location, in every cortical area that is logarithmically mapped29, 30. We take b = 0.4, as in Fig. 5, so in V1, where β = 0.0577/mm, Δd = log(1.4)/0.0577 ≈ 5.8 mm, i.e. about 6 mm.

Size or spacing?
Bouma’s critical spacing idea could not be simpler, but it has been very hard to accept because it displaces a firmly held belief that visibility is limited by size (acuity), not spacing (crowding). When we view a scene from farther away, both size and spacing decrease. Viewing distance, per se, does not matter. What matters is the stimulus at the retina. Some visual tasks are size-limited. The Egyptians (5,000 years ago) and many since have assessed acuity of vision by the ability to distinguish the double star Alcor/Mizar in Ursa Major31. Today, to measure a size threshold (acuity) that characterizes a person’s vision, we ask the observer to identify a simple object, usually a letter. This measure is unaffected by crowding if done foveally, where critical spacing is only a few minutes of arc, or anywhere on a blank field. Measuring acuity is useful, especially in selecting the best optical correction.
However, outside of the optometrist’s office, most of us are
well-corrected (20/20) and our ability to see is more limited by
object spacing than size. We can see a bird in the sky without

crowding, but most of our visual world is cluttered, and each object that we identify must be isolated from the clutter. When an object is not isolated, it is crowded, and we cannot recognize it. Isolation depends on spacing, not size. To escape crowding, the object spacing must exceed the observer’s critical spacing at that location in the observer’s visual field.

Critical spacing has profound effects on everyday life. Consider reading. It has long been known that reading consists of a series of eye fixations, four per second, rather than a continuous sweep across the text35. Reading rate is independent of text size over a large range, but drops precipitously for sufficiently small text. From ancient to modern times, this has been taken to be a size limit (acuity). Plato complained that he was asked “to read small letters from a distance”36. Legge, Pelli, Rubin, and Schleske declared that, “the fairly rapid decline in reading rate for characters smaller than 0.3° is undoubtedly associated with acuity limitations”37. But we were wrong. Reading rate depends on letter spacing, not size. Measuring with two texts, one widely and one normally spaced, at various viewing distances, it is found that reading rate drops at a particular letter spacing (in deg), independent of letter size38. Typographers routinely increase “tracking” (spacing) to maintain the legibility of text when it is made smaller.

Fig. 5. Critical spacing is independent of object and size. Fixating on the red minus, you will be unable to identify the middle object in the first ten rows (unless your fingers isolate it by hiding the flanking objects), or the single object in the last two rows. Grating patches, like those in the top two rows, are often taken to be one-feature objects. In the first row, is the middle grating vertical or tilted? The ± is our estimate of the fixation point where you can just barely identify the target. You can assess the accuracy of this threshold estimate by noting that the task is easy when you fixate to the right of the ±, and hard at the left. Critical spacing depends solely on position (and direction) in the visual field, which does not vary among rows in this demonstration. Note that halving object size has no effect on critical spacing. Critical spacing is independent of spatial frequency32, 33. [Image sources: Gratings were created in MATLAB. Letters are in the Courier font. Rocking chair is from hemera.com. Animal silhouettes are in our Animals font, which is available for research purposes. Men, women, and telephone signs are from aiga.com. House is from ref. 34. Ladder, stool, food, and Gandhi are from Google image searches of the web.]

Spatial extent of crowding
The invariance of critical spacing demonstrated in Fig. 5 is found when the target and flankers have similar features (e.g. black letters flanking a black letter target). These typical cases produce maximum crowding. Flankers that have features unlike the target (e.g. white letters flanking a black letter target, on a gray background) produce much less crowding or none at all39-41. This weaker effect is usually reported as a reduction in critical spacing, but perhaps the spatial extent of crowding is unchanged and the effect is only reduced in amplitude. The reported reduction of critical spacing may be an artifact of defining critical spacing by a performance criterion, as discussed below. Compared to the effect of target-like flankers, dissimilar flankers may simply have a weaker effect over the same spatial extent.

Crowding has usually been characterized by just one number, “critical spacing,” i.e. spacing threshold, the spacing required to achieve a criterion level of performance. That single number seems to be enough to characterize crowding when the flanker is similar to the target, but may not adequately describe the weaker crowding produced by dissimilar flankers. Disentangling the amplitude and extent of crowding demands a two-number description. The complete “psychometric function,” plotting proportion correct as a
function of spacing, tells us little more than the spacing
threshold. Proportion correct has a small dynamic range
bounded by the floor at chance, when spacing is below
threshold, and the ceiling at 100%, when spacing is above
threshold. To get the whole story, we must replace proportion
correct by a better effect scale: threshold. To measure
threshold, one varies a physical parameter of the stimulus to
achieve a particular level of performance. Thus, threshold is
measured on a physical scale with a wide dynamic range. For
example, several studies have measured orientation
discrimination thresholds as a function of spacing. These plots
show that the lesser crowding produced by less-similar
flankers has much less amplitude (maximum threshold
elevation) but practically the same spatial extent33, 40, 41.
Changing the orientation of the flankers from parallel
to orthogonal (to the target) halves the amplitude without
obvious reduction of extent40. Arranging the flankers to form a
closed contour reduces the threshold elevation by a factor of 6
without reducing its spatial extent (defined as spacing for half
maximum log threshold elevation)42.

Crowding diminishes somewhat with practice, but the improvement is specific to the trained strings43 and does not transfer from 3-letter strings to reading44. The benefit of practice has been reported as a reduction in critical spacing, but a two-parameter analysis might reveal, as above, that only amplitude (not extent) is affected.

At present, the simplest account is that the spatial extent of crowding for a particular location and direction is independent of the particular target and flanker. However, that is only a tentative conclusion because most published studies have not disentangled the amplitude and extent of crowding. For the rest of this review, we revert to “critical spacing”, asking the reader to bear in mind that the special cases we just discussed demand a two-parameter (amplitude and extent) characterization of crowding.

Fig. 6. Critical spacing is proportional to eccentricity. The observer (PB) fixated the point indicated by a plus in the upper right and identified the orientation of a target T (Right-side up or upside down?) presented (in blocks) at one of the 9 locations indicated by the dots. Two flanking Ts were displayed symmetrically displaced from the target in opposite directions, -45°, 0°, 45°, or 90° relative to horizontal. Each vertex in the roughly elliptical contours represents the measured critical spacing of the pair of flanking letters for 75% correct identification of target orientation. Note that the critical spacing contours are not circles: the direction from target to flanker matters. These were measured with one letter size at each eccentricity; changing letter size has no effect on the results27. Adapted from ref. 45.

xuncrowdedx
Fig. 7. What is your uncrowded span? Fixate the “o” in the center of the word. Your uncrowded span is 3 if you can read “row”, 4 for “crow”, 5 for “crowd”, and a whopping 9 for “uncrowded”, which many observers achieve. For reviews of uncrowded and visual spans see refs. 27, 46, 47. Reprinted from ref. 27; adapted from ref. 48.
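
The two-number (amplitude and extent) description argued for in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the function name is hypothetical, and linear interpolation between measured spacings is our choice. Only the definitions come from the text: amplitude is the maximum threshold elevation, and extent is the spacing at which log threshold elevation falls to half its maximum.

```python
import math


def amplitude_and_extent(spacings, elevations):
    """Return (amplitude, extent) from threshold elevations measured at ascending spacings.

    elevations are ratios: threshold with flankers / threshold without (so 1 = no effect).
    """
    logs = [math.log10(e) for e in elevations]
    amplitude = max(elevations)      # maximum threshold elevation
    half = max(logs) / 2.0           # half of the maximum log elevation
    peak = logs.index(max(logs))
    # Walk outward from the peak to the first interval where the curve drops through `half`.
    for i in range(peak, len(logs) - 1):
        hi, lo = logs[i], logs[i + 1]
        if hi >= half > lo:
            frac = (hi - half) / (hi - lo)   # linear interpolation (an assumption)
            return amplitude, spacings[i] + frac * (spacings[i + 1] - spacings[i])
    return amplitude, None           # the curve never falls to half maximum


if __name__ == "__main__":
    spacings = [1.0, 2.0, 4.0]                                      # hypothetical spacings (deg)
    print(amplitude_and_extent(spacings, [100.0, 100.0, 1.0]))      # hypothetical similar flankers
    print(amplitude_and_extent(spacings, [10.0, 10.0, 1.0]))        # hypothetical dissimilar flankers
```

In this made-up example the two flanker types differ tenfold in amplitude yet yield the same extent, which is the pattern that a single criterion-based "critical spacing" would misreport as a change in spacing.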
The uncrowded window
Most of the time, most of our visual field is crowded, sparing
only a central uncrowded window. This window and the
limitation it places on recognition is especially clear in the
case of reading. To read text, we must identify letters. The rate
at which we read depends on how many letters we take in on
each fixation (Fig. 7), which is limited by crowding. The
spacing of letters in text is uniform, but the observer’s critical
spacing increases with distance from fixation. Beyond some
eccentricity, the reader’s critical spacing exceeds the spacing
of the text, and the letters crowd each other, spoiling
recognition. Peripheral vision, beyond that eccentricity, is
crowded. Central vision, within that eccentricity, is
uncrowded. This is the uncrowded window. Inside the
window, letters are uncrowded and we can read. Outside the
window, letters are crowded and we cannot. In order to read the letters that lie outside the window, we must move our eyes to bring the letters into the window. (Letters at the ends of words are much less crowded23 and have a larger uncrowded window.) The number of character positions in a line of text that fit inside the uncrowded window is the “uncrowded span”27.

Fig. 8. The uncrowded window. This figure simulates crowding in reading by substituting letters in the peripheral field. Crowding spoils letter recognition, making reading impossible outside the uncrowded window. Note that the substitutions are undetectable when you fixate the center of the circle. As you read this caption, the words are clear and legible near your chosen point of fixation and illegibly crowded beyond that clear region. That central uncrowded field is a window through which we read. This model fits a wide range of existing human reading rate data, and is confirmed by several experiments that measure the effects of simulated crowding. Reprinted from ref. 27.

Fig. 8 shows the uncrowded window by simulating
crowding in the periphery. The corruptions outside the
uncrowded window are undetectable when fixating the center
of the window. (These are “silent substitutions”49.)
The observer’s critical spacing is the same for all
objects. Together, the observer’s critical spacing and the
spacing of the viewed objects determine the size of the
uncrowded window. Inside the window, we can recognize
objects, and outside of it, we cannot46. When the spacing is
uniform, as in text, then the window will be central, where the
critical spacing is smallest. When spacing is not uniform, the
window need not be central, and there may be more than one.
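
To make the last two sentences concrete, here is a minimal sketch in Python. It assumes objects lying on a single line through fixation (at 0 deg), a scalar Bouma constant b = 0.4, and that an object escapes crowding when its nearest neighbor is at least one critical spacing away. The function name and the example positions are hypothetical, and the dependence on flanker direction is ignored.

```python
def uncrowded_flags(positions_deg, b=0.4):
    """For each object position (deg from fixation), report whether it is uncrowded."""
    flags = []
    for i, x in enumerate(positions_deg):
        # Center-to-center distance to every other object.
        gaps = [abs(x - y) for j, y in enumerate(positions_deg) if j != i]
        nearest = min(gaps) if gaps else float("inf")
        # Uncrowded when the object spacing is at least the critical spacing b * eccentricity.
        flags.append(nearest >= b * abs(x))
    return flags


if __name__ == "__main__":
    # Uniform 1 deg spacing, as in text: one central window.
    print(uncrowded_flags(list(range(-5, 6))))
    # Non-uniform spacing: a second window appears in the periphery.
    print(uncrowded_flags([1, 2, 3, 10, 20]))
```

With uniform spacing the uncrowded objects form a single central run; with the non-uniform layout, the sparse far objects are uncrowded while a nearer, more densely packed one is not.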
Many have suggested that a central window (also known as the “span of apprehension,” “visual span,” “visual attention span,” “area of focal attention,” “conspicuity area,” or “number of elements processed per fixation”) limits reading or search20, 35, 46, 48, 50-54 (reviewed in refs. 27 and 47), but the window size has usually been taken to be an independent parameter of vision. Often it has been supposed that the window size is limited by letter or object size (acuity), sometimes attention. Until recently, only Woodworth48 and Bouma50 claimed that the size of the window is set by spacing (crowding). Woodworth and Bouma made good cases against acuity, but failed to convince their colleagues. Subsequent papers cited them but persisted in assuming that the window is acuity-limited. However, recent detailed studies of search and reading vindicate Woodworth and Bouma, showing that the window is where the object spacing exceeds the critical spacing of crowding27, 29, 46, 55.

Flush with the success of the uncrowded window idea in explaining the reading rate of normal adults27, 47, one wonders whether it can help to explain why children and dyslexics read more slowly. Developmental dyslexia is now generally thought to be primarily a phonological deficit56, 57, but Bouma found that dyslexics have increased crowding58. Fig. 9 plots data from all the studies for which we could estimate reading rate (vertical scale) as a function of the number of characters in the uncrowded window (called “span” in the horizontal scale). For all the normal readers (black solid), including adults and children, reading rate is fairly well predicted (diagonal line) by the product of span and the standard 4 Hz rate of fixations. (The prediction has no degrees of freedom.) The large increase in uncrowded span during childhood stands in contrast to the small effect of practice on critical spacing (and thus uncrowded span) in adults, noted above; this warrants further investigation. The points well below the line are dyslexics (red open). Most of the dyslexics have smaller spans than age-matched controls, but they read much more slowly than predicted by their span. They are all well below the normal line, reading at less than half their span-predicted rate, contrary to the hypothesis that most cases of dyslexia are arrested development, with performance like that of younger control normals matched for reading level56. These data indicate that something else, e.g. a phonological deficit and/or longer fixations59, 60, must account for the rest of the dyslexic impairment51. However, a part of dyslexia and all of the normal development of reading rate seem to be mediated by the uncrowded span51, 60-62.

Fig. 9. Reading rate vs. span. Data are from five studies, for normal (black solid symbols) and dyslexic readers (red open symbols). The normal readers are of various ages, from 1st grade (age 6) through adult. Reading rate rises monotonically with age. The dyslexic readers are all in the 6th or 7th grades. The vertical scale is reading rate (1 word/min = 0.1 character/s, assuming an average of 5 letters plus a space for each word). The horizontal scale is letter span, estimated in various ways. Span is the width (in characters) of the uncrowded window. A reader making ρ eye movements per second, advancing an average of u characters per eye movement, reads at a rate r = ρu character/s. The diagonal line plots this proportionality, assuming four eye movements per second (ρ = 4 Hz), showing that this simple 4 Hz rule gives a fairly good account of all the data from normal readers.
Taylor63 tested thousands of normal students in 1st grade through college. We plot one point per grade, 1-12, plus college. Reading rate (vertical scale): Subjects read age-appropriate paragraphs. Carver corrected the rates for text difficulty64, Table 2.1. Span (horizontal scale): Taylor measured eye movements. We plot Taylor’s “span of recognition”, the average length of forward saccades (the product of words per saccade, from Taylor, and characters per word, for text at each difficulty level, from Carver's Eq. 2.2).
Kwon et al.61 tested normal 3rd, 5th, and 7th graders, and adults (4 points). Reading rate: Subjects read sentences displayed one at a time. Span: Kwon et al. measured “visual span profiles”, which trace out the subject's accuracy for identifying a triplet of random letters as a function of position in the visual field. We plot the number of letter positions in the visual span profile (for 1 deg letters) for which the triplet accuracy is at least 80%.
Valdois et al.65 tested two dyslexic subjects (in the 6th and 7th grades), classified as a surface dyslexic ( ) and a phonological dyslexic ( ), and age-matched controls ( ). Reading rate: Valdois provided reading rates for ordinary text. Span: They measured accuracy versus letter position for reporting a string of 5 briefly-presented random letters (their Fig. 1). We plot the number of positions (out of 5) at which the subject got at least 80% correct.
Martelli et al.62 tested normal and dyslexic 6th graders. Normal adult data were provided separately. Reading rate: The normal and dyslexic children read ordinary text printed on paper. The adults read 8-letter nouns in rapid serial visual presentation. Span: They measured the critical spacing required to identify the central letter in a triplet of three random letters with 90% accuracy as a function of eccentricity. They then calculated Bouma's factor b (proportionality constant between critical spacing and eccentricity). We plot the uncrowded span u = 1+2/b (ref. 27).
Prado et al.60 tested dyslexics and age-matched controls. (They did not report the students' grade level, but average age was 11 years, which is typical for the 6th grade.) Reading rate: They measured eye movements as subjects read short passages. We plot rate as the number of words in the passage divided by the product of the total number of fixations and the mean fixation duration (their Table 2). Span: We plot the average number of letters reported correctly from a string of 5 briefly presented letters (their Table 1).
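
The arithmetic behind the diagonal line in Fig. 9 can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the only ingredients are the relations quoted in the caption (u = 1 + 2/b, r = ρu with ρ = 4 Hz, and six characters per word including the space).

```python
def uncrowded_span(b):
    """Uncrowded span u (characters) from Bouma's factor b, using u = 1 + 2/b."""
    if b <= 0:
        raise ValueError("Bouma's factor b must be positive")
    return 1.0 + 2.0 / b


def reading_rate_chars_per_s(span_chars, fixations_per_s=4.0):
    """Predicted reading rate r = rho * u, in characters per second."""
    return fixations_per_s * span_chars


def chars_per_s_to_words_per_min(rate_chars_per_s, chars_per_word=6.0):
    """Convert using 5 letters plus a space per word, so 1 word/min = 0.1 character/s."""
    return rate_chars_per_s * 60.0 / chars_per_word


if __name__ == "__main__":
    b = 0.4                                   # value used for Fig. 5 in the text
    u = uncrowded_span(b)
    r = reading_rate_chars_per_s(u)
    print(u, r, chars_per_s_to_words_per_min(r))
```

For b = 0.4 this gives a span of about 6 characters and a predicted rate of about 24 characters/s, or roughly 240 words/min. Because ρ is fixed at 4 Hz, the prediction has no free parameters, as the text notes.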

The Rey Complex Figure Test is widely used to
assess the ability of neurological patients to copy a line
drawing66. Normal observers doing the copy test with just
their peripheral vision produce drawings similar to those
produced with free viewing by patients with apperceptive
agnosia, a type of object blindness (Fig. 10). Thus, these
drawings suggest that the crowded peripheral vision of normal
observers may be a good model for the central vision of these
object-blind patients.
An object must be in the observer’s uncrowded
window to be seen uncrowded. Inverting the idea, Fig. 11
shows the object’s uncrowded neighborhood, the area,
centered on the object, within which you must fixate to see it
uncrowded (much like Engel’s “conspicuity area”67, though he
did not mention crowding). Only those objects whose
uncrowded neighborhoods include the observer’s point of
fixation are recognizable. If the observer fixates randomly,
then the probability that the fixation will land in a particular
object’s uncrowded neighborhood (and thus that the object can
be recognized) is equal to the fraction of the image area
occupied by the uncrowded neighborhood. In the popular children’s book, your chance of finding Waldo in your first glimpse is proportional to the area of his uncrowded neighborhood46, 67, 68.

Fig. 10. The Rey figure copying test66. The original diagram is on the left. The drawings on the right were made by normally sighted graduate students who were asked to copy, from left to right, while fixating the central +. (Ignore the left-right reversal, which is due to ambiguity of the copying instructions.) A neurologist who examined these drawings found them typical of those produced with unrestricted viewing by patients with apperceptive agnosia. Despite the amateur drawing skill of the students, you can verify that these are reasonably good copies for your peripheral vision by fixating the central +. Courtesy of Marialuisa Martelli, Università di Roma “La Sapienza”.

Maximum pooling
It seems that combining features to recognize objects carries the risk of crowding. Above, we reviewed evidence that the
critical spacing is the spatial extent over which features are
integrated, but we have said little about what goes on within
this integration area. Some of the complaints about crowding
— especially the impaired judgment of position and shape —
seem to stem from uncertainty (confusion) about feature
position. One may complain about this uncertainty, but we all
benefit from the positional invariance of recognition: a dog is
a dog regardless of location. Similarly, although the relative
positions of features vary among fonts and handwritings, for a
letter to be read, it must be assigned to the same category, e.g.
“a”, regardless of its rendering.

Fig. 11. The uncrowded neighborhoods (green circles) of two
objects: water bottle and magazine. You must fixate within the green
circle to see the object uncrowded. Fixating outside the uncrowded
neighborhoods, you cannot recognize either of these objects.

How are features combined? Three lines of
investigation (psychophysics, physiology, and engineering)
converge on maximum pooling as a key step. In maximum
pooling, many feature detectors with similar receptive fields,
differing only in position, all respond to the stimulus
independently, but only the maximum detector response,
regardless of detector position, is passed on. This immediate
loss of precision of feature position is an important aspect of
psychophysical and physiological models and engineering
solutions for object recognition69-71. Psychophysically, human
observers act as though they are always considering many
possible positions, like the ideal observer for an uncertain
signal, which does maximum pooling72-74. Fig. 12 allows you
to witness this vagueness of feature position. Physiologically,
in the primary visual cortex, complex cell responses are
position invariant and do not summate, consistent with
maximum pooling6, 75. In practical engineering, some of the
most successful machine classifiers of handwritten digits (and
other objects) use maximum pooling to tolerate “deformations
and shifts in position”76-78. In all these cases, invariance of
object recognition is achieved by maximum pooling, which
results in uncertainty of feature location, which, in turn, limits
the precision of judgments of object position and shape.
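
The maximum pooling step described in this section can be illustrated with a minimal one-dimensional sketch. It is not the authors' model: the template values, the stimulus length, and the helper names (`detector_responses`, `max_pool`, `place_feature`) are assumptions chosen only for illustration. Identical detectors are replicated at every position, and only the largest response is passed on.

```python
from typing import List, Sequence


def detector_responses(stimulus: Sequence[float], template: Sequence[float]) -> List[float]:
    """Responses of one feature detector replicated at every position.

    Each response is the dot product of the template with the stimulus
    at one offset, so the detectors differ only in position.
    """
    span = len(template)
    return [
        sum(stimulus[offset + j] * template[j] for j in range(span))
        for offset in range(len(stimulus) - span + 1)
    ]


def max_pool(responses: Sequence[float]) -> float:
    """Pass on only the largest response, discarding which detector gave it."""
    return max(responses)


def place_feature(template: Sequence[float], position: int, length: int) -> List[float]:
    """Hypothetical helper: a blank 1-D stimulus with the feature at `position`."""
    stimulus = [0.0] * length
    stimulus[position:position + len(template)] = template
    return stimulus


template = [1.0, -1.0, 1.0]  # illustrative feature, not taken from the paper
near = place_feature(template, position=1, length=10)
far = place_feature(template, position=6, length=10)

pooled_near = max_pool(detector_responses(near, template))
pooled_far = max_pool(detector_responses(far, template))

# The pooled output is the same wherever the feature sits, so recognition
# is position invariant, but the feature's position cannot be read back.
print(pooled_near, pooled_far)
```

Because the pooled value is identical for both placements, a later stage that sees only this value gains invariance at the cost of positional precision, which is the trade-off the text describes.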

[Figure 12: a fixation mark “+” and a peripheral target letter “K”.]
Fig. 12. Experience the vagueness of feature position predicted by maximum pooling. Viewing from 17 cm (though distance hardly matters), fixate the
plus. The letter (3.3° at an eccentricity of 46°) is too small to recognize, and looks like “a jumble of lines or an unorganized heap of marks”79. Optical blur
is noticeable, but does not prevent you from seeing the lines. The feature position errors are so large that you see only a jumble of floating features. One
observer said, “I see something that appears to be composed of straight lines about half an inch high. Could be a drawing. Could be a letter or letters. I
cannot see clearly what it is. At the moment it looks like a capital Y, but it's undefinable. The lines are not precise. They appear to be shimmering, fading
in and out. Very unstable figure.” Such confusion of feature position is predicted by maximum pooling. We think that maximum pooling not only
contributes to crowding, but also limits acuity, as shown here. The uncertainty of feature position seems to be a fixed fraction of the area of integration.
For example, the just noticeable difference in position of a grating patch (not shown) is independent of the spatial frequency of the grating and is about
4% of whichever is larger: the extent of the grating or 50% of the eccentricity80. We think that this uncertainty contributes to crowding, but its spatial
extent is much too small to be the main cause of crowding.
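
The rule of thumb quoted in the caption of Fig. 12 (a position threshold of about 4% of the larger of the grating extent and half the eccentricity) can be written out as a short sketch. The function name `position_jnd_deg` and the example values are assumptions for illustration; only the 4% and 50% figures come from the caption.

```python
import math


def position_jnd_deg(extent_deg: float, eccentricity_deg: float) -> float:
    """Approximate just noticeable difference in position, in degrees.

    Implements the caption's rule: about 4% of whichever is larger,
    the extent of the grating or 50% of the eccentricity.
    """
    if extent_deg < 0 or eccentricity_deg < 0:
        raise ValueError("extent and eccentricity must be non-negative")
    return 0.04 * max(extent_deg, 0.5 * eccentricity_deg)


# Hypothetical example values, not measurements from the paper.
small_patch = position_jnd_deg(extent_deg=2.0, eccentricity_deg=10.0)   # eccentricity term dominates
large_patch = position_jnd_deg(extent_deg=10.0, eccentricity_deg=10.0)  # extent term dominates

print(f"small patch: {small_patch:.2f} deg, large patch: {large_patch:.2f} deg")
assert math.isclose(small_patch, 0.04 * 5.0)
assert math.isclose(large_patch, 0.04 * 10.0)
```

When the eccentricity term dominates, the threshold is 4% of 50%, that is about 2%, of the eccentricity, which is why the caption calls this uncertainty much too small to be the main cause of crowding.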

Discussion
Typically, only a small portion of the visual field falls within
the uncrowded window. Most of our visual field is peripheral
and crowded, so object recognition fails. If we cannot
recognize things in this part of our vision, what do we see? We
see stuff (unnamed texture) and perceive space (shape of the
scene we are in). With an effort, observers can name and
describe texture, but this rarely happens. Texture includes
variations of color, depth, and motion10, 81. Many of the cues to
depth (binocular disparity, motion parallax, scale gradients,
shape from shading) seem to be immune to crowding. A
sense of space is particularly important for mobility, which is
greatly impaired by tunnel vision of 20 deg or less82, 83.
Location of fixation affects perception of texture much less
than it affects perception of objects (Fig. 13)84.

Fig. 13. A forest. This is mostly texture, with very few recognizable
objects. Unlike perception of objects, the perception of texture is little
affected by the location of fixation84. We suggest that one might define
“texture” as what one can see without object recognition. Reprinted
from ref. 86.

The uncrowded window and crowded surrounding
field follow a long tradition of visual dichotomies (direct vs.
indirect, foveal vs. peripheral, inside vs. outside the spotlight
of attention, focal vs. ambient, sustained vs. transient, what vs.
where, perception vs. action, ventral vs. dorsal) that
distinguish two kinds of vision: one typically central, acute,
and conscious, which recognizes and names objects; the other
typically peripheral, indistinct (“blurry,” “vague,” “fuzzy,”
“uncertain,” “confused,” “jumbled”), and unconscious, which
cannot recognize or name objects, but helps guide movement.
Unlike the foveal/peripheral border, which is at a fixed 1 deg
eccentricity, the eccentricity of the crowded/uncrowded border
depends on object spacing. One sign of conscious awareness is
reporting what we see, which is much harder when object
recognition fails, leaving only unnamed texture. This may be
why peripheral vision and crowded viewing are so rarely
described in any context. Acuity and other measures have
been graphed as a function of eccentricity, but there are very
few published descriptions of the everyday experience of
crowded viewing.
In 1936, the Gestalt psychologist Wolfgang Metzger
described crowded viewing and texture, “Farther out [in the
periphery], the structure becomes ever weaker and cruder …
The unifying effect of proximity becomes overwhelming. …
[D]ifferences … cause an imbalance and restlessness in each
intrafigural organization that is difficult to describe and can
best be compared with what, in clearly seen objects, is called
their grain, or texture, the material nature of their perceived
structure. You see in that region … no clearly segregated,
countable, or, above all, individually identifiable component
parts, such as actual stripes, spikes, knots, holes, and the
like”81. In 1976, Jerry Lettvin adds, “Things are less distinct as
they lie farther from my gaze. It is not as if things there go out
of focus … it’s as if somehow they lost the quality of
‘form’”85.
In everyday life, most of the things we recognize are
susceptible to crowding (by surrounding clutter) or self-crowding
(among the parts). We see these things through a
keyhole: the uncrowded window. Rates for reading and
searching are proportional to the size of this window. We talk
about and remember the things we identify. The rest of our
visual field is crowded, does not recognize or name things,
and is hardly ever mentioned, but it does allow us to see
unnamed stuff and space.
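
The Discussion's point that the crowded/uncrowded border depends on object spacing can be sketched numerically. The sketch below assumes that critical spacing is simply proportional to eccentricity, with a proportionality constant `bouma_factor` of 0.5; that value, the function names, and the example spacings are illustrative assumptions, not figures stated in this section. It also ignores the dependence on direction and any nonzero critical spacing at fixation.

```python
def critical_spacing_deg(eccentricity_deg: float, bouma_factor: float = 0.5) -> float:
    """Critical spacing under the assumed proportionality to eccentricity."""
    return bouma_factor * eccentricity_deg


def is_uncrowded(eccentricity_deg: float, spacing_deg: float, bouma_factor: float = 0.5) -> bool:
    """An object is uncrowded if its neighbors are at least the critical spacing away."""
    return spacing_deg >= critical_spacing_deg(eccentricity_deg, bouma_factor)


def window_radius_deg(spacing_deg: float, bouma_factor: float = 0.5) -> float:
    """Eccentricity of the crowded/uncrowded border for uniformly spaced objects."""
    return spacing_deg / bouma_factor


# Hypothetical example: uniformly spaced objects, 1 deg apart.
spacing = 1.0
print("window radius (deg):", window_radius_deg(spacing))
for eccentricity in (0.5, 1.5, 3.0, 6.0):
    state = "uncrowded" if is_uncrowded(eccentricity, spacing) else "crowded"
    print(f"object at {eccentricity} deg: {state}")
```

Under these assumptions, doubling the object spacing doubles the radius of the window, whereas the foveal/peripheral border stays fixed.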

Diverse studies of crowding come together to reveal
a single story. Crowding is feature integration over an
inappropriately large area (Fig. 1). Object recognition usually
is limited by spacing, not size. To be identified, simple objects
must be separated by at least the observer’s critical spacing,
which corresponds to 6 mm at the primary visual cortex.
Compound objects, such as words and faces, can crowd
themselves. Their parts must be separated by at least the
critical spacing. Thus, in our cluttered world, observers can
only identify objects in an uncrowded window, determined by
the object spacing. When the spacing is uniform, as in text,
then the window will be central, where the critical spacing is
smallest. These conclusions all spring from the observation
that critical spacing depends solely on location and direction,
which we call the Bouma law.

ACKNOWLEDGMENTS
Thanks to Lyuba Azbel, Diana Balmori, Marie-line Bosse, Herman
Bouma, Marisa Carrasco, Rama Chakravarthy, Susana Chung,
Hannes Famira, Jeremy Freeman, Hugh Hardy, Lloyd Kaufman,
MiYoung Kwon, Michael Landy, Yann LeCun, Dennis Levi,
Larry Maloney, Flavia Mancini, Marialuisa Martelli, Tony
Movshon, Cesar Pelli, Rafael Pelli, Elizabeth Segal, Lothar
Spillman, Jordan Suchow, Sylviane Valdois, Nick Wade, and,
especially, Bart Farell, Rob Fergus, David Heeger, Brad Motter,
and Jamie Radner for helpful comments. Supported by National
Eye Institute grant R01-EY04432 to Denis Pelli.

COMPETING INTERESTS STATEMENT
The authors declare no competing financial interests.

1. Barlow, H.B. Summation and inhibition in the frog's
retina. J Physiol 119, 69-88 (1953).
2. Neisser, U. Cognitive Psychology (Appleton-Century-Crofts,
New York, 1967).
3. Treisman, A.M. & Gelade, G. A feature-integration
theory of attention. Cognit Psychol 12, 97-136 (1980).
4. Robson, J.G. & Graham, N. Probability summation and
regional variation in contrast sensitivity across the visual
field. Vision research 21, 409-418 (1981).
5. Pelli, D.G., Burns, C.W., Farell, B. & Moore-Page, D.C.
Feature detection and letter identification. Vision research
46, 4646-4674 (2006).
6. Hubel, D.H. & Wiesel, T.N. Receptive fields, binocular
interaction and functional architecture in the cat's visual
cortex. J Physiol 160, 106-154 (1962).
7. Desimone, R. & Duncan, J. Neural mechanisms of
selective visual attention. Annual review of neuroscience
18, 193-222 (1995).
8. Prinzmetal, W. Visual feature integration in a world of
objects. Current Directions in Psychological Science 4,
90-94 (1995).
9. Ullman, S. High-level vision: object recognition and
visual cognition (MIT Press, Cambridge, Mass., 2000).
10. Parkes, L., Lund, J., Angelucci, A., Solomon, J.A. &
Morgan, M. Compulsory averaging of crowded
orientation signals in human vision. Nat Neurosci 4,
739-744 (2001).
11. Motter, B.C. Crowding and object integration within the
receptive field of V4 neurons. Journal of Vision 2(7)274,
247a (2002).
12. Ledgeway, T., Hess, R.F. & Geisler, W.S. Grouping local
orientation and direction signals to extract spatial
contours: Empirical tests of "association field" models of
contour integration. Vision research 45, 2511-2522
(2005).
13. Biederman, I. Recognition-by-components: a theory of
human image understanding. Psychol Rev 94, 115-147
(1987).
14. Tanaka, J.W. & Farah, M.J. Parts and wholes in face
recognition. Q J Exp Psychol A 46, 225-245 (1993).
15. Martelli, M., Majaj, N.J. & Pelli, D.G. Are faces
processed like words? A diagnostic test for recognition by
parts. Journal of Vision 5(1)6, 58-70 (2005).
16. Pelli, D.G., Palomares, M. & Majaj, N.J. Crowding is
unlike ordinary masking: distinguishing feature
integration from detection. Journal of Vision 4(12)12,
1136-1169 (2004).
17. Levi, D.M. Crowding: A mini-review. Vision research
(in press).
18. Trobe, J.R. & Bauer, R.M. Seeing but not recognizing.
Survey of ophthalmology 30, 328-336 (1986).
19. He, S., Cavanagh, P. & Intriligator, J. Attentional
resolution and the locus of visual awareness. Nature 383,
334-337 (1996).
20. Bouma, H. Interaction effects in parafoveal letter
recognition. Nature 226, 177-178 (1970).
21. Freeman, J. & Pelli, D.G. An escape from crowding.
Journal of Vision 7(2)22 (2007).
22. Pelli, D.G., Cavanagh, P., Desimone, R., Tjan, B.S. &
Treisman, A.M. Crowding: Including illusory
conjunctions, surround suppression, and attention.
Journal of Vision 7(2)i, 1 (2007).
23. Bouma, H. Visual interference in the parafoveal
recognition of initial and final letters of words. Vision
research 13, 767-782 (1973).
24. Bex, P.J., Dakin, S.C. & Simmers, A.J. The shape and
size of crowding for moving targets. Vision research 43,
2895-2904 (2003).
25. van den Berg, R., Roerdink, J.B.T.M. & Cornelissen,
F.W. On the generality of crowding: Visual crowding in
size, saturation, and hue compared to orientation. Journal
of Vision 7(2)14, 1-11 (2007).
26. Chung, S.T.L., Li, R.W. & Levi, D.M. Crowding between
first- and second-order letter stimuli in normal foveal and
peripheral vision. Journal of Vision 7(2)10, 1-13 (2007).
27. Pelli, D.G., et al. Crowding and eccentricity determine
reading rate. Journal of Vision 7(2)20 (2007).
28. Larsson, J. & Heeger, D.J. Two retinotopic visual areas in
human lateral occipital cortex. J Neurosci 26,
13128-13142 (2006).
29. Motter, B.C. & Simoni, D.A. The roles of cortical image
separation and size in active visual search performance.
Journal of Vision 7(2)6, 1-15 (2007).

30. Levi, D.M., Klein, S.A. & Aitsebaomo, A.P. Vernier
acuity, crowding and cortical magnification. Vision
research 25, 963-977 (1985).
31. Wade, N.J. Image, eye, and retina (invited review).
Journal of the Optical Society of America 24, 1229-1249
(2007).
32. Chung, S.T.L., Levi, D.M. & Legge, G.E. Spatial-frequency
and contrast properties of crowding. Vision
research 41, 1833-1850 (2001).
33. Levi, D.M., Hariharan, S. & Klein, S.A. Suppressive and
facilitatory spatial interactions in peripheral vision:
peripheral crowding is neither size invariant nor simple
contrast masking. Journal of Vision 2(2)3, 167-177
(2002).
34. Snodgrass, J.G. & Vanderwart, M. A standardized set of
260 pictures: norms for name agreement, image
agreement, familiarity, and visual complexity. J Exp
Psychol [Hum Learn] 6, 174-215 (1980).
35. Huey, E.B. The psychology and pedagogy of reading
(Macmillan, New York, 1908).
36. Plato. Republic, 2, line 368d (Hackett Publishing Co.,
Indianapolis, 1992).
37. Legge, G.E., Pelli, D.G., Rubin, G.S. & Schleske, M.M.
Psychophysics of reading--I. Normal vision. Vision
research 25, 239-252 (1985).
38. Levi, D.M., Song, S. & Pelli, D.G. Amblyopic reading is
crowded. Journal of Vision 7(2)21 (2007).
39. Kooi, F.L., Toet, A., Tripathy, S.P. & Levi, D.M. The
effect of similarity and duration on spatial interaction in
peripheral vision. Spat Vis 8, 255-279 (1994).
40. Andriessen, J.J. & Bouma, H. Eccentric vision: adverse
interactions between line segments. Vision research 16,
71-78 (1976).
41. Wilkinson, F., Wilson, H.R. & Ellemberg, D. Lateral
interactions in peripherally viewed texture arrays. J Opt
Soc Am A 14, 2057-2068 (1997).
42. Livne, T. & Sagi, D. Configuration influence on
crowding. Journal of Vision 7(2)4, 1-12 (2007).
43. Huckauf, A. & Nazir, T.A. How odgcrnwi becomes
crowding: Stimulus-specific learning reduces crowding.
Journal of Vision 7(2)18, 1-12 (2007).
44. Chung, S.T.L. Learning to identify crowded letters: Does
it improve reading speed? Vision research 47, 3150-3159
(2007).
45. Toet, A. & Levi, D.M. The two-dimensional shape of
spatial interaction zones in the parafovea. Vision research
32, 1349-1357 (1992).
46. Vlaskamp, B.N., Over, E.A. & Hooge, I.T. Saccadic
search performance: the effect of element spacing.
Experimental brain research. Experimentelle
Hirnforschung 167, 246-259 (2005).
47. Legge, G.E. Psychophysics of reading in normal and low
vision (Erlbaum, Mahwah, NJ, 2007).
48. Woodworth, R.S. Experimental psychology (Holt, New
York, 1938).
49. Pelli, D.G. & Tillman, K.A. Parts, wholes, and context in
reading: a triple dissociation. PLoS ONE 2, e680 (2007).
50. Bouma, H. Visual search and reading: eye movements
and functional visual field: a tutorial review. in Attention
and performance VII (ed. J. Requin) (Erlbaum, Hillsdale,
NJ, 1978).
51. Bosse, M.L., Tainturier, M.J. & Valdois, S.
Developmental dyslexia: the visual attention span deficit
hypothesis. Cognition 104, 198-230 (2007).
52. McConkie, G.W. & Rayner, K. The span of the effective
stimulus during a fixation in reading. Perception and
Psychophysics 17, 578-586 (1975).
53. Engbert, R., Nuthmann, A., Richter, E.M. & Kliegl, R.
SWIFT: a dynamical model of saccade generation during
reading. Psychol Rev 112, 777-813 (2005).
54. O'Regan, J.K. Eye movements and reading. Reviews of
oculomotor research 4, 395-453 (1990).
55. Motter, B.C. & Belky, E.J. The guidance of eye
movements during active visual search. Vision research
38, 1805-1815 (1998).
56. Stanovich, K.E., Siegel, L.S. & Gottardo, A. Converging
evidence for phonological and surface subtypes of reading
disability. Journal of Educational Psychology 89,
114-127 (1997).
57. Snowling, M.J. From language to reading and dyslexia.
Dyslexia 7, 37-46 (2001).
58. Bouma, H. & Legein, C.P. Foveal and parafoveal
recognition of letters and words by dyslexics and by
average readers. Neuropsychologia 15, 69-80 (1977).
59. Hutzler, F. & Wimmer, H. Eye movements of dyslexic
children when reading in a regular orthography. Brain
and language 89, 235-242 (2004).
60. Prado, C., Dubois, M. & Valdois, S. The eye movements
of dyslexic children during reading and visual search:
Impact of the visual attention span. Vision research 47,
2521-2530 (2007).
61. Kwon, M., Legge, G.E. & Dubbels, B.R. Developmental
changes in the visual span for reading. Vision research
47, 2889-2900 (2007).
62. Martelli, M., Di Filippo, G., Spinelli, D. & Zoccolotti, P.
Crowding, reading, and developmental dyslexia.
Perception 35, supp, 174 (2006).
63. Taylor, S.E. Eye movements in reading: Facts and
fallacies. American Educational Research Journal 2(4),
187-202 (1965).
64. Carver, R.P. Reading rate: a review of research and
theory (Academic Press, San Diego, 1990).
65. Valdois, S., et al. Phonological and visual processing
deficits can dissociate in developmental dyslexia:
Evidence from two case studies. Reading and Writing 16,
541-572 (2003).
66. Rey, A. L'examen psychologique dans les cas
d'encephalopathie traumatique. Archives de Psychologie
28, 286-340 (1941).
67. Engel, F.L. Visual conspicuity, visual search and fixation
tendencies of the eye. Vision research 17, 95-108 (1977).

68. Handford, M. Where's Waldo? (Little Brown, Boston,
1987).
69. Baldassi, S., Megna, N. & Burr, D.C. Visual clutter
causes high-magnitude errors. PLoS biology 4, e56
(2006).
70. Movshon, J.A., Thompson, I.D. & Tolhurst, D.J.
Receptive field organization of complex cells in the cat's
striate cortex. J Physiol (Lond) 283, 79-99 (1978).
71. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P.
Gradient-based learning applied to document recognition.
Proceedings of the IEEE 86, 2278-2324 (1998).
72. Shaw, M.L. Attending to multiple sources of information:
I. The integration of information in decision making.
Cognitive Psychology 14, 353-409 (1982).
73. Pelli, D.G. Uncertainty explains many aspects of visual
contrast detection and discrimination. J Opt Soc Am A 2,
1508-1532 (1985).
74. Tjan, B.S. & Nandy, A.S. Classification images with
uncertainty. Journal of Vision 6(4)8, 387-413 (2006).
75. Lampl, I., Ferster, D., Poggio, T. & Riesenhuber, M.
Intracellular measurements of spatial integration and the
MAX operation in complex cells of the cat primary visual
cortex. J Neurophysiol 92, 2704-2713 (2004).
76. Fukushima, K. & Miyake, S. Neocognitron - A new
algorithm for pattern-recognition tolerant of deformations
and shifts in position. Pattern Recognition 15, 455-469
(1982).
77. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M. &
Poggio, T. Robust object recognition with cortex-like
mechanisms. IEEE transactions on pattern analysis and
machine intelligence 29, 411-426 (2007).
78. Ranzato, M.A., Huang, F.-J., Boureau, Y.-L. & LeCun, Y.
Unsupervised learning of invariant feature hierarchies
with applications to object recognition. Proc. Computer
Vision and Pattern Recognition Conference (CVPR'07),
1-8 (2007).
79. Zigler, M.J., Cook, B., Miller, D. & Wemple, L. The
perception of form in peripheral vision. American Journal
of Psychology 42, 246-259 (1930).
80. Levi, D.M. & Tripathy, S.P. Localization of a peripheral
patch: the role of blur and spatial frequency. Vision
research 36, 3785-3803 (1996).
81. Metzger, W. Laws of seeing (MIT Press, Cambridge, MA,
2006).
82. Lévy-Schoen, A. Exploration et connaissance de l'espace
visuel sans vision périphérique. Le Travail Humain 39,
63-72 (1976).
83. Faye, E. Clinical Low Vision (Little Brown, Boston,
1984).
84. Braun, J. & Sagi, D. Texture-based tasks are little affected
by second tasks requiring peripheral or central attentive
fixation. Perception 20, 483-500 (1991).
85. Lettvin, J.Y. On seeing sidelong. N. Y. Acad. Sci. 16, 10-
20 (1976).
86. Turner, E.H. Ray K. Metzker landscapes (Aperture
Foundation, New York, 2000).
