CMOS Imagers Thesis (Linkoping)

Linköping Studies in Science and Technology
Thesis No. 1182
Topics on CMOS Image Sensors

Leif Lindgren
LiU-TEK-LIC-2005:37
Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden
Linköping 2005
Topics on CMOS Image Sensors
ISBN 91-85299-91-X
ISSN 0280-7971
Printed in Sweden by UniTryck, Linköping, July 2005
Abstract
Today there exist several applications where a real visible scene needs to be sampled to electrical signals, e.g., video cameras, digital still cameras, and machine vision systems. Since the 1970s charge-coupled device (CCD) sensors have primarily been used for this task, but during the last decade CMOS image sensors have become more and more popular. The demand for image sensors has lately grown very rapidly due to the increased market for, e.g., digital still cameras and the integration of image sensors in mobile phones.
The first of the three included papers presents a programmable multiresolution machine vision sensor with on-chip image processing capabilities. The sensor comprises an innovative multiresolution sensing area, 1536 A/D converters, and a SIMD array of 1536 bit-serial processors with corresponding memory. The SIMD processor array can deliver more than 100 GOPS sustained and the on-chip pixel-analyzing rate can be as high as 4 Gpixels/s. The sensor is intended for high-speed multisense imaging where, e.g., color, greyscale, internal material light scatter, and 3-D profiles are captured simultaneously. Experimental results showing very good image characteristics and good digital-to-analog noise isolation are presented.
The second paper presents a mathematical analysis of how temporal noise is transformed by quantization. A new method for measuring temporal noise with a low-resolution ADC and then accurately referring it back to the input of the ADC is shown. The method is, for instance, applicable to CMOS image sensors where photon shot noise is commonly used for determining conversion gain and quantum efficiency. Experimental tests have been carried out using the above-mentioned sensor, which has an on-chip ADC featuring programmable gain and offset. The measurements verify the analysis and the method.
The third paper presents a new column parallel ADC architecture, named simultaneous multislope, suitable for array implementations in, e.g., CMOS image sensors. The simplest implementation of the suggested architecture is almost twice as fast as a conventional slope ADC, while it requires only a small amount of extra circuitry. Measurements have been performed using the above mentioned sensor, which implements parts of the proposed ADC. The measurements show good linearity and verify the concept of the new architecture.
Preface
Most of the research for this thesis was conducted from 1999 through 2003, while I was working for IVP Integrated Vision Products AB (now SICK IVP AB). The company develops and provides high-end machine vision systems consisting of hardware, in the form of smart cameras, and software packages. The main research project during this time was the development of the smart vision sensor M12, containing an image sensor, parallel A/D conversion, and a SIMD processor with corresponding memory on a single die. Seen from a system level this sensor is a further development of earlier sensors from IVP. However, from a technology point of view it is a completely new design where almost all circuit solutions are new compared to the earlier sensors. This required a major research and development effort, which was started in 1999 and initially conducted by Lic. Eng. Johan Melander and myself. In the middle of 2000 the actual design process was started, and Robert Johansson and later on Björn Möller joined the development of the new sensor. Towards the end of the project Calle Björnlert, Jörgen Hauser, and Magnus Engström joined the team to speed up the development of the digital control logic. Furthermore, Dr. J. Jacob Wikner was a great resource in the development of the on-chip DAC. The sensor was taped out in January 2002 and worked perfectly at first silicon. It is currently found in several SICK IVP products and will, most likely, be used in several future products.
This thesis includes three papers, where the first paper describes the M12 sensor. The second paper is a result of the temporal noise measurements performed on the M12 sensor. The measurements showed that the temporal noise, for low illumination levels, was much lower than the resolution of the on-chip A/D conversion. This made it impossible to accurately refer the measured noise back to the input of the ADCs using a traditional method.
This led to the development of the new method for elimination of quantization effects in measured temporal
noise, which makes it possible to accurately determine ADC input referred temporal noise even if it is much lower than the resolution of the ADC. This new method is described in the second paper. The third paper presents a new ADC architecture that can be seen as a further development of the single slope ADC architecture used in the M12 sensor.
I would like to thank the aforementioned persons for their contributions to the design of the M12 sensor. Johan Melander, Robert Johansson, and Björn Möller are further acknowledged for many fruitful discussions and for all their comments regarding this thesis. All employees at IVP are acknowledged for making it a fun and interesting company to work for. Furthermore, I would like to thank Professor Christer Svensson for his contributions as examiner of this thesis.
I would like to thank Tina, Emmy, and Amanda for making my life fun and interesting. I also thank my sister Karin, and my mom and dad for always being there.
Contents

Chapter 1. Introduction
  1.1 History
Chapter 2. CMOS Image Sensor Circuitry
  2.1 Photodetectors
  2.2 Pixel and Readout Circuits
    2.2.1 Passive Pixels
    2.2.2 Active Pixels
    2.2.3 Logarithmic Pixels
    2.2.4 Global Shutter
  2.3 ADC
Paper I. A Multiresolution 100-GOPS 4-Gpixels/s Programmable Smart Vision Sensor for Multisense Imaging
  1 Introduction
  2 Multisense Imaging
  3 Architecture
    3.1 System Level
    3.2 Multiresolution Sensor Area and Analogue Readout
    3.3 A/D Conversion
    3.4 Processor and Registers
  4 Circuit Implementation
    4.1 Mixed-Signal Aspects
    4.2 Large Distance Signalling
    4.3 GLU, COUNT, and GOR
    4.4 Pixels and Analogue Readout
    4.5 A/D Conversion
  5 Experimental Results
  6 Comparison With Other Sensors
  7 Conclusion
Paper II. Elimination of Quantization Effects in Measured Temporal Noise
  1 Introduction
  2 Quantization Transformation
  3 Uniformly Distributed Fractional Part
  4 Achieving Uniform Distribution
  5 Experimental Results
  6 Conclusion
  7 Acknowledgment
Paper III. A New Simultaneous Multislope ADC Architecture for Array Implementations
  1 Introduction
  2 Architecture
    2.1 Using Current Steering DACs
    2.2 Fast Pseudo Conversion
  3 ADC Characterization
  4 Experimental Test
    4.1 Setup and Calculations
    4.2 Results
  5 Conclusion and Discussion
Chapter 1
Introduction
Today there exist several applications where a real visible scene needs to be sampled to electrical signals, e.g., video cameras, digital still cameras, and machine vision systems. Since the 1970s charge-coupled device (CCD) sensors have primarily been used for this task, but during the last decade CMOS image sensors have become more and more popular. The demand for image sensors has lately grown very rapidly due to the increased market for, e.g., digital still cameras and the integration of image sensors in mobile phones.
A CMOS image sensor is a chip that converts incoming light to electrical signals, and it is made in a complementary metal oxide semiconductor (CMOS) process. Fig. 1.1 shows a cross section of a transistor in a CMOS chip. If a CMOS chip is packaged in a way that permits light to hit the chip, the light will penetrate the transparent silicon dioxide and hit the parts of the silicon substrate that are not covered by the metal layers. If an incoming photon has an energy greater than the band gap energy of silicon, 1.124 eV, which corresponds to a wavelength shorter than about 1.1 µm, it can excite one of the valence electrons in a silicon atom and, thereby, move it into the conduction band. This is according to the photoelectric effect, for which Albert Einstein received the 1921 Nobel Prize in Physics. It works the same way as in a normal film used in a conventional camera; e.g., for a black and white film a photon hits, typically, a bromide atom and an electron is set free. This electron is then picked up by a positively charged silver atom, and when the film is processed the silver atoms are grouped together. In a CMOS image sensor there are three different ways of separating and collecting the photogenerated electron-hole pairs: by using an array of photodiodes, photogates, or phototransistors.
Fig. 1.1: Cross section of an n-type transistor in a CMOS process. (Labels: incoming photon; polysilicon gate; metal layers 1 and 2; via1; via2; transparent and insulating silicon dioxide; n-doped source and drain diffusions; channel of the transistor; p-doped silicon substrate connected to GND.)
CMOS image sensors have several advantages when compared to CCDs. These advantages include low power consumption, low operating voltage, on-chip integration of circuitry, random access to image data, high speed, and potentially lower costs [1]. Since the CMOS process makes it possible to include circuits other than the image sensor array on a chip, complete imaging systems can be built on one single chip. Analog-to-digital conversion can be integrated by having, e.g., one fast analog-to-digital converter (ADC), a column parallel ADC [2], or an ADC in each pixel [3]. Color information can be captured by having a color mosaic filter mounted on top of the chip, and the color interpolation can be done on-chip. Furthermore, memory, control circuitry, and, e.g., image compression can be implemented on-chip. These things make it possible to build a so-called camera-on-a-chip. Fig. 1.2 shows a typical CMOS image sensor with control circuitry, a row decoder, column readout amplifiers, ADC, and a 2-D array of pixels each containing a photodetector.
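The relation between the band gap energy and the cutoff wavelength quoted earlier in this chapter can be checked with lambda = hc/E. The Python sketch below is only an illustration; the function name and the rounded value of hc are assumptions introduced here and do not come from the thesis.

```python
# Rounded value of h*c expressed in eV*nm (assumption: 4 significant decimals suffice).
HC_EV_NM = 1239.842

def cutoff_wavelength_nm(band_gap_ev: float) -> float:
    """Longest wavelength (in nm) whose photons still carry at least band_gap_ev."""
    return HC_EV_NM / band_gap_ev

# Silicon band gap as quoted in the text.
si_cutoff_nm = cutoff_wavelength_nm(1.124)
print(f"Silicon cutoff wavelength: {si_cutoff_nm:.0f} nm")
```

The result is close to 1100 nm, i.e., about 1.1 µm, which is why silicon responds to visible and near-infrared light but not to longer wavelengths.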
1.1 History
In the 1960s several research groups worked on image sensors using NMOS, PMOS, or bipolar technologies [4]. The first report on CCD sensors came in 1970. Since CCDs offered much lower fixed pattern noise (FPN) and smaller pixels than MOS sensors, this new technology was adopted by the industry, and little research was done on MOS image sensors during the 1970s and 1980s. However, some research on MOS image sensors was still conducted. During the early 1980s Hitachi and Matsushita developed MOS sensors, but later they abandoned this research. For more information about the early history of MOS image sensors see [4].
Fig. 1.2: A CMOS image sensor with on-chip ADC and control circuitry. (Blocks: pixel array, row decoder, control logic with Ctrl and Clk inputs, readout amplifiers, ADC, digital output.)
In the late 1980s and early 1990s CMOS image sensors with on-chip ADC and image processing capabilities, e.g., PASIC [2] and MAPP2200 [5], were developed at Linköpings universitet and at the company IVP Integrated Vision Products AB in Sweden. Also in the late 1980s research on CMOS image sensors was done at the University of Edinburgh in Scotland, and the company VLSI Vision Limited (VVL) was founded. In the first half of the 1990s Jet Propulsion Laboratory (JPL) in the USA developed CMOS sensors using active pixels, and Photobit Corporation was founded. These efforts led to major advances in CMOS image sensor technology.
During the second half of the 1990s research on CMOS image sensors was performed at many universities, and many CMOS image sensor companies were founded. These startups were fabless and used silicon foundries with standard CMOS processes for manufacturing their sensors. However, the growing market segment of image sensors has led many of the big IC manufacturing companies to set up their own CMOS image sensor development. This has been done through in-house research efforts, through the acquisition of image sensor companies, or through a combination of both. Examples of larger manufacturers acquiring image sensor companies are STMicroelectronics, which bought VVL, and Micron Technology, which bought Photobit. Other examples are Cypress Semiconductor Corporation, which bought FillFactory (a Belgian company spun off from sensor research
at IMEC), and Agilent Technologies, which bought Pixel Devices International. An advantage for these companies is that they have their own manufacturing, making it possible to tune the CMOS processes for image sensing. This can, e.g., include tailoring junction depths and doping levels, but also adding processing steps for, e.g., buried photodiodes and color filter and micro lens deposition. Today silicon foundries such as TSMC, UMC, and Tower Semiconductor also offer CMOS processes tuned for image sensing.
References
[1] H-S. Wong, "Technology and Device Scaling Considerations for CMOS Imagers," IEEE Trans. Electron Devices, vol. 43, no. 12, Dec. 1996.
[2] K. Chen, M. Afghahi, P-E. Danielsson, and C. Svensson, "PASIC: A Processor-A/D converter-Sensor Integrated Circuit," Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'90), vol. 3, pp. 1705-1708, May 1990.
[3] B. Fowler, A. El Gamal, and D. Yang, "Techniques for Pixel Level Analog to Digital Conversion," Aerosense, 1998.
[4] E. R. Fossum, "CMOS Image Sensors: Electronic Camera-On-A-Chip," IEEE Trans. Electron Devices, vol. 44, no. 10, Oct. 1997.
[5] R. Forchheimer, P. Ingelhag, and C. Jansson, "MAPP2200 - a second generation smart optical sensor," Proc. SPIE, Image Processing and Interchange: Implementation and Systems, vol. 1659, pp. 2-11, 1992.
Chapter 2
CMOS Image Sensor Circuitry

Fig. 2.1 shows a typical CMOS image sensor with control circuitry, row select logic, a 2-D array of pixels, readout amplifiers, column parallel ADC, and column select logic. Most CMOS image sensors use photodiodes in integrating mode. This means the pixels integrate light during a certain amount of time referred to as the integration time or exposure time. For sensors with a global shutter all pixels are typically first reset, then they integrate light, and then the shutter mechanism moves the integrated charge to a storage node in each pixel. After this the pixels are scanned row by row. This is done by connecting all pixels in a row to the vertical column buses, see Fig. 2.1. At the bottom of these column buses are readout amplifiers, which sense the signals from the pixels. In Fig. 2.1 the output from each readout amplifier connects to an ADC, i.e., the ADC is here column parallel. The digital outputs from the ADCs are then scanned by the column select logic and sent off chip. The column select logic, and also the row select logic, can, e.g., be a shift register or a counter feeding a decoder. Having counters and decoders enables windowing, i.e., reading only a part of the entire array, and also different schemes of subsampling.
Most sensors do not, however, feature a global shutter. Instead an electronic rolling shutter is used since it simplifies the pixels. With this type of shutter the integration periods are not simultaneous for all pixels. Instead, the integration start of each row is offset from that of the preceding row by a time equal to the row read time.
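The rolling shutter timing described above can be sketched in a few lines of Python. This is a minimal illustration under the stated rule that consecutive rows start integrating one row read time apart; the function name and the numeric values (4 rows, 20 µs row read time, 1000 µs integration time) are hypothetical and chosen only for readability.

```python
def rolling_shutter_windows(n_rows, row_read_time_us, integration_time_us):
    """Return the (start, end) integration window of every row, in microseconds.

    Row r starts r * row_read_time_us after row 0, so every row ends its
    integration exactly when its own readout slot begins.
    """
    return [
        (r * row_read_time_us, r * row_read_time_us + integration_time_us)
        for r in range(n_rows)
    ]

# Hypothetical example values, not taken from any particular sensor.
windows = rolling_shutter_windows(n_rows=4, row_read_time_us=20, integration_time_us=1000)

# Time skew between the first and the last row; this skew is what distorts moving objects.
skew_us = windows[-1][0] - windows[0][0]

for row, (start, end) in enumerate(windows):
    print(f"row {row}: integrates from {start} us to {end} us")
print(f"skew between first and last row: {skew_us} us")
```

All rows get the same integration time, but the windows are shifted, so the skew grows linearly with the number of rows read.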
Fig. 2.1: A CMOS image sensor with on-chip ADC and control circuitry. (Blocks: pixel array with column buses and row select lines, row select logic, control logic with Ctrl and Clk inputs, readout amplifiers, parallel ADC, column select logic, digital output.)
2.1 Photodetectors
The photodetectors collect the electrons, or holes, that are set free by the incoming photons. In a CMOS process there exist three basic types of photodetectors: photodiodes, photogates, and phototransistors.
A photodiode is a reverse biased pn-junction. The electric field over the junction separates the electron-hole pairs, and a photo current proportional to the light intensity flows through the junction. This photo current can be integrated over the built-in junction capacitance, and the potential change over the junction will then be proportional to the collected light (assuming the capacitance is voltage independent which, however, is not entirely true). In an n-well CMOS process there are three different pn-junctions that can be used: n-diffusion/p-substrate, n-well/p-substrate, and p-diffusion/n-well, see Fig. 2.2(a-c). Earlier the n-diffusion/p-substrate diode was commonly used, but the n-well/p-substrate diode is now also rather common.
A parasitic vertical bipolar transistor can be created in an n-well CMOS process according to Fig. 2.2(d). The reverse biased pn-junction works the same way as for a photodiode and creates a photo current. This photo current is then amplified by the bipolar transistor gain, which is typically much greater than one. In [1] it is concluded that photodiodes are better suited for image sensors than phototransistors since they have lower temporal noise, lower dark current, and lower FPN. The phototransistors have potentially higher quantum efficiency (QE) due to the amplification, but at low light levels this QE decreases.
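For the integrating photodiode described above, the voltage change follows from the charge balance delta_V = I * t / C when the capacitance is treated as constant. The Python sketch below works through one case; the function name and all numeric values (100 fA photo current, 10 ms integration time, 10 fF junction capacitance) are hypothetical and serve only to show the orders of magnitude involved.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one electron in coulombs

def integrated_signal(photo_current_a, integration_time_s, capacitance_f):
    """Return (voltage change in V, number of collected electrons).

    Assumes a constant photo current and a voltage independent capacitance,
    which the text notes is only approximately true for a real junction.
    """
    charge_c = photo_current_a * integration_time_s
    delta_v = charge_c / capacitance_f
    electrons = charge_c / ELEMENTARY_CHARGE_C
    return delta_v, electrons

# Hypothetical operating point.
delta_v, electrons = integrated_signal(100e-15, 10e-3, 10e-15)
print(f"voltage change: {delta_v * 1e3:.1f} mV, collected electrons: {electrons:.0f}")
```

With these values the diode voltage changes by about 0.1 V for roughly six thousand collected electrons, corresponding to a conversion gain of about 16 µV per electron.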
Fig. 2.2: Different types of photodetectors in a CMOS process: (a) an n-well/p-substrate diode, (b) a p-diff/n-well diode, (c) an n-diff/p-substrate diode, (d) a vertical phototransistor, (e) a photogate, (f) a buried photodiode.
The photogate uses a structure borrowed from CCDs. It has a polysilicon gate over the detector area, and by setting the voltage of this gate so that a channel is introduced under it, the electrons are collected in the neighboring potential well, see Fig. 2.2(e). The poly layer over the photodetector in the photogate structure blocks some of the light, especially if the process uses silicide/salicide. This blocking gives the photogates lower QE, especially in the blue part of the spectrum, when compared to photodiodes. Furthermore, the extra transistors and control signals reduce the fill factor (FF). A processing difficulty is to realize a complete charge transfer. However, when this is achieved, true correlated double sampling (CDS) can be implemented and the temporal reset noise is, thereby, canceled. This is the great advantage of the photogate structure.
Fig. 2.2(f) shows a buried photodiode (also referred to as a pinned photodiode). This type of photodiode needs special processing steps to work at low voltage [2]. The buried diode works a bit like the photogate structure but without the need for the poly layer above the sensing area. As for the photogate, a complete charge transfer allows for true CDS, which cancels the reset noise in the pixel.
2.2 Pixel and Readout Circuits

The most common way to use a photodiode in a pixel is in integrating mode, which means that the photocurrent is integrated over the built-in diode capacitance for a certain amount of time. Another way is to use the photodiode in direct mode in, e.g., a logarithmic pixel. Examples of the most common pixel types follow below. There exist numerous other ways to handle the signals from the photodetectors, see [3] for some examples.
2.2.1 Passive Pixels

The simplest integrating pixel is the passive pixel, shown in Fig. 2.3. Sensors based on passive pixels are simply called passive pixel sensors (PPS). The passive pixels were used in, e.g., sensors from Linköpings universitet, IVP (e.g., MAPP2200), University of Edinburgh, and VVL during the late 1980s and early 1990s. The passive pixel uses an NMOS pass transistor to connect the photodiode to the column bus. A resettable charge amplifier located at the bottom of the column is then used to convert the charge over the diode capacitance to a voltage, see Fig. 2.3. When reading out the pixel the amplifier is first reset by turning on switches s1 and s2. Then the readout phase takes place, and s1 and s2 are turned off while s3 and the transistor in the pixel are turned on. This results in a voltage change at the output of the amplifier that is proportional to the number of photons collected by the diode, assuming the capacitor c1 is linear.
Fig. 2.3: Passive pixel and column readout amplifier. (Labels: row select, photodiode, column bus, c1, s1, s2, s3, Vref1, Vref2.)
The major advantage of a passive pixel is its simplicity. With only the horizontal row select wire, the vertical column bus wire, and one transistor per pixel, the pixels can be made very small and still achieve a high FF. Another advantage is analog row binning, which means it is possible to add the signals from pixels in the same column. This trades vertical resolution for signal-to-noise ratio (SNR), and it is also used in some machine vision applications such as laser scatter measurements.
The major drawback of the passive pixel is high temporal noise. Temporal noise at the column bus node during the reset phase is amplified by the ratio of the parasitic capacitance at the bus node to the integration capacitance, c1, in the readout amplifier. This ratio increases linearly with the number of rows and can, e.g., be around 100 for a sensor having 512 rows. Another drawback is column leakage, which makes the readout amplifier sensitive to photons hitting pixels in the entire column during the readout phase. This column leakage is partly caused by photons being picked up at the column bus node diffusion of the pixel transistors. It is also caused by the capacitive coupling between the column bus node and the photodiodes. The column leakage is minimized by having a short readout time which, unfortunately, is power consuming. Another drawback is the complexity of the readout amplifiers, where mismatches can cause FPN
between columns. The passive pixel using only one transistor does not have any anti-blooming capability. Blooming occurs when a photodiode gets completely discharged and causes surrounding pixels to get more discharged than they should. In practice anti-blooming is often desired, and it is accomplished by adding an extra transistor with wires for the gate voltage and the source voltage. This extra transistor prevents the voltage over the photodiode from dropping below a certain limit and, thereby, prevents blooming. This anti-blooming yields a pixel with twice as many wires and transistors as the passive pixel in Fig. 2.3, and the major advantage of a small pixel with high FF is then reduced. All these drawbacks make the passive pixels rather unattractive, and they are nowadays not used for 2-D sensors.
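The noise amplification described above for the passive pixel, i.e., the ratio of the column bus capacitance to the integration capacitance c1, can be illustrated numerically. The sketch below is an assumption-laden example: the function name is hypothetical, and the values of 2 fF of bus parasitic per row and 10 fF for c1 were picked only so that the ratio lands near the figure of 100 quoted for 512 rows.

```python
def reset_noise_gain(n_rows, bus_cap_per_row_f, integration_cap_f):
    """Ratio of column bus parasitic capacitance to the integration capacitance c1.

    The bus capacitance is modeled as growing linearly with the number of rows,
    since every pixel in the column adds its own parasitic to the bus.
    """
    bus_cap_f = n_rows * bus_cap_per_row_f
    return bus_cap_f / integration_cap_f

# Hypothetical capacitances: 2 fF per row on the bus, c1 = 10 fF.
gain_512 = reset_noise_gain(512, 2e-15, 10e-15)
gain_1024 = reset_noise_gain(1024, 2e-15, 10e-15)
print(f"512 rows: gain of about {gain_512:.0f}, 1024 rows: gain of about {gain_1024:.0f}")
```

Doubling the number of rows doubles the ratio, which is one reason passive pixels scale poorly to large arrays.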
2.2.2 Active Pixels

Fig. 2.4 shows an integrating three-transistor active pixel with a typical column readout circuit. Sensors based on active pixels are called active pixel sensors (APS). In an active pixel the charge to voltage conversion is performed within the pixel. This voltage is buffered by the transistor m2, which together with a bias current source (m3), located at the bottom of the column, forms a source follower circuit. This means a voltage is driven onto the column bus. To reduce offsets between these source follower circuits a double sampling approach is used. After the signal has been read from the pixel, the pixel is reset and the reset voltage is read. The difference between the reset value and the signal value then gives an offset free voltage that corresponds to the number of detected photons. This is often referred to as correlated double sampling (CDS), but sometimes only as double sampling (DS) since the operation does not cancel the reset noise from transistor m1.
Fig. 2.4: Active pixel and readout circuit. (Labels: Vdd, reset, row select, m1, m2, m3, photodiode, column bus, Vbias, csignal, creset.)
In an active pixel anti-blooming is performed by the reset transistor, m1 in Fig. 2.4. By keeping the low level of the gate voltage of this reset transistor well above ground (a level around the sum of the threshold voltages of m1 and m3 is suitable), the voltage over the photodiode is prevented from going all the way down to ground. The high level of the gate voltage of m1 can also be tuned. If it is set to Vdd the reset transistor will be in the subthreshold region at the end of reset, and this is called soft reset. The voltage over the photodiode then only reaches within about one threshold voltage from Vdd. Furthermore, the reset level reached depends slightly on the voltage over the diode at the start of the reset phase. This phenomenon is called image lag. To overcome these issues the high level of the gate voltage of m1 can be pumped up higher. If it is set higher than Vdd plus the threshold voltage of m1, the voltage over the photodiode will be reset all the way to Vdd. This is referred to as hard reset. A drawback with this approach is that the temporal reset noise from m1 becomes around twice as high compared to the soft reset approach. It is possible to circumvent this increase in temporal noise and still reduce the image lag. A technique referred to as flushed reset has a separate wire routed to the pixel for the drain voltage of the reset transistor [4], [5]. This wire is dropped to a low voltage at the beginning of the reset phase and, thereby, causes a hard reset and removes the image lag. Then the voltage is increased back to Vdd and causes a soft reset. Other techniques for low reset noise are hard-to-soft reset, described in [4], and active reset, described in [6].
Compared to PPS, APS give lower temporal noise and shorter read time. Active pixels also feature the possibility to do multiple sampling, which is a technique for reaching high dynamic range by using several different integration times. This is possible since the readout of an active pixel is non-destructive, i.e., the pixel is not automatically reset when being read as is the case with the passive pixel. This non-destructive readout is, however, not entirely compatible with the double sampling described above.
Since the built-in photodiode capacitance is slightly non-linear and the gain of the source follower varies slightly with the input voltage, the transfer function from detected photon to output voltage is not entirely linear. Integral non-linearity
figures for a sensor using the three-transistor active pixel are often around 0.5 to 1%. Variations in the pixel conversion capacitance and source follower gain cause conversion gain variations among the pixels in the sensor array. This causes gain FPN, which is undesired. For more information about PPS and APS see, e.g., [7].
Fig. 2.5: Active logarithmic pixel. (Labels: Vdd, row select, m1, photodiode, column bus.)
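The double sampling used with the three-transistor active pixel in Section 2.2.2 can be illustrated with a small numerical sketch. It is a simplified model, not the readout circuit itself: each column is given a fixed, hypothetical source follower offset, and all voltages are made-up example values. Temporal reset noise is deliberately left out, since this form of double sampling does not cancel it.

```python
def double_sample(signal_reads, reset_reads):
    """Return reset value minus signal value for each column."""
    return [reset - signal for signal, reset in zip(signal_reads, reset_reads)]

# Hypothetical example: three columns with different fixed offsets (volts).
reset_level = 2.0
true_drops = [0.10, 0.25, 0.40]      # voltage drop caused by the detected photons
offsets = [0.012, -0.007, 0.003]     # per-column source follower offsets

# The signal is read first, then the pixel is reset and the reset level is read.
signal_reads = [reset_level - drop + off for drop, off in zip(true_drops, offsets)]
reset_reads = [reset_level + off for off in offsets]

recovered = double_sample(signal_reads, reset_reads)
print([round(v, 3) for v in recovered])
```

Because the same offset appears in both reads of a column, it drops out of the difference, and the recovered values match the photon-induced voltage drops.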
2.2.3 Logarithmic Pixels

Fig. 2.5 shows a direct mode logarithmic pixel. The voltage over the diode is set by the characteristics of the load transistor, m1, and the photo current. This means light integration is not needed. Since the photo current is typically very small, the load transistor will end up in the subthreshold region and the response will, therefore, be logarithmic. The logarithmic response gives the circuit a very high dynamic range. However, drawbacks are high FPN and slow response time for low light levels.
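The logarithmic response can be made concrete with the ideal subthreshold model, in which the transistor current depends exponentially on its gate-source voltage, so the voltage across the load changes by n * (kT/q) * ln(10) per decade of photo current. The sketch below assumes this ideal model; the function name, the slope factor n = 1.5, and the temperature of 300 K are illustrative assumptions and not values from the thesis.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19

def log_pixel_voltage_change(photo_current_a, ref_current_a,
                             slope_factor=1.5, temperature_k=300.0):
    """Voltage change of an ideal subthreshold load between two photo currents."""
    thermal_voltage = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return slope_factor * thermal_voltage * math.log(photo_current_a / ref_current_a)

# One decade and six decades of photo current (hypothetical current levels).
per_decade_v = log_pixel_voltage_change(10e-12, 1e-12)
six_decades_v = log_pixel_voltage_change(1e-6, 1e-12)
print(f"{per_decade_v * 1e3:.0f} mV per decade, "
      f"{six_decades_v * 1e3:.0f} mV over six decades")
```

Under these assumptions, six decades of light intensity are compressed into roughly half a volt, which illustrates the high dynamic range. The same compression means that small mismatches between pixels correspond to large apparent intensity errors, consistent with the high FPN noted above.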
2.2.4 Global Shutter

The rolling shutter used in the passive pixel and the three-transistor active pixel causes distortion artifacts in fast moving scenes. To overcome this a global shutter, also referred to as a synchronous shutter, can be implemented. The global shutter makes it possible to expose all pixels simultaneously, which is not the case with the rolling shutter. Fig. 2.6 shows a 4-transistor integrating active pixel with a global shutter, and Fig. 2.7 shows a 5-transistor integrating active pixel with global shutter. A benefit of the 5-transistor solution is that it allows for simultaneous exposure and pixel read, which is important for high-speed imaging.
Fig. 2.6: 4-transistor shuttered pixel. (Labels: Vdd, snap, reset, n1, row select, photodiode, column bus.)
Fig. 2.7: 5-transistor shuttered pixel. (Labels: Vdd, reset1, reset2, snap, n1, row select, photodiode, column bus.)
The storage capacitor c1 in Fig. 2.6 and Fig. 2.7 could be implemented as a MOS capacitor. However, often the parasitic capacitances at the storage node are sufficient.
With a global shutter it is very important that the shuttered node, node n1 in the figures, is not discharged by the incoming photons while the shutter is closed (i.e., when snap is low). A first measure to combat this is to use the metal layers to shield the source/drain diffusions connected to this node. If the process uses a twin-well it may be possible to have the transistors placed in a p-well, while no well is used in the region of the photodiode, leaving it in the lightly doped substrate [8]. The difference in doping concentrations then limits the diffusion of the electrons freed by the incoming photons.
2.3 ADC
In general there exist three architectures for on-chip ADCs for CMOS image sensors. They are one single ADC for all pixels, one ADC per column, and one ADC per pixel. Some derivatives of these also exist, such as a few ADCs on-chip instead of one, a column parallel ADC where a group of columns share one ADC, or one ADC per group of, e.g., four neighboring pixels. One single ADC is often used in low-speed sensors. The pixel level ADC, on the other hand, can handle very high speed; however, it drastically decreases the FF. The column parallel architecture can be used for both high-speed applications and low-power applications.
For the single ADC many different types of ADC architectures can be used. For pixel level ADC a single slope approach is most often used, see, e.g., [9]. The single slope approach is also often used for column parallel ADCs [10]; however, successive approximation (SA) ADCs using switched capacitor charge sharing [8] and cyclic ADCs [11] are also used for column parallel ADCs. Compared to the cyclic and SA architectures the single slope ADC requires much less area, and it is easier to reach good accuracy using the single slope. A drawback with the single slope is the speed, since one conversion requires 2^n clock cycles for n bits of resolution. It is, however, possible to trade accuracy or signal swing for higher speed, and the clock frequency can also be much higher than for the cyclic and SA ADCs. For more information about column parallel ADCs see Paper III.
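The single slope principle and its cost of 2^n clock cycles can be illustrated with an idealized behavioral model of a column parallel converter: a shared ramp and counter step once per clock cycle, and each column latches the counter value when the ramp passes its input. This is a sketch under simplifying assumptions (ideal ramp, ideal comparators, no noise); the function name and the example voltages are hypothetical, and it does not model the specific ADC of the M12 sensor.

```python
def single_slope_convert(column_voltages, v_ref, n_bits):
    """Convert all columns in parallel; return (codes, clock cycles used)."""
    steps = 2 ** n_bits
    codes = [None] * len(column_voltages)
    for count in range(steps):                    # one ramp step per clock cycle
        ramp = v_ref * (count + 1) / steps        # shared ramp voltage
        for col, v_in in enumerate(column_voltages):
            if codes[col] is None and ramp > v_in:
                codes[col] = count                # comparator trips: latch counter
    # Inputs at or above full scale never trip the comparator; clip them.
    codes = [steps - 1 if code is None else code for code in codes]
    return codes, steps

# Hypothetical column inputs with a 1 V full-scale ramp and 8-bit resolution.
codes, cycles = single_slope_convert([0.0, 0.5, 0.999, 1.2], v_ref=1.0, n_bits=8)
print(codes, cycles)
```

The whole row is converted in one ramp sweep regardless of the number of columns, but the sweep always takes 2^n cycles, so each extra bit of resolution doubles the conversion time.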
References
[1] B. Fowler, "CMOS area image sensor with pixel level A/D conversion," PhD Thesis, Stanford University, 1995.
[2] I. Inoue, H. Tanaka, H. Yamashita, T. Yamaguchi, H. Ishiwata, and H. Ihara, "Low-Leakage-Current and Low-Operating-Voltage Buried Photodiode for a CMOS Imager," IEEE Trans. Electron Devices, vol. 50, no. 1, pp. 43-47, Jan. 2003.
[3] A. Moini, "Vision chips or seeing silicon," Technical Report, Centre for High Performance Integrated Technologies and Systems, The University of Adelaide, Australia, Mar. 1997.
[4] B. Pain et al., "Analysis and enhancement of low-light-level performance of photodiode-type CMOS active pixel imagers operated with sub-threshold reset," in 1999 IEEE Workshop on Charge-Coupled Devices and Advanced Image Sensors, Nagano, Japan, pp. 140-143, June 1999.
[5] H. Tian, B. Fowler, and A. El Gamal, "Analysis of Temporal Noise in CMOS Photodiode Active Pixel Sensor," IEEE J. Solid-State Circuits, vol. 36, no. 1, pp. 92-101, Jan. 2001.
[6] B. Fowler, M. D. Godfrey, J. Balicki, and J. Canfield, "Low Noise Readout using Active Reset for CMOS APS," Proc. SPIE, vol. 3965, pp. 126-135, San Jose, CA, 2000.
[7] A. El Gamal, Lecture notes in the course EE392 at Stanford University, 1998.
[8] A. I. Krymski and N. Tu, "A 9-V/Lux-s 5000-Frames/s 512 × 512 CMOS Sensor," IEEE Trans. Electron Devices, vol. 50, no. 1, Jan. 2003.
[9] S. Kleinfelder, S. Lim, X. Liu, and A. El Gamal, "A 10kframes/s 0.18µm CMOS Digital Pixel Sensor with Pixel-Level Memory," ISSCC Dig. Tech. Papers, pp. 88-89, 2001.
[10] K. Chen, M. Afghahi, P-E. Danielsson, and C. Svensson, "PASIC: A Processor-A/D converter-Sensor Integrated Circuit," Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'90), vol. 3, pp. 1705-1708, May 1990.
[11] K. Chen and C. Svensson, "A Parallel A/D Converter Array Structure with Common Reference Processing Unit," IEEE Trans. Circuits Syst., vol. 36, no. 8, pp. 1116-1119, Aug. 1989.
Paper I
A Multiresolution 100-GOPS 4-Gpixels/s Programmable Smart Vision Sensor for Multisense Imaging
L. Lindgren, J. Melander, R. Johansson, and B. Möller, IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1350-1359, June 2005.
Leif Lindgren, Johan Melander, Member, IEEE, Robert Johansson, and Björn Möller
Abstract: This paper presents a multiresolution general-purpose high-speed machine vision sensor with on-chip image processing capabilities. The sensor comprises an innovative multiresolution sensing area, 1536 A/D converters, and a SIMD array of 1536 bit-serial processors with corresponding memory. The sensing area consists of an area part with 1536 × 512 pixels, and a line-scan part with a set of rows with 3072 pixels each. The SIMD processor array can deliver more than 100 GOPS sustained and the on-chip pixel-analysing rate can be as high as 4 Gpixels/s. The sensor is ideal for high-speed multisense imaging where, e.g., colour, greyscale, internal material light scatter, and 3-D profiles are captured simultaneously. When running only 3-D laser triangulation, a data rate of more than 20,000 profiles/s can be achieved when delivering 1536 range values per profile with 8 bits of range resolution. Experimental results showing very good image characteristics and a good digital to analogue noise isolation are presented.
Index Terms: APS, CMOS image sensors, laser triangulation, machine vision, MAPP, multiresolution, multisense, smart vision sensors, 3-D.
L. Lindgren was with IVP Integrated Vision Products AB, and is now with Synective Labs AB, 583 30 Linköping, Sweden (e-mail: lei@isy.liu.se). J. Melander is with SICK IVP AB, 583 35 Linköping, Sweden (e-mail: jme@sickivp.se). R. Johansson was with IVP Integrated Vision Products AB, and is now with Micron Imaging, 0349 Oslo, Norway (e-mail: rjohansson@micron.com). B. Möller was with IVP Integrated Vision Products AB, and is now with Metrima AB, 581 10 Linköping, Sweden (e-mail: fullcustom.designer@home.se).
1 Introduction
Today two main alternatives exist for general image sensing: charge-coupled devices (CCDs) and CMOS image sensors. CCDs still offer the best image performance in high-end systems such as space and medical imaging. However, in contrast to CCDs, CMOS image sensors offer the possibility of system integration on one chip [1]-[4], giving advantages such as high speed, compact system size, low cost, and low power. Machine vision spans a wide range of applications, where one of the biggest and fastest growing segments is real-time control in factory automation. Economic aspects are today pushing this segment towards 100% in-line inspection without slowing down production. A clear trend in machine vision is the move towards augmenting the normal 2-D greyscale inspection with 3-D measurements. Many critical inspection tasks require control of the third dimension, and the trend is perhaps clearest in electronics inspection, e.g., solder paste volume, and in wood inspection, e.g., wane detection (wane is missing wood due to the curved log exterior, and knowledge of wane position and defects is essential for cut selection). A general-purpose machine vision camera system in this segment can be characterised as a high-speed 2-D/3-D measurement system. As a consequence of the high speed, and thus short integration times, high sensitivity is also required. Furthermore, a compact system is important in order to get close enough to the objects or just to be able to mount the system on moving parts. Image quality requirements are in most cases mid-range. With the potential of system integration, mid-range image quality, and sensitivity comparable to CCDs, a CMOS image sensor seems to be the ideal candidate for machine vision. The sensor presented is a further development and extension of the previous sensors LAPP [1], PASIC [2], MAPP2200 [3], and MAPP2500 [5]. It is a general-purpose high-speed smart machine vision sensor, fabricated in a standard 0.35 µm triple metal double poly CMOS process.
The sensor integrates on one chip an innovative multiresolution sensing area, containing one area part and one high-resolution line-scan part (HiRes), a column parallel A/D conversion, and a column parallel single-instruction multiple-data (SIMD) processor. The processor implements image oriented instructions and delivers more than 100 giga operations per second (GOPS) sustained. The processor architecture can be used for a variety of image processing tasks such as filtering, template matching, edge detection, run-length coding, and line-scan shading correction. Although it is general-purpose, the main application field for this sensor is high-speed high-resolution multisense imaging where, e.g., colour, greyscale, internal material light scatter, and 3-D profiles are captured simultaneously. This is enabled by the ability to dedicate different rows of the sensor to different measurement tasks, the high-speed pixel readout, and the on-chip SIMD processor. The multiresolution concept and the high level of integration enable low-cost high-performance machine vision systems.
Section 2 exemplifies a multisense system. Section 3 describes the architecture of the sensor. The actual circuit implementation is described in Section 4 and experimental results are provided in Section 5. Section 6 compares the presented sensor with other advanced sensors. Finally, Section 7 concludes the paper.
Fig. 1: A typical multisense system. (Figure labels: camera with newly developed sensor; 3-D triangulation illumination; scatter illumination; colour and greyscale illumination; object movement; object; camera field-of-view; HiRes rows field-of-view.)
2 Multisense Imaging
A typical multisense system utilising the presented sensor is depicted in Fig. 1. A horizontally moving object is inspected and 3-D shape, internal material light scatter, colour, and greyscale are extracted in a single pass using the same camera system. One part of the area sensor is dedicated to 3-D shape measurements. The 3-D extraction is achieved using the well-known sheet-of-light technique where profile data are acquired by triangulation [6]. A laser line is projected on the target and the offset position of the reflected light on the sensor carries the 3-D data for one profile. The complete 3-D shape of the object is built up from consecutive profiles of the moving target. On-chip hardware implements all data processing necessary for high-quality profiles to be sent out from the sensor. Running 3-D profiles only, a data rate of more than 20,000 profiles/s can be achieved with this sensor delivering 1536 range values per profile with 8 bits of range resolution. This translates to the extremely high pixel analysing rate of 4 Gpixels/s. When going down to 7 bits of range resolution the data rate increases to more than 40,000 profiles/s, producing 61 M range values per second.
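The throughput figures quoted above can be related to each other with simple arithmetic. The sketch below uses hypothetical function names. The number of sensor rows analysed per profile is not stated in the text, so the value computed here is only what the quoted pixel rate would imply.

```python
COLUMNS = 1536  # range values per profile, as stated in the text


def range_values_per_second(profiles_per_s: float, columns: int = COLUMNS) -> float:
    """Range values produced per second at a given profile rate."""
    return profiles_per_s * columns


def implied_rows_per_profile(pixel_rate: float, profiles_per_s: float,
                             columns: int = COLUMNS) -> float:
    """Rows analysed per profile implied by a pixel-analysing rate (an inference)."""
    return pixel_rate / (profiles_per_s * columns)


if __name__ == "__main__":
    print(range_values_per_second(40_000) / 1e6, "M range values/s")
    print(round(implied_rows_per_profile(4e9, 20_000), 1), "rows per profile (implied)")
```

At 40,000 profiles/s the first function gives about 61 M range values per second, which agrees with the figure in the text.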
A second part of the area sensor is devoted to internal material light scatter measurements [7]. When light strikes a surface it will be scattered within the material and give a bright region around the point of incidence. The amount of scattering depends on the optical density of the material; low density materials will scatter more than high density materials. This effect is very useful in inspection of vegetables and in wood inspection where, e.g., a knot will scatter much less light than the surrounding clear wood due to its higher density and different fibre orientation. A laser line illumination source is mounted perpendicular to the object and from this light a reflected and a scattered component are extracted. As for the case of 3-D profiles, hardware support is implemented for effective laser scatter line composition.
A third part of the area sensor is coated with colour filters, typically red, green, and blue filters, and provides colour information. A white light source is focused on this area and 8-bit data per colour channel are sent out from the chip. The HiRes rows constitute a fourth sensor part. A white light source, preferably the same as for the colour section, is focused on the HiRes rows and 8-bit high-resolution greyscale line-scan data are sent out from the sensor. The sensor can, e.g., deliver 1536 8-bit range data, 1536 2×8-bit scatter data, 1536 8-bit colour data per channel, and 3072 8-bit greyscale data 7900 times per second. This translates to a data rate of 777 Mbit/s.
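The 777 Mbit/s figure can be reproduced by adding up the bits delivered per line. The sketch below assumes three colour channels (the text says red, green, and blue filters are typical) and reads the scatter data as two 8-bit values per column. The function names are hypothetical.

```python
def bits_per_line(columns: int = 1536, hires_pixels: int = 3072,
                  colour_channels: int = 3) -> int:
    """Bits sent out per multisense line under the stated assumptions."""
    range_bits = columns * 8                      # 8-bit range value per column
    scatter_bits = columns * 2 * 8                # two 8-bit scatter values per column
    colour_bits = columns * 8 * colour_channels   # 8 bits per colour channel
    grey_bits = hires_pixels * 8                  # 8-bit HiRes greyscale
    return range_bits + scatter_bits + colour_bits + grey_bits


def data_rate_mbit_per_s(lines_per_s: float = 7900) -> float:
    """Output data rate in Mbit/s at the given line rate."""
    return bits_per_line() * lines_per_s / 1e6


if __name__ == "__main__":
    print(bits_per_line(), "bits per line")
    print(round(data_rate_mbit_per_s(), 1), "Mbit/s")
```

With these assumptions each line carries 98,304 bits, and 7900 lines per second give roughly 777 Mbit/s.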
3 Architecture
3.1 System Level
Fig. 2 depicts the high-level chip block diagram. The core of the chip consists, with a few exceptions, of a linear array of 1536 columns butted together in the x-direction. Each column contains a part of the multiresolution sensor area, analogue readout, A/D conversion/thresholding, processor, and registers. The instruction decoding and digital control, analogue switch control (AS), and pixel array row addressing are placed on the left side of the core. The dataport logic (DL), A/D conversion logic (AL), and the DAC used for A/D conversion are placed on the right side. The chip has a simple interface where instructions are sent on a 24-bit bus, B, synchronised with the main chip clock, E, running at 33 MHz. In most cases the chip interprets and executes an instruction within a single clock cycle. Instructions sent to the core array constitute a typical example of a single-instruction multiple-data (SIMD) processor architecture: a single instruction is distributed to all columns and each column executes the instruction using its respective column data. The instruction bus, B, can be turned around by changing the read-write signal, RW, and status information can then be read instead. A fraction of the registers in the core array are augmented with a dataport capability. This gives the possibility to stream out data on a 32-bit wide bus, DP, controlled by a separate clock, S, at 1.1 Gbit/s. The parallelism is further increased by using a third separate clock, A, that controls the progress of the A/D conversion. The use of three separate clock domains, E, S, and A, allows for highly parallel high-speed algorithm implementations.
Fig. 2: Chip block diagram. (Figure labels: row decoder; digital control; AS; 1536 × 512 array; HiRes rows, 3072 pixels each; DAC; 1536 columns of analogue readout and A/D conversion, processor and registers; AL; DL; signals RW, E, B (24 bits), S, and DP (32 bits).)
3.2 Multiresolution Sensor Area and Analogue Readout

The analogue part of one column is shown in the left part of Fig. 3. The multiresolution sensor area is made up of two different parts. The area part has 1536 columns and 512 rows with 9.5 µm square pixels, and the HiRes part has 3072 columns and a set of rows with a high and narrow pixel. The pixel array has 1536 vertical column lines for reading out the pixel voltages. To connect the HiRes rows to the vertical lines each HiRes row has two row addresses, making it possible to first read out the even columns and then the odd columns, Fig. 3. Each of the vertical lines in the pixel array connects to a correlated double sampling (CDS) and digitally programmable gain amplifier (DPGA) circuit. This circuit performs CDS, with a programmable gain of four settings, and can also perform analogue row binning. The sensor area (including HiRes) is fully addressable via the row address decoder and the column selectability of the dataport. This enables multiple windowing and region of interest readout. Furthermore, the row address decoder allows for different integration times for different rows.
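The two-address readout of a HiRes row can be pictured as two reads of 1536 values each that are interleaved into one 3072-pixel line. The sketch below is a host-side illustration only. The function name is hypothetical, and the assumption that the first read holds pixels 0, 2, 4, ... is not confirmed by the text.

```python
def merge_hires_reads(even_read, odd_read):
    """Interleave the even-column and odd-column reads into one HiRes line."""
    if len(even_read) != len(odd_read):
        raise ValueError("both reads must have the same number of column lines")
    line = [0] * (2 * len(even_read))
    line[0::2] = even_read  # assumed: first row address selects even HiRes columns
    line[1::2] = odd_read   # assumed: second row address selects odd HiRes columns
    return line


if __name__ == "__main__":
    even = list(range(0, 3072, 2))
    odd = list(range(1, 3072, 2))
    line = merge_hires_reads(even, odd)
    print(len(line), line[:6])
```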
3.3 A/D Conversion

The A/D conversion is column parallel and of single slope converter type [2]. This is implemented with a comparator and an 8-bit memory in each column, and a common counter that feeds a DAC and the 8-bit in-column memory. The counter counts from 0 to 255, making the DAC produce a voltage ramp. When the ramp exceeds the voltage from the DPGA/CDS circuit the comparator switches and the counter value is loaded into the column memory (ADREG in Fig. 3). The ADC resolution is programmable from 3 to 8 bits, with a fast pseudo conversion (FPC) option. When using FPC the counter step-size is doubled after reaching the value 64, thereby decreasing the greyscale resolution of highly illuminated parts of the image. The A/D conversion is clocked at 33 MHz, resulting in 7.7 µs per 8-bit conversion and 4.8 µs per 8-bit FPC conversion.
Fig. 3: The column architecture. (Figure labels: HiRes; 512 rows; column line; CDS/DPGA; vramp; comp; 8b counter; PD; ADREG (8); GLU; NLU; PLU; ST; Acc (16); ALU line; COUNT/GOR; RREG (96); SREG/dataport (16).)
An advantage with this ADC topology is the ability to perform fast thresholding, a crucial operation in many image processing algorithms. Besides making the DAC produce a ramp, it is possible to load a digital value into the DAC and thereby produce a voltage for the threshold operation. The DAC features a 256-step programmable gain and a 256-step programmable offset, making it possible to do an automatic calibration of the A/D conversion. If the calibration is performed under a well-defined illumination it is possible to make different sensors have the same photo response.
The programmable gain and offset can also be used together with the on-chip digital processor to implement an innovative dithering scheme, making it possible to do A/D conversions with more than 8 bits of resolution. A 9-bit conversion is realised by adding the results from two 8-bit conversions where one of the conversions is made with the ADC offset changed by 1/2 LSB. In a similar manner 9.5-bit and 10-bit conversions can be obtained. Since the ADC offset step size is independent of the ADC gain setting, the gain has to be set so that the offset step size equals 1/2 LSB, or 1/3 LSB in the 9.5-bit case and 1/4 LSB in the 10-bit case, to get a low DNL. The dithering scheme is compatible with the FPC option. For instance, a 9-bit conversion takes 16 µs, a 9.5-bit conversion takes 24 µs, a 10-bit conversion takes 32 µs, and a 10-bit pseudo conversion takes 20 µs.
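The ramp timing and the dithering scheme described above can be sketched with an ideal converter model. The following is a minimal illustration and not the on-chip implementation. The function names are hypothetical, the quantiser is ideal (no noise, no non-linearity), and the sign with which the offset shift is applied is an assumption.

```python
import math

CLOCK_HZ = 33e6  # A/D conversion clock stated in the text


def ramp_cycles(bits: int = 8, fpc: bool = False, knee: int = 64) -> int:
    """Counter steps for one ramp; with FPC the step size doubles after `knee`."""
    full = 2 ** bits
    if not fpc:
        return full
    return knee + (full - knee) // 2


def ramp_time_us(cycles: int, clock_hz: float = CLOCK_HZ) -> float:
    """Duration of a ramp with the given number of counter steps."""
    return cycles / clock_hz * 1e6


def convert_8bit(v_lsb: float, offset_lsb: float = 0.0) -> int:
    """Ideal 8-bit conversion of an input given in LSB, with a shifted offset."""
    return max(0, min(255, math.floor(v_lsb + offset_lsb)))


def dithered_conversion(v_lsb: float, passes: int) -> int:
    """Sum of `passes` conversions whose offsets differ by 1/passes LSB."""
    return sum(convert_8bit(v_lsb, k / passes) for k in range(passes))


if __name__ == "__main__":
    print(round(ramp_time_us(ramp_cycles()), 2), "us per 8-bit ramp")
    print(round(ramp_time_us(ramp_cycles(fpc=True)), 2), "us per FPC ramp")
    for passes in (2, 3, 4):
        print(passes, "passes:", dithered_conversion(100.7, passes))
```

In this ideal model the sum of N conversions with offsets spaced 1/N LSB apart equals the input quantised with a step of 1/N LSB, which is why two, three, and four passes correspond to roughly 9, 9.5, and 10 bits. The ramp durations come out close to the 7.7 µs and 4.8 µs quoted above. The model does not account for any overhead between passes.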
3.4 Processor and Registers

The processor and registers constitute a high-performance image processing unit by the use of massive parallelism and the implementation of image oriented instructions. The processor in each column is bit-serial, allowing for extremely high-speed binary image processing. The bit-serial approach is also flexible since it allows variable word lengths and data formats to be used for greyscale image processing. For example, for each row, a 3×3 Gauss filtering takes 8b cycles and a 3×3 Sobel filtering takes 11b cycles, where b is the number of bits used. Run-length coding takes 10 clock cycles per object per row, which, e.g., makes it possible to capture binary images and output run-length coding at more than 600 full frames per second when allowing ten objects per row. Furthermore, a multiplication takes 3b^2 cycles and can for instance be used for shading correction in a line-scan application. A column in the digital part of the sensor is illustrated in the right part of Fig. 3 and consists of the following parts. PD is a 1-bit register that holds the thresholded value of the column and ADREG is an 8-bit register that holds the A/D converted column value. Below that is the processor part consisting of a global logical unit (GLU), a neighbourhood logical unit (NLU), and a point logical unit (PLU) with status registers and 16 accumulators. Further below are the global feature extraction units COUNT and GOR. Finally, at the bottom are the general registers, consisting of 96 bits per column (RREG) and 16 bits per column (SREG), where SREG is also part of the high-speed dataport. Each column carries a 1-bit ALU line that is the intra-column communication channel between the different blocks in the digital part. The simplest type of instructions are the PLU-instructions, which perform Boolean operations on the column data.
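The cycle counts quoted above translate directly into row and frame rates. The sketch below uses hypothetical function names and counts only the processing cycles. Any cycles for exposure, thresholding, or readout are not modelled, which is an assumption.

```python
CLOCK_HZ = 33e6  # main chip clock stated in the text
ROWS = 512       # rows in the area part of the sensor


def gauss3x3_cycles(b: int) -> int:
    """Cycles per row for a 3x3 Gauss filter on b-bit data (8b)."""
    return 8 * b


def sobel3x3_cycles(b: int) -> int:
    """Cycles per row for a 3x3 Sobel filter on b-bit data (11b)."""
    return 11 * b


def multiply_cycles(b: int) -> int:
    """Cycles for a bit-serial multiplication of b-bit data (3b^2)."""
    return 3 * b * b


def run_length_fps(objects_per_row: int, rows: int = ROWS,
                   cycles_per_object: int = 10,
                   clock_hz: float = CLOCK_HZ) -> float:
    """Frames per second if run-length coding were the only cost."""
    return clock_hz / (rows * objects_per_row * cycles_per_object)


if __name__ == "__main__":
    print(gauss3x3_cycles(8), sobel3x3_cycles(8), multiply_cycles(8))
    print(round(run_length_fps(10)), "frames/s with ten objects per row")
```

With ten objects per row, a full 512-row frame needs 51,200 cycles, which at 33 MHz is consistent with the quoted rate of more than 600 frames per second.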
The increased number of bits in the accumulator, 16 bits compared to 1 bit in earlier sensors, together with new instructions for addition and subtraction increases the speed of greyscale image processing. Typical for image processing is the fact that the result for a pixel depends on its neighbours, as in median filtering or template matching. The NLU was designed for this purpose, allowing a single instruction for a three-column median filtering or template match. Note that the NLU and PLU are tightly integrated, meaning that a PLU-instruction can only be performed by passing data through the NLU. The result of an NLU/PLU-instruction is stored in one of the 16 accumulators. Since all NLU/PLU-instructions can be performed in a single clock cycle the arithmetic performance is very high, exceeding 100 GOPS. The GLU was added to circumvent the problems associated with global instructions and SIMD architectures. The GLU provides a set of instructions where each column result depends on the input data of all columns, see [3]. The global feature extraction units, COUNT and GOR, operate on accumulator 0. The COUNT feature outputs a digital value equal to the total number of ones in accumulator 0 in all columns. The GOR feature is a single-bit result operation performing a global-OR on accumulator 0 in all columns. The results from GOR and COUNT are available at the B bus together with other status information. In addition, the COUNT-value can be read from the dataport and the GOR result is available on an external pad.
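The behaviour of the three-column median and of the COUNT and GOR features can be modelled in a few lines. This is a behavioural sketch on one bit per column, not a description of the hardware. The function names are hypothetical, and treating the columns outside the array as zero is an assumption.

```python
def median3(bits):
    """Three-column binary median: majority vote of left, centre, and right."""
    n = len(bits)
    out = []
    for i in range(n):
        left = bits[i - 1] if i > 0 else 0        # assumed zero beyond the array
        right = bits[i + 1] if i < n - 1 else 0   # assumed zero beyond the array
        out.append(1 if left + bits[i] + right >= 2 else 0)
    return out


def count_feature(acc0):
    """COUNT: total number of ones in accumulator 0 over all columns."""
    return sum(acc0)


def gor_feature(acc0):
    """GOR: global OR of accumulator 0 over all columns."""
    return 1 if any(acc0) else 0


if __name__ == "__main__":
    acc0 = [0, 1, 0, 1, 1, 0]
    print(median3(acc0), count_feature(acc0), gor_feature(acc0))
```

On the chip every column evaluates its neighbourhood in the same clock cycle, whereas the loop above visits the columns one after the other.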
4 Circuit Implementation
A chip photograph is shown in Fig. 4. The chip measures 16.8 mm × 11.2 mm and comprises 5.8 M transistors. It is implemented in a 0.35 µm 3.3 V/5 V triple metal double poly standard CMOS process. Both the analogue and the digital parts of the sensor are powered with 3.3 V. The large chip dimensions, mixed-signal nature, and the use of relatively high-speed synchronous control present many design challenges. Special attention has for instance been focused on mixed-signal noise isolation, large distance signalling, synchronisation, and acceleration techniques for parallel feature extraction logic.
4.1 Mixed-Signal Aspects

In this design the use of EPI-wafers, consisting of a thin lightly doped p epitaxial layer on top of a heavily doped bulk, is required from an image sensing point of view. EPI-wafers offer a high quality substrate for the photodiodes, which yields lower and more uniform dark currents. Furthermore, EPI-wafers relax the need for substrate contacts in each pixel, thus increasing the fill factor (FF). EPI-wafers also offer the potential for good digital-analogue noise isolation if a low ohmic die back-side connection to ground can be made, combined with the use of separate power domains [8], a technique used in this design. Although having only a small impact on noise isolation for EPI-wafers, traditional techniques such as the use of guard rings, separating digital and analogue parts by distance, etc. have also been used. On-chip decoupling has been used extensively for the power supplies. There is 20 nF of gate decoupling capacitance in the analogue domain and 17 nF in the digital domain.
Fig. 4: Chip photo.
4.2 Large Distance Signalling

A multitude of horizontal control signals are used in this design, each measuring about 15 mm and connecting to an input in each column. The clocking approach in this design uses two non-overlapping clocks and their inverses, hence it is critical to synchronise and preserve the non-overlap inside a column and between columns. The minimal non-overlap that can be used is determined by the RC-time constant spread of all control signals. Initially a variation in control signal load capacitance between 12 pF and 100 pF prevented a fast design, i.e., a small non-overlap. This was relaxed by making every four columns share a local inverter and instead distributing the inverse of the control signal. This decreased the variation in capacitive load to 5 to 12 pF, and significantly reduced the rise and fall time of the clock signal seen in each column. A standard sized control signal driver was then used and the RC-time constant was equalised among horizontal control signals by sizing the wire width. In this design this resulted in an RC-time constant of 1 ns and wire widths of 2 to 5 µm. For analogue control signals (pixel array, readout, and ADC) local inverters are not used. Non-overlap is instead guaranteed by delaying ON-signals half a clock cycle compared to OFF-signals.
4.3 GLU, COUNT, and GOR

Intricate and innovative acceleration structures have been used for the GLU, GOR, and COUNT. Simulations at 85 °C using worst speed transistor models and distributed RC-extraction netlists show that COUNT needs 35 ns to evaluate for the worst input condition. This means that one clock cycle is required after an instruction that alters accumulator 0 before a valid COUNT-value can be read from the sensor. Similar simulations of the GLU show that the result is not ready within 15 ns in the worst case. Due to instruction timing issues this also means that it will take one extra clock cycle before the GLU-result can be read. Often this does not mean that an algorithm is slowed down, since one instruction, not requiring the GLU/COUNT-result and not affecting accumulator 0, can be issued before using the result. Simulations of GOR showed 6.1 ns for the worst condition, which is when only accumulator 0 in the rightmost column is set.
4.4 Pixels and Analogue Readout

The pixels are standard three transistor active pixels. The fill factor, being the pixel-area not covered by metal or n+-diffusion, is calculated to 60% for the array pixels and 80% for the HiRes pixels. The voltage levels at the gate of the reset transistor are controlled from the outside of the chip, making both soft reset and hard reset possible [9]. Hard reset has been used due to the demand of high speed and no image lag. The DPGA/CDS circuit is shown in Fig. 5 along with the comparator, one column line, and one pixel. The circuit contains two independent readout paths, one using an SC-amplifier and the other using a single capacitor. The SC-amplifier is an inverting voltage amplifier with three gain settings, -1, -3, and -4. The gain is set by the g0 and g1 switches. The operational transconductance amplifier (OTA) is of folded cascode type and its offset is removed by offset compensating at the comparator (sr5). When using the single capacitor readout (SCR) g0, g1, and si2 are turned off and ng is turned on. The right side of the capacitor C3 is charged either by vrampp, which is the positive output from the DAC, or by the external reference signal vdac0, when the signal is read from the pixel. In the next phase the pixel is reset and the right side of the C3 capacitor is floating, making it follow the voltage change on the column line. This results in a voltage change on the negative input of the comparator that corresponds to the difference between the signal value and the reset value from the pixel, resulting in CDS being performed (this is sometimes referred to as only double sampling (DS) or double data sampling (DDS) since the temporal noise of the reset transistor is not cancelled). With the SCR it is also possible to do true CDS, meaning the reset voltage is first read, then the pixels integrate light, and then the signal voltage is read. This cancels the reset noise in the pixel and can for instance be used in an application requiring extremely low temporal noise for greyscale line-scanning. No limitations on the integration time exist for this readout mode, except for limitations set by dark current in the pixels and the fact that this readout mode slows down operation since no other rows can be accessed during this integration.
By operating g0 and g1 in a special way it is possible to read out a pixel value with gain -1 and use that value for thresholding or A/D conversion. The gain -1 value can then be amplified by a factor of four with the pixel voltage still at reset level. The gain -4 value can then be used for thresholding or A/D conversion. This opens the possibility of high dynamic range algorithms.
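The difference between double sampling and true CDS described in this section can be illustrated with a small Monte Carlo sketch. Only the reset noise is modelled. The voltage levels, the noise value, and the function name are hypothetical and are not taken from the measurements.

```python
import random
import statistics


def simulate_sampling(n=20000, signal=0.3, reset_level=2.0,
                      reset_sigma=1.0e-3, seed=1):
    """Return (DS noise, true CDS noise) as RMS values in volts."""
    rng = random.Random(seed)
    ds, cds = [], []
    for _ in range(n):
        r1 = rng.gauss(0.0, reset_sigma)  # noise frozen by the reset before integration
        r2 = rng.gauss(0.0, reset_sigma)  # noise of the following, independent reset
        signal_sample = reset_level + r1 - signal
        # DS: the signal is read first, then the pixel is reset and read again.
        ds.append((reset_level + r2) - signal_sample)
        # True CDS: the reset level of the same integration period is read first.
        cds.append((reset_level + r1) - signal_sample)
    return statistics.pstdev(ds), statistics.pstdev(cds)


if __name__ == "__main__":
    ds_noise, cds_noise = simulate_sampling()
    print("DS noise (V):", ds_noise)
    print("true CDS noise (V):", cds_noise)
```

In this model the two reset samples used by DS are uncorrelated, so the reset noise appears in the difference with about the square root of two times its RMS value, while in true CDS the same reset sample appears in both readings and cancels.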
4.5 A/D Conversion

To get a monotonic and glitch free ramp with very low differential non-linearity (DNL) and integral non-linearity (INL) the 8-bit DAC is of thermometer decoded current steering type [10]. The DAC produces two outputs where one ramps down, vrampn, whereas the other ramps up, vrampp. According to Fig. 5, either of the two ramps can be connected to the positive input of the comparator. The reason for this is that vrampp is used together with the SCR path whereas vrampn is used with the SC-amplifier path. The DAC has a programmable offset for each of the two outputs and the offsets are individually set with 8-bit resolution. Within the DAC there is also an 8-bit current steering DAC that produces a voltage that controls the amount of current supplied by the current sources producing the two ramps. This makes it possible to change the swing of the ramps in 256 steps, from 0.55 V to 1.6 V. The gain setting does not affect the offset current sources. It is important that the signal from the DAC is stable over temperature; therefore, a circuit based on an on-chip bandgap voltage reference has been used to achieve this.
Fig. 5: DPGA/CDS circuit along with the comparator, column line, and a pixel. (Figure labels: rst; sel; column line; C0, C1, C2, C3; g0; g1; ng; si1, si2, si3, si4; sr2, sr3, sr4, sr5; s1; OTA; comp; vdac0; vdac255; vrampn; vrampp.)
To get a low column to column fixed pattern noise (FPN) it is important to have comparators with very low input offset. It is desirable that the comparators draw a constant current so they do not produce switching noise on the power supply, and that they have a high PSRR. Furthermore, the comparators should be insensitive to charge injection and clock feed-through (CI/CFT) from switches turning off, so that there will be no low frequency FPN in the form of a predictable column gradient due to the fact that CI/CFT are dependent on the fall time of the gate signal [11]. To meet these requirements a four stage comparator is used. The first two stages are fully differential, the third stage is an innovative constant current differential to single ended converter, and the fourth stage is a conventional inverter that makes the output rail-to-rail. Offset compensation is realised by output offset storage (OOS) for the first stage and input offset storage (IOS) for the second stage [12]. The first three stages are connected to an analogue power supply and they draw almost constant current even when switching. The inverter is driven by a digital power supply in order not to produce noise on the analogue power domain.
Table 1: Characteristics for the DAC, the ADC, and the array pixels
Parameter                                        Value
Global DAC DNL (1 V)                             RMS = 0.04, max = 0.12 LSB
Global DAC INL (1 V)                             RMS = 0.04, max = 0.13 LSB
ADC DNL (1 V)                                    RMS = 0.06, max = 0.50 LSB
ADC INL (1 V)                                    RMS = 0.08, max = 0.45 LSB
Photo response INL                               RMS = 0.09%, max = 0.16%
FF (array pixels)                                60%
QE                                               45% at 610 nm
Pixel capacitance                                7.5 fF
Dynamic range                                    62 dB (65 dB reset on)
Dark signal                                      0.35 LSB/ms at 60 °C
Conversion gain                                  14.5, 62.0 µV/e- (DPGA gain: G=1, G=4)
Temporal noise (RMS, in dark)                    900 µV, 3.99 mV (G=1, G=4)
Temporal noise (RMS, in dark, pixel reset on)    700 µV, 2.76 mV (G=1, G=4)
FPN (RMS, dark, entire array)                    0.15, 0.22 LSB (G=1, G=4)
Gain FPN (global, entire array)                  1% RMS
Gain FPN (local, 9×9 pixels)                     0.58% RMS
Image lag                                        Below measurement limit
5 Experimental Results
For evaluation purposes the sensor was mounted chip-on-board on an evaluation board. An integrating sphere and a stabilised DC light provide uniform illumination. Many of the test results from the A/D conversion and the array pixels are listed in Table 1. For most measurements the DAC swing was set to 1 V, and 8-bit A/D conversions were performed.
It was possible to characterise the global DAC and the column ADC separately due to the possibility to multiplex the DAC output out to an external ADC, and the possibility to feed an analogue value from an external DAC to the input of the internal ADC. The RMS INL of the DAC and the ADC was calculated as the RMS value of the INL vector obtained when fitting a curve using least square approximation, while the max INL was calculated as the maximum value of the INL vector obtained when fitting a curve by minimising the maximum deviation. Fig. 6 shows the measured DNL vector and the minimise max INL vector for the DAC, and extracted values are found in Table 1. The results are very good and the 0.13 LSB max INL gives a relative accuracy of 10.9 effective number of bits (ENOB) for the DAC. The ADC INL and DNL were measured individually for all 1536 columns, and the results for all columns were very similar. Fig. 7 shows the measured DNL vector and the minimise max INL vector for the ADC in column 0 for a standard 8-bit conversion. The much shorter conference paper [13] showed higher values for the ADC INL than the ones presented in Table 1. This is because INL was then calculated using data from the entire DAC sweep instead of the found DNL points. This makes the INL RMS of an ideal ADC equal to the quantisation noise and INL max equal to 1/2 LSB (it is noticeable that the INL RMS of 0.31 LSB presented in [13] is very close to the theoretical quantisation noise of 1/√12 ≈ 0.29 LSB). The ADC DNL measurements showed that the very first few steps of the A/D conversion were smaller than the rest of the steps, see Fig. 7. It is believed that this is due to the delay of the DAC and the comparators becoming settled after a few DAC steps. The smaller steps cause both the max DNL and max INL of the ADC to be close to 0.5 LSB, which is considerably higher than for the DAC. In reality the smaller steps actually give higher resolution for the first grey levels.
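Two of the numbers in this paragraph can be checked with a short calculation. The sketch below shows one reading of "relative accuracy" that reproduces the quoted 10.9 bits, namely the base-2 logarithm of the full scale divided by the maximum INL. That definition and the function names are assumptions and are not stated in the text.

```python
import math


def relative_accuracy_bits(nominal_bits: int, max_inl_lsb: float) -> float:
    """log2(full scale / max INL), with the full scale taken as 2**nominal_bits LSB."""
    return nominal_bits - math.log2(max_inl_lsb)


def quantisation_noise_lsb() -> float:
    """RMS quantisation noise of an ideal quantiser in LSB: 1/sqrt(12)."""
    return 1.0 / math.sqrt(12.0)


if __name__ == "__main__":
    print(round(relative_accuracy_bits(8, 0.13), 2), "bits")
    print(round(quantisation_noise_lsb(), 3), "LSB")
```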
However, if this is not desirable it is possible to use the other of the two ramps for A/D conversion and then invert all the bits in the digital output from the conversion. This makes the smaller steps show up at the other end of the range, i.e., around grey level 255. Another way is to use the programmable offset to move the ramp so that the signal is always higher than the first few grey levels. If this latter method is used, the max DNL and max INL of the used ADC range become close to the non-linearity values for the DAC. Both these methods have been tested and work as expected.

Fig. 6: Measured DNL and INL for the internal DAC. (Axes: DNL (LSB) and INL (LSB) vs. DAC input code (LSB).)

Fig. 7: Measured DNL and INL for the internal ADC. (Axes: DNL (LSB) and INL (LSB) vs. ADC output (LSB).)

The photo-response integral non-linearity for the entire signal chain, i.e., from photons to digital value, was measured under uniform illumination using 128 equally spaced integration times. At each integration time 20 images were captured and the average pixel value calculated. As for the DAC and ADC, the INL was calculated by fitting a curve both using least-squares approximation and by minimising the maximum deviation. The results were extremely good, with an INL of 0.09% RMS and 0.16% max.

The FPN in dark and the gain FPN were measured using an ADC swing of 1 V. FPN was measured as the RMS error, i.e., standard deviation, among the pixels in temporally averaged images captured at room temperature, i.e., it also contains dark current FPN. For the dark FPN the integration time was 400 µs. The results are given in Table 1. Fig. 8 shows global and local (9×9 neighbourhood) total FPN at different grey levels.

Fig. 8: Measured global and local total FPN vs. grey level. (Axes: FPN (LSB) vs. grey level (LSB).)

There are two possible ways to read out the reset value from a pixel: either while the reset transistor in the pixel is still turned on, or after turning the reset transistor off. Having the reset transistor on results in lower temporal noise, since the noise from the reset transistor then becomes low-pass filtered via the pixel source follower readout, but it also results in higher FPN due to charge injection and clock feed-through from the reset transistor.

For accurate measurements of the ADC input referred temporal noise the method presented in [14] was used. Fig. 9 shows the measured ADC input referred temporal noise as a function of mean ADC input signal with reset on. Calculating the theoretical RMS reset noise in the pixel [9] and adding the simulated temporal noise from the pixel source follower and the comparator results in 810 µV for reset off. This is slightly lower than the 900 µV measured. Another observation from the measurements is that heavy digital activity, including fast I/O, only affects the temporal noise marginally. An increase of 0 to 0.03 LSB can be measured depending on readout mode. No effect on FPN was measured. This verifies that noise isolation in the sensor is very good.

Fig. 9: Measured temporal noise referred to the ADC input vs. ADC input mean (reset on). (Axes: temporal RMS noise (mV) vs. mean ADC input signal (V).)

The conversion gain, from collected electrons to ADC input, was calculated by applying Poisson statistics to the measured temporal noise in Fig. 9 [15]. From the conversion gain the pixel capacitance was calculated to be 7.5 fF. The absolute spectral response was measured for the pixel array using a focused light source with variable wavelength, controllable in 5 nm steps, see Fig. 10. Peak QE×FF is 27% and occurs at 610 nm; using a FF of 60% a peak QE of 45% is obtained.

Fig. 10: Measured quantum efficiency vs. light wavelength. (Axes: QE (electrons/photon) vs. light wavelength (nm).)

The power consumed for a typical high-speed application is 680 mW. This is distributed as 24% in the digital part (including I/O), 20% in the global DAC, and the remaining 56% in the analogue part. It should be noted that the analogue part has also been designed to work at half of the nominal bias current, set by external resistors, which effectively halves the analogue power, thus trading lower speed for lower power.

To measure the dark current as a function of temperature an on-chip temperature sensor has been used for junction temperature measurements. The dark current results in an average pixel leakage of 0.35 LSB/ms at 60 °C and it doubles for every 10 °C increase in temperature.

To validate the increased horizontal resolution of the HiRes rows compared to the area pixels the modulation transfer function (MTF) was measured [16]. The result of the MTF measurement was 0.4 at the Nyquist frequency for the HiRes rows, and 0.45 at the Nyquist frequency for the area pixels. Since the Nyquist frequency is twice as high for the HiRes rows compared to the area pixels, it can be concluded that the horizontal resolution of the HiRes rows is almost twice as high.

The HiRes pixels behave similarly to the array pixels. Dark current was, however, 60% higher for the HiRes pixels compared to the normal array pixels. This is attributed to the larger perimeter and area of the HiRes photodiodes [17]. Temporal noise was slightly lower due to the larger diode capacitance in the HiRes pixels.
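Several of the derived quantities in this section follow from simple relations, which the sketch below illustrates. It is a hedged illustration, not the procedure used for the measurements: the function names are hypothetical, the conversion-gain relation assumes a purely shot-noise-limited signal (read noise ignored), and the dark-signal model simply encodes the stated doubling for every 10 °C.

```python
def conversion_gain_from_shot_noise(mean_signal_v: float, noise_rms_v: float) -> float:
    """Conversion gain G in V/electron from Poisson statistics.

    For N collected electrons the signal is V = G*N and the shot noise is
    sigma_V = G*sqrt(N), so G = sigma_V**2 / V (assumes shot noise dominates).
    """
    return noise_rms_v ** 2 / mean_signal_v


def qe_from_qe_times_ff(qe_times_ff: float, fill_factor: float) -> float:
    """Quantum efficiency of the photosensitive area given QE x FF and FF."""
    return qe_times_ff / fill_factor


def dark_signal_rate(temp_c: float, ref_rate: float = 0.35,
                     ref_temp_c: float = 60.0, doubling_c: float = 10.0) -> float:
    """Dark signal in LSB/ms, doubling for every `doubling_c` degrees Celsius."""
    return ref_rate * 2.0 ** ((temp_c - ref_temp_c) / doubling_c)


print(round(qe_from_qe_times_ff(0.27, 0.60), 2))  # 0.45
print(round(dark_signal_rate(70.0), 2))           # 0.7
```

For example, a QE×FF of 27% with a fill factor of 60% gives the 45% peak QE quoted above, and the dark signal at 70 °C would be about 0.7 LSB/ms under the stated doubling rule.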
Fig. 11: Scatter component (top) and direct component (bottom) of a scan of a piece of wood.
Besides the extensive image characterisation all digital operations have been tested and work as expected.
Fig. 11 shows the scatter component and direct component of a scan of a piece of wood. The knots become dark in the scatter image because the laser light does not move along the natural fibres in the wood when a knot is scanned. An example of a 3-D scan of a BGA is shown in Fig. 12. The image shown has been rotated, median filtered, zoomed in (it shows 100×100 pixels while each 3-D profile is originally 1536 range values wide), and merged with the greyscale image to better show details. The axes denote pixels and one pixel here corresponds to 10 µm.
Fig. 12: Zoomed in BGA 3-D scan.
6 Comparison With Other Sensors

Compared to the 512×512 pixel sensor MAPP2500, see [5], the area part of the presented sensor has three times higher horizontal resolution and the HiRes rows have six times higher resolution. The pixel readout speed is much higher, 4 Gpixels/s compared to 288 Mpixels/s, and an A/D conversion is twice as fast. Furthermore, the noise is much lower, since MAPP2500 uses passive pixels. The digital part in the presented sensor is clocked more than four times faster than the digital part in MAPP2500, which ran with an 8 MHz clock. With three times as many columns and with the increased functionality of the processor core, meaning that many operations require fewer clock cycles, the processing capability is 12 to 24 times higher in the presented sensor.

Due to the unique architecture a comparison with other sensors is difficult, since the only sensors known to the authors offering similar flexibility are the previous-generation sensors, i.e., MAPP2200 and MAPP2500. However, Table 2 compares the presented sensor with a high-speed sensor without processing, a low-resolution sensor with 4×128 bit-serial processors, a 640×480 3-D and greyscale sensor, and a 375×365 3-D and greyscale sensor.

Compared to state-of-the-art high-speed CMOS image sensors without processing, but with ADC on-chip, the presented sensor has matching image performance. In [18] a sensor with 2352×1728 pixels was presented. This 4.1 Mpixel sensor features a 7 µm three-transistor active pixel. It has higher QE×FF compared to the sensor presented in this paper, which may be due to the use of a specialised CMOS sensor process with improved collection. When using soft reset this sensor gives slightly lower noise in darkness, 65 e− temporal and FPN, than the presented sensor, 74 e−, but when using hard reset the noise was considerably higher for the 4.1 Mpixel sensor, 103 e−. The 4.1 Mpixel sensor uses a 10-bit ADC and can run at 240 frames per second. This gives a row time of 2.41 µs. This is more than six times longer than the shortest row time possible with the presented sensor, which is 0.38 µs for thresholded images. It is, however, about as fast as when a 7-bit FPC conversion is used in the presented sensor, which takes 2.73 µs, and about twice as fast as an 8-bit FPC conversion, which takes 5.16 µs. The 4.1 Mpixel sensor has a conversion gain of 39 µV/e−, which is comparable to using the gain -3 setting on the presented sensor. With this conversion gain the use of an 8-bit conversion is motivated, since the quantisation noise is then lower than the noise in darkness of both sensors and adds only marginally to the total noise.

In [19] a 128×128 sensor with 4×128 bit-serial processors was presented. Compared to the presented sensor it lacks circuitry for row-global operations and feature extraction provided by the GLU, COUNT, and GOR.
The sensor is targeted at video-rate applications, e.g., colour processing tasks at 30 frames/s are mentioned, and each ADC is shared among four neighbouring columns, which slows down high-speed operations. However, it seems to be possible to use the sensor for 3-D laser triangulation at moderate speed. If the row addressing is flexible it is then also possible to do multisensing where, e.g., 3-D, greyscale, and colour are captured.

Table 2: Comparison with other sensors

                         M12                        4.1M APS        4×128 SIMD                3-D-VGA            3-D-RP
Process                  0.35 µm                    0.35 µm         0.6 µm                    0.6 µm             0.18 µm
Resolution               1536×512 / 3072×n          2352×1728       128×128                   640×480            375×365
Pixel pitch              9.5 µm / 4.75 µm           7 µm            18 µm                     12 µm              11.25 µm
Fill factor              60%                        43% (QE×FF)     ?                         30%                23%
CDS                      Yes                        Yes             Yes                       No                 No
ADC                      1-10 bits                  10-bit          Cyclic per 4 columns      3-bit TDA for 3-D  1-bit
FPN + temporal noise     74 e−                      65 e− / 103 e−  ?                         High               High
3-D profile width        1536                       No              128                       480                365
3-D range rate           60 Mrangels/s, 7-bit       No              Moderate                  20 Mrangels/s      137 Mrangels/s
3-D range resolution     1/2-1/16 sub-pixel         No              ?                         <0.1 sub-pixel     0.2 sub-pixel
3-D + grey + colour      Yes + additional features  No              Probably                  No                 No
Programmable processing  1536 bit-serial, 33 MHz    No              4×128 bit-serial, 20 MHz  No                 No

In [20] a 640×480 sensor for 3-D and 2-D greyscale is presented. This 3-D-VGA sensor uses a three-transistor pixel that is similar to the standard active pixel with the exception of one extra vertical wire. External 8-bit ADCs were used for the 2-D greyscale data and they limited the greyscale pixel rate to 4 Mpixels/s. The 3-D laser triangulation is performed using a fast time-domain approximative (TDA) readout approach. The use of 3-bit TDA ADCs enables the 3-D-VGA sensor to use external gravity centre calculation to reach <0.1 sub-pixel accuracy.

In [21] a 375×365 sensor for 3-D and binary 2-D is presented. It uses a row-parallel architecture and realises the very high 3-D range rate of 137 M range values per second. However, this comes at the expense of 24 transistors and a lot of metal routing in each pixel, which prohibits small pixels with high FF. A range accuracy of 0.2 sub-pixel is obtained by the use of multisampling.

Both these 3-D sensors are intended for 3-D laser triangulation with a scanning mirror and a fixed scene and fixed camera setup. It is possible to use these sensors in a 3-D setup with a moving object and fixed camera and laser, as in Fig. 1. However, since these sensors output one range value per row, contrary to one range value per column as in the presented sensor, they cannot simultaneously capture greyscale, or binary, line-scan data in this type of setup and are therefore not suitable for this kind of multisense imaging. The 3-D range rate of the 3-D-VGA is 20 M range values per second, which is three times lower than the maximum range rate of the presented sensor. The potential 3-D range rate of the 3-D-RP sensor is 137 M range values per second, which is more than twice as fast as the presented sensor. The 3-D profiles from the 3-D-VGA sensor contain 480 range values, and the 3-D-RP profiles 365, compared to 1536 in the presented sensor. The 3-D-VGA architecture does not permit CDS for the 3-D range finding and is sensitive to transistor and wire capacitance mismatches. The 3-D-RP sensor also suffers from offsets from the pixels and has a low FF. This forces the use of a strong laser, which is not suitable for many machine vision applications due to high cost and eye safety regulations. In [21] a 300 mW laser was not sufficient for running the sensor at the highest possible speed.

To conclude the comparison, the presented sensor is the only one, known to the authors, capable of high-resolution high-speed multisense imaging.
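The speed comparisons in this section rest on simple arithmetic, which can be checked as sketched below. The helper name is hypothetical, and the row time is computed under the assumption that all rows of a frame are read out back to back with no blanking time.

```python
def row_time_us(frames_per_second: float, rows_per_frame: int) -> float:
    """Row time in microseconds, assuming rows are read back to back."""
    return 1e6 / (frames_per_second * rows_per_frame)


# 4.1 Mpixel sensor [18]: 1728 rows at 240 frames/s.
t_row = row_time_us(240, 1728)
print(round(t_row, 2))          # 2.41

# Ratios behind "more than six times", "about as fast" and "about twice".
print(round(t_row / 0.38, 1))   # 6.3
print(round(2.73 / t_row, 2))   # 1.13
print(round(5.16 / t_row, 2))   # 2.14

# 3-D range rates in M range values per second (Table 2).
print(60 / 20)                  # 3.0
print(round(137 / 60, 2))       # 2.28
```

These ratios match the wording used in the comparison: the 4.1 Mpixel sensor has a row time a little over six times the 0.38 µs minimum of the presented sensor, and the 3-D-RP range rate is a little over twice the 60 M range values per second listed for the presented sensor.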
7 Conclusion
An innovative multiresolution general-purpose high-speed machine vision sensor with on-chip image processing capabilities has been presented. The computational power of the SIMD processor array, 100 GOPS, together with the high-speed image sensor part makes the continuous internal pixel-analysing rate as high as 4 Gpixels/s for simple machine vision applications. The sensor can simultaneously capture colour, greyscale, internal material light scatter, and 3-D profiles at very high speed. For instance, the sensor can deliver 1536 8-bit range data, 1536 2×8-bit scatter data, 1536 8-bit colour data per channel, and 3072 8-bit greyscale data 7900 times per second. When running only 3-D laser triangulation, a data rate of more than 20,000 profiles/s can be achieved when delivering 1536 range values per profile with 8 bits of range resolution. The work presented shows that it is possible to integrate high-performance programmable digital image processing circuits with a high-speed CMOS image sensor, and still achieve low noise. Measurements show a photo-response non-linearity of 0.09% RMS, global offset FPN of 0.15 LSB RMS, temporal noise of 900 µV RMS, and a dynamic range of 62 dB.
Acknowledgment
The authors wish to acknowledge Calle Björnlert, Dr. J. Jacob Wikner, Jrgen Hauser, and Magnus Engström for implementing special parts of the design. Ola Petersson is also acknowledged for his work on the sensor evaluation board.
References
[1] R. Forchheimer and A. Ödmark, "A Single Chip Linear Array Picture Processor," Proc. of SPIE, Vol. 397, pp. 425-430, Jan. 1983.
[2] K. Chen, M. Afghahi, P-E. Danielsson, and C. Svensson, "PASIC: A Processor-A/D converter-Sensor Integrated Circuit," Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS'90), Vol. 3, pp. 1705-1708, May 1990.
[3] R. Forchheimer, P. Ingelhag, and C. Jansson, "MAPP2200 - A second generation smart optical sensor," Proc. of SPIE, Vol. 1659, pp. 2-11, Feb. 1992.
[4] E. R. Fossum, "CMOS Image Sensors: Electronic Camera-On-A-Chip," IEEE Trans. Electron Devices, Vol. 44, No. 10, pp. 1689-1698, Oct. 1997.
[5] M. Gökstorp and R. Forchheimer, "Smart Vision Sensors," Proc. IEEE Int. Conference on Image Processing (ICIP'98), Vol. 1, pp. 479-482, Oct. 1998.
[6] M. Johannesson, "Fast, programmable, sheet-of-light range finding using MAPP2200," Proc. of SPIE, Vol. 2273, pp. 25-34, July 1994.
[7] E. Åstrand, "Automatic Inspection of Sawn Wood," Ph.D. thesis No. 424, Linköping University, Sweden, 1996.
[8] X. Aragonès, J. L. González, and A. Rubio, "Analysis and Solutions for Switching Noise Coupling in Mixed-Signal ICs," Kluwer Academic Publishers, 1999.
[9] H. Tian, B. Fowler, and A. El Gamal, "Analysis of Temporal Noise in CMOS Photodiode Active Pixel Sensor," IEEE J. Solid-State Circuits, Vol. 36, No. 1, pp. 92-101, Jan. 2001.
[10] J. J. Wikner, "Studies on CMOS Digital-to-Analog Converters," Ph.D. thesis No. 667, Linköping University, Sweden, 2001.
[11] J. H. Shieh, M. Patil, and B. J. Sheu, "Measurement and Analysis of Charge Injection in MOS Analog Switches," IEEE J. Solid-State Circuits, Vol. 22, No. 2, pp. 277-281, April 1987.
[12] B. Razavi and B. A. Wooley, "Design Techniques for High-Speed, High-Resolution Comparators," IEEE J. Solid-State Circuits, Vol. 27, No. 12, pp. 1916-1926, Dec. 1992.
[13] R. Johansson, L. Lindgren, J. Melander, and B. Möller, "A Multi-Resolution 100 GOPS 4 Gpixels/s Programmable CMOS Image Sensor for Machine Vision," in 2003 IEEE Workshop on CCDs and Advanced Image Sensors, Elmau, Germany, May 2003.
[14] L. Lindgren, "Elimination of Quantization Effects in Measured Temporal Noise," Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS'04), Vol. 4, pp. 392-395, May 2004.
[15] B. Fowler, A. El Gamal, D. Yang, and H. Tian, "A Method for Estimating Quantum Efficiency for CMOS Image Sensors," Proc. of SPIE, Vol. 3301, pp. 178-185, April 1998.
[16] C-S. S. Lin, B. P. Mathur, and M-C. F. Chang, "Analytic Charge Collection and MTF Model for Photodiode-Based CMOS Imagers," IEEE Trans. Electron Devices, Vol. 49, No. 5, pp. 754-761, May 2002.
[17] I. Shcherback, A. Belenky, and O. Yadid-Pecht, "Active Area Shape Influence on the Dark Current of CMOS Imagers," Proc. of SPIE, Vol. 4669, pp. 117-124, April 2002.
[18] A. Krymski, N. Bock, N. Tu, D. Van Blerkom, and E. Fossum, "A High-Speed, 240-Frames/s, 4.1-Mpixel CMOS Sensor," IEEE Trans. Electron Devices, Vol. 50, No. 1, pp. 130-135, Jan. 2003.
[19] H. Yamashita and C. Sodini, "A 128×128 CMOS Imager with 4×128 Bit-Serial Column-Parallel PE Array," ISSCC Dig. Tech. Papers, Vol. 44, pp. 96-97, Feb. 2001.
[20] Y. Oike, M. Ikeda, and K. Asada, "Design and Implementation of Real-Time 3-D Image Sensor With 640×480 Pixel Resolution," IEEE J. Solid-State Circuits, Vol. 39, No. 4, pp. 622-628, April 2004.
[21] Y. Oike, M. Ikeda, and K. Asada, "A 375×365 High-Speed 3-D Range-Finding Image Sensor Using Row-Parallel Search Architecture and Multisampling Technique," IEEE J. Solid-State Circuits, Vol. 40, No. 2, pp. 444-453, Feb. 2005.
Leif Lindgren was born in Uppsala, Sweden, in 1973. He received the M.Sc. degree in computer science and engineering from Linköpings universitet, Sweden. He is currently pursuing the Ph.D. degree at the Electronic Devices division, Department of Electrical Engineering, Linköpings universitet, Sweden. During 1998, he was a visiting researcher at Stanford University, Stanford, CA. In 1999, he joined IVP Integrated Vision Products AB, Sweden, where he developed smart CMOS image sensors. In 2004, he joined Synective Labs AB, Sweden, where he works with high-performance computing based on FPGA super clusters, and has worked on the world's fastest commercial IC photomask writer for the 65 nm design node. His research interest is smart CMOS image sensors. Mr. Lindgren received the IEEE ISCAS Sensory Systems Track Best Paper Award in 2004.
Johan Melander (M'98) was born in Örebro, Sweden, in 1968. He received the M.Sc. degree in computer science and engineering from Linköpings universitet, Sweden, and the Licentiate of Engineering degree from the Electronics Systems division, Department of Electrical Engineering, Linköpings universitet, Sweden. Since 1997, he has been developing smart CMOS image sensors and CMOS camera platforms at SICK IVP AB, Sweden.
Robert Johansson was born in Nyköping, Sweden, in 1973. He received the M.Sc. degree in applied physics and electrical engineering from Linköpings universitet, Sweden. In 2000, he joined IVP Integrated Vision Products AB, Sweden, where he developed smart CMOS image sensors. In 2003, he joined Micron Technology Inc. at their imaging design center in Oslo, Norway.
Björn Möller was born in Eskilstuna, Sweden, in 1976. He received the M.Sc. degree in applied physics and electrical engineering from Linköpings universitet, Sweden. In 2000, he joined IVP Integrated Vision Products AB, Sweden, where he developed smart CMOS image sensors. In 2005, he joined Metrima AB, Sweden.
Paper II
Elimination of Quantization Effects in Measured Temporal Noise

L. Lindgren
Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'04), vol. 4, pp. 392-395, May 2004.
The paper received the Sensory Systems Track Best Paper Award.
Leif Lindgren
Electronic Devices, Linköpings universitet, SE-581 83, Sweden, lei@isy.liu.se
Abstract

This paper presents a mathematical analysis of how temporal noise is transformed by quantization. A new method for measuring temporal noise with a low-resolution ADC and then accurately referring it back to the input of the ADC is shown. The method is, for instance, applicable to CMOS image sensors where photon shot noise is commonly used for determining conversion gain and quantum efficiency. Experimental tests have been carried out using a custom designed CMOS image sensor with an on-chip ADC featuring programmable gain and offset. The measurements verify the analysis and the method, e.g. noise levels of 0.11 LSB were measured with an accuracy 30 times higher than a traditional method would give.
1 Introduction
Noise in CMOS image sensors with active pixels consists of fixed pattern noise (FPN) and temporal noise. FPN is spatial noise, i.e., unwanted variations that are consistent for every exposure. Temporal noise, on the other hand, is variations from one image to the next. Temporal noise in darkness, often referred to as read noise, is usually dominated by reset noise in the pixel [1]. Research effort has been made to decrease the reset noise, resulting in techniques like soft reset, hard-to-soft reset, active reset, photogates, and buried photodiodes (however, these techniques often come at the expense of increased reset time, image lag, reduced fill factor or the need for special processing steps during manufacturing).
At high illumination photon shot noise dominates the temporal noise. The RMS value of the photon shot noise equals √N, where N is the average number of integrated electrons. Due to this relation it is possible to determine N from measurements of the noise relative to the signal. This makes it possible to determine the important image sensor parameters conversion gain, which is the charge to voltage conversion factor, and quantum efficiency (QE), which is the number of detected electrons per incident photon [2]. Many CMOS image sensors today have integrated AD-conversion, often with only 8 or 10 bits of resolution. The limited resolution of the ADCs makes low-level temporal noise measurements inaccurate when using a traditional method. For instance, in [3] it is stated that a 12-bit ADC was needed to make use of the photon shot noise to determine the sensitivity while only an 8-bit ADC was implemented on the sensor. This paper addresses the issue of measuring temporal noise with a low-resolution ADC. A new method is described which eliminates the quantization effects, thereby making it possible to measure ADC input referred temporal noise with great accuracy, even if the noise is much lower than the resolution of the ADC. For a CMOS image sensor the method includes the following steps.
i. Make sure the fractional parts of the signals from the pixels are uniformly distributed (see Section 4).
ii. Take several consecutive images and calculate the temporal standard deviation for each pixel.
iii. Calculate the mean and RMS values of those standard deviations.
iv. Use an iteration scheme with (5) on the mean value to obtain the temporal noise referred to the ADC input.
v. Use an iteration scheme with (6) on the RMS value to obtain the temporal noise referred to the ADC input.
vi. Compare the two results. If they differ, the distribution was probably not completely uniform and step i should be reviewed if higher accuracy is needed.
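Steps ii and iii above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the author's implementation: images are represented as flat lists of ADC codes, the function names are hypothetical, and the sample standard deviation (n - 1 in the denominator) is assumed for the temporal noise of each pixel.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def per_pixel_temporal_std(frames: Sequence[Sequence[float]]) -> List[float]:
    """Step ii: temporal standard deviation of every pixel across the frames.

    `frames` holds consecutive images of equal length (flattened pixel lists).
    """
    n_pixels = len(frames[0])
    return [statistics.stdev([frame[p] for frame in frames])
            for p in range(n_pixels)]


def mean_and_rms(values: Sequence[float]) -> Tuple[float, float]:
    """Step iii: mean and RMS of the per-pixel standard deviations."""
    mean = sum(values) / len(values)
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    return mean, rms


# Tiny synthetic example: pixel 0 toggles between two codes, pixel 1 is constant.
frames = [[0, 7], [1, 7], [0, 7], [1, 7]]
stds = per_pixel_temporal_std(frames)
mean_std, rms_std = mean_and_rms(stds)
```

The mean and RMS values obtained this way are the measured quantities that steps iv and v then refer back to the ADC input.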
2 Quantization Transformation
Let $X$ be the discrete-time input signal to the ADC with Gaussian distributed amplitude and the expected value $\mu_X$. $Y$ is the discrete-time and discrete-amplitude output signal from the ADC. For a specific temporal standard deviation at the ADC input, $\sigma_X$, and a specific $\mu_X$ it is possible to calculate the temporal standard deviation, $\sigma_Y$, and the expected value, $\mu_Y$, at the ADC output. Let the ADC have $K$ bits, where $2^K$ is equal to $I$. $Y$ can then be any value in $[0, 1, \ldots, I-1]$. Let $P_{Yi}$ be the probability that $Y$ is equal to $i$. Quantization then gives

$$
P_{Yi} =
\begin{cases}
P(X \le 1/2) = F_X(1/2), & i = 0 \\
P(i - 1/2 < X \le i + 1/2) = F_X(i + 1/2) - F_X(i - 1/2), & 0 < i < I - 1 \\
P(I - 3/2 < X) = 1 - F_X(I - 3/2), & i = I - 1
\end{cases}
\tag{1}
$$

where $F_X(x)$ is the Gaussian probability distribution function given by

$$
F_X(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(u - \mu_X)^2}{2\sigma_X^2}} \, du.
\tag{2}
$$

Unfortunately $F_X(x)$ is not expressible in elementary functions. However, tables of the function are readily available, making it possible to calculate $P_{Yi}$ numerically for any given values of $\mu_X$ and $\sigma_X$. This, in turn, makes it possible to calculate $\mu_Y$ and $\sigma_Y$ according to

$$
\mu_Y = \sum_{i=0}^{I-1} P_{Yi} \, i
\tag{3}
$$

$$
\sigma_Y = \sqrt{\sum_{i=0}^{I-1} P_{Yi} \, (i - \mu_Y)^2}.
\tag{4}
$$
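Equations (1) to (4) translate directly into code. The sketch below is one possible numerical evaluation, offered as an illustration only: it uses the error function from the Python standard library in place of tabulated values, and the function names and the default of 8 bits are assumptions made for the example.

```python
import math
from typing import List, Tuple


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) of Eq. (2), evaluated with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def output_probabilities(mu_x: float, sigma_x: float, bits: int = 8) -> List[float]:
    """P_Yi of Eq. (1) for an ideal ADC with 2**bits output codes."""
    levels = 2 ** bits
    probs = []
    for i in range(levels):
        if i == 0:
            p = gaussian_cdf(0.5, mu_x, sigma_x)
        elif i == levels - 1:
            p = 1.0 - gaussian_cdf(levels - 1.5, mu_x, sigma_x)
        else:
            p = (gaussian_cdf(i + 0.5, mu_x, sigma_x)
                 - gaussian_cdf(i - 0.5, mu_x, sigma_x))
        probs.append(p)
    return probs


def output_mean_and_std(mu_x: float, sigma_x: float, bits: int = 8) -> Tuple[float, float]:
    """mu_Y of Eq. (3) and sigma_Y of Eq. (4)."""
    probs = output_probabilities(mu_x, sigma_x, bits)
    mu_y = sum(p * i for i, p in enumerate(probs))
    var_y = sum(p * (i - mu_y) ** 2 for i, p in enumerate(probs))
    return mu_y, math.sqrt(max(var_y, 0.0))  # clamp tiny negative rounding errors


# A mean input exactly between two codes toggles the output with equal
# probability, so sigma_Y is 0.5 LSB even though sigma_X is only 0.1 LSB.
mu_y, sigma_y = output_mean_and_std(7.5, 0.1)
```

With the mean input at 7.5 LSB the output is 7 or 8 with probability one half each, giving an output mean of 7.5 LSB and an output standard deviation of 0.5 LSB.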
Fig. 1 shows the calculated standard deviation at the ADC output, $\sigma_Y$, versus the fractional part of $\mu_X$ for eleven values of $\sigma_X$. Due to symmetry the curves are not plotted for the fractional part of $\mu_X$ being between 1/2 and 1. The fractional part of $\mu_X$, when below 1/2, is equal to the quantization error if $X$ is free from noise. Furthermore, $\mu_X$ is chosen so that $X$ is within the range of the ADC, i.e. $P(X \le -1/2)$ and $P(I - 1/2 < X)$ become negligible. The figure shows that $\sigma_Y$ is uncorrelated to the fractional part of $\mu_X$ for noise levels above about 1/2 LSB. However, for lower noise levels the correlation is significant.

Fig. 1: Standard deviation at the ADC output versus the fractional part of the mean input signal for temporal noise levels ranging from 0.01 LSB to 1.0 LSB at the input. (Axes: $\sigma_Y$ (LSB) vs. fractional part of $\mu_X$ (LSB); curves for $\sigma_X$ = 0.01, 0.1, 0.2, ..., 1.0 LSB.)

Fig. 2 shows the calculated $\mu_Y$ versus $\mu_X$ and $\sigma_X$. The figure shows that $\mu_Y$ is close to $\mu_X$ for $\sigma_X$ greater than about 0.3 LSB. For lower noise levels at the ADC input $\mu_Y$ approaches integer values. This becomes more and more pronounced as $\sigma_X$ decreases. This means it is not possible, for low noise levels, to determine $\mu_X$ just by measuring $\mu_Y$. Instead, the AD-conversion can be seen as a transformation from a 2-D space defined by $\mu_X$ and $\sigma_X$ to a 2-D space defined by $\mu_Y$ and $\sigma_Y$. It is, however, not possible to accurately map a coordinate in the $\mu_Y$ and $\sigma_Y$ space to a coordinate in the $\mu_X$ and $\sigma_X$ space for low noise levels.

Fig. 2: Calculated expected value at the ADC output as a function of standard deviation and expected value at the ADC input. (Axes: $\mu_Y$ (LSB) vs. $\sigma_X$ (LSB) and $\mu_X$ (LSB).)

A non-ideal ADC will contain temporal noise from e.g. thermal, 1/f, and shot noise in the transistors within the ADC. This noise is uncorrelated to the noise on the ADC input signal. In reality this means $\sigma_X$ is actually the square root of the sum of the squared input referred temporal noise generated by the ADC itself and the squared temporal noise on the ADC input signal. For conversion gain and QE measurements this is not a problem. Neither is it a problem for dynamic range (DR) measurements, since DR is commonly defined as the maximum signal divided by the temporal noise in darkness, and the noise from an on-chip ADC should be part of that noise. If it is necessary to measure only the noise at the input of the ADC, and not the ADC input referred noise which includes the noise of the ADC itself, the ADC input referred temporal noise caused by the ADC itself needs to be accurately measured separately and subtracted, in quadrature, from the measured ADC input referred noise. This should then also be done using the presented method, but it requires a low-noise ADC input signal, and how the uniform distribution is achieved needs to be reconsidered.
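Because the two noise sources are uncorrelated, they combine as the square root of the sum of squares, and the signal noise can be recovered by the corresponding subtraction. The following sketch illustrates this relation; the function names are hypothetical and both noise values are assumed to be expressed in the same unit (e.g. LSB or volts RMS).

```python
import math


def total_input_referred_noise(signal_noise: float, adc_noise: float) -> float:
    """sigma_X when uncorrelated signal noise and ADC noise are combined."""
    return math.hypot(signal_noise, adc_noise)


def signal_noise_only(total_noise: float, adc_noise: float) -> float:
    """Remove the separately measured ADC noise from the total, in quadrature."""
    if total_noise < adc_noise:
        raise ValueError("total noise cannot be smaller than the ADC noise")
    return math.sqrt(total_noise ** 2 - adc_noise ** 2)


print(total_input_referred_noise(3.0, 4.0))  # 5.0
print(signal_noise_only(5.0, 4.0))           # 3.0
```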
3 Uniformly Distributed Fractional Part

The new method for accurately measuring ADC input referred temporal noise with a low-resolution ADC is based on repeated temporal noise measurements where $\mu_X$ is changed for each measurement. $\mu_X$ is changed so that its fractional part is uniformly distributed over the measurements. Each measurement gives a standard deviation at the ADC output. Integrating $\sigma_Y$, given by (4), over $\mu_X$ with the fractional part of $\mu_X$ ranging from 0 to 1 gives the expected mean of $\sigma_Y$, denoted $\bar{\sigma}_Y$, given a uniformly distributed fractional part of $\mu_X$. This can be performed for any value of $\sigma_X$. Due to symmetry, the integration only needs to be performed when the fractional part of $\mu_X$ ranges from 0 to 1/2, which makes the integral correspond to twice the area under the curves in Fig. 1. Trapezoidal approximation gives

$$
\bar{\sigma}_Y = \frac{1}{2M} \sum_{m=1}^{M} \left( \sigma_Y\!\left(\mu_X = \tfrac{m-1}{2M}\right) + \sigma_Y\!\left(\mu_X = \tfrac{m}{2M}\right) \right)
\tag{5}
$$

where $M$ is chosen sufficiently large to reach a good accuracy. When working with noise, RMS values are often used. The expected RMS value of $\sigma_Y$, $\sigma_{Y\mathrm{rms}}$, when the fractional part of $\mu_X$ is uniformly distributed can be calculated according to (6).

$$
\sigma_{Y\mathrm{rms}} = \sqrt{\frac{1}{2M} \sum_{m=1}^{M} \left( \sigma_Y^2\!\left(\mu_X = \tfrac{m-1}{2M}\right) + \sigma_Y^2\!\left(\mu_X = \tfrac{m}{2M}\right) \right)}
\tag{6}
$$

Fig. 3: Calculated $\sigma_{Y\mathrm{rms}}$ and $\bar{\sigma}_Y$ versus $\sigma_X$ given a uniformly distributed $\mu_X$. (Axes: $\sigma_{Y\mathrm{rms}}$ and $\bar{\sigma}_Y$ (LSB) vs. $\sigma_X$ (LSB).)

Fig. 3 shows the calculated $\bar{\sigma}_Y$ and $\sigma_{Y\mathrm{rms}}$ for $\sigma_X$ ranging from 0 through 0.7. The figure shows that both $\bar{\sigma}_Y$ and $\sigma_{Y\mathrm{rms}}$ are significantly higher than $\sigma_X$ for low noise levels. For instance, a $\sigma_X$ of 0.1 LSB results in a $\bar{\sigma}_Y$ of 0.16 LSB and a $\sigma_{Y\mathrm{rms}}$ of 0.24 LSB. To numerically obtain $\sigma_X$ from a measured $\bar{\sigma}_Y$ or $\sigma_{Y\mathrm{rms}}$ an iteration scheme can be adopted. Both $\bar{\sigma}_Y$ and $\sigma_{Y\mathrm{rms}}$ should be measured and $\sigma_X$ calculated from both. If the results differ, the distribution was probably not completely uniform.
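The averaging over a uniformly distributed fractional part and the subsequent inversion can be sketched as follows. This is an illustration under explicit assumptions, not the author's code: the trapezoidal sums are normalised so that they return the true mean and mean square over the fractional part, a small 16-code ADC range with the input kept mid-range stands in for the real converter, and plain bisection is used as the iteration scheme, which relies on the averaged curves increasing monotonically with the input noise. All names are hypothetical.

```python
import math

LEVELS = 16  # illustrative ADC range (assumption); input is kept mid-range
BASE = 8     # integer part of the mean input, far from both rails


def _cdf(x, mu, sigma):
    """Gaussian distribution function, Eq. (2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sigma_y(mu_x, sigma_x):
    """Output standard deviation of an ideal quantiser, Eqs. (1), (3) and (4)."""
    probs = []
    for i in range(LEVELS):
        lower = 0.0 if i == 0 else _cdf(i - 0.5, mu_x, sigma_x)
        upper = 1.0 if i == LEVELS - 1 else _cdf(i + 0.5, mu_x, sigma_x)
        probs.append(upper - lower)
    mu_y = sum(p * i for i, p in enumerate(probs))
    var_y = sum(p * (i - mu_y) ** 2 for i, p in enumerate(probs))
    return math.sqrt(max(var_y, 0.0))


def mean_and_rms_sigma_y(sigma_x, m_steps=100):
    """Trapezoidal mean and RMS of sigma_Y over fractional parts 0..1/2."""
    vals = [sigma_y(BASE + m / (2.0 * m_steps), sigma_x)
            for m in range(m_steps + 1)]
    mean = sum(vals[m - 1] + vals[m]
               for m in range(1, m_steps + 1)) / (2.0 * m_steps)
    mean_sq = sum(vals[m - 1] ** 2 + vals[m] ** 2
                  for m in range(1, m_steps + 1)) / (2.0 * m_steps)
    return mean, math.sqrt(mean_sq)


def solve_sigma_x(measured, use_rms, lo=1e-3, hi=1.0, iterations=30):
    """Bisection for the sigma_X that reproduces the measured mean or RMS."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        mean, rms = mean_and_rms_sigma_y(mid)
        if (rms if use_rms else mean) < measured:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mean_01, rms_01 = mean_and_rms_sigma_y(0.1)
recovered = solve_sigma_x(mean_01, use_rms=False)
```

For an input noise of 0.1 LSB this sketch gives a mean of about 0.16 LSB and an RMS of about 0.24 LSB, consistent with the values read from Fig. 3, and the bisection recovers 0.1 LSB from either quantity.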
A simple mistake would be to believe that the ADC adds uncorrelated temporal noise of 1/√12 LSB to the temporal noise on $X$. This is, however, incorrect, which can be seen in Fig. 3 where $\sigma_{Y\mathrm{rms}}$ and $\bar{\sigma}_Y$ are close to zero, and not 1/√12, for small $\sigma_X$. The reason is that the well-known quantization error of 1/√12 is the RMS value of the difference between $X$ and $Y$ when $X$ is uniformly distributed, while temporal noise is defined as the temporal standard deviation of $X$ and $Y$ respectively.
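The distinction can be made concrete with a small numerical experiment. The sketch below is illustrative only and uses hypothetical names: it evaluates the RMS difference between noise-free inputs spread uniformly over one LSB and their quantized values, and separately the temporal standard deviation of the output for one constant, noise-free input.

```python
import math
import statistics


def quantize(x: float) -> int:
    """Ideal rounding quantiser (ties differ from Eq. (1) only on a set of measure zero)."""
    return math.floor(x + 0.5)


def rms_quantization_error(samples: int = 10000) -> float:
    """RMS of X - Y for noise-free inputs spread uniformly over one LSB."""
    errors = []
    for k in range(samples):
        x = (k + 0.5) / samples
        errors.append(x - quantize(x))
    return math.sqrt(sum(e * e for e in errors) / samples)


def temporal_std_of_constant_input(x: float, frames: int = 100) -> float:
    """Temporal standard deviation of the output for a noise-free input."""
    outputs = [quantize(x) for _ in range(frames)]
    return statistics.pstdev(outputs)


print(round(rms_quantization_error(), 4))   # 0.2887
print(temporal_std_of_constant_input(7.3))  # 0.0
```

The first number is the familiar 1/√12 quantization error, while the second shows that a noise-free input produces no temporal noise at the output at all.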
4 Achieving Uniform Distribution

For CMOS image sensors the inherent FPN is used to achieve a near uniform distribution. FPN causes the pixels to have different $\mu_X$, thereby causing a distribution of the fractional part of $\mu_X$ among the pixels. The more uniform the distribution, the higher the measurement accuracy. FPN is often modeled with an offset and a gain component. Both of these often have a high frequency (HF) and a low frequency (LF) spatial behavior. The LF FPN is usually not considered to be a problem since its impact on the images is usually not noticeable by the human eye. Furthermore, LF FPN seldom affects FPN numbers, which are often only measured using the center part of the sensor, e.g. the ISO standard for noise measurements in electronic still picture imaging requires only the 64×64 center pixels to be used [4]. However, the LF FPN can cause a near uniform distribution among the pixels. The LF FPN can originate from gradients in the wafer, such as transistor threshold variations causing offset FPN and variations in the photodiode capacitance causing gain FPN. The HF FPN, on the other hand, tends to give a more Gaussian-like distribution and it is due to local variations. Correlated double sampling, CDS, greatly reduces the offset HF FPN while it does not reduce gain FPN. Depending on the CDS implementation LF offset FPN may also be reduced, e.g. differential CDS with double delta sampling connected to a global ADC typically eliminates LF offset FPN, while a single-ended CDS connected to a column based ADC can cause LF offset FPN.

CMOS image sensors can have an LF offset FPN in the form of a predictable horizontal gradient coming from imperfect sampling of reference voltages in the pixel or in column based circuits for CDS. The reason for the imperfect sampling can be the fact that channel charge injection and clock feedthrough (CCI/CFT) is dependent on the fall time of the transistor gate signal and the impedance at the reference node, or that CCI/CFT from early columns affects the reference voltage. In CMOS image sensors there are often long horizontal control signals driven from one side of the chip that turn off a transistor in each pixel or column based CDS circuit. These signals suffer from long RC delays causing a difference in fall time among the columns. This can be used in a special mode of operation. Having the pixel reset transistor on when reading the reset value, contrary to turning it off, will cause an offset gradient due to CCI/CFT from the pixel reset at the beginning of the light integration. This mode will, however, result in lower reset noise due to LP filtering via the source follower [5] and is therefore more suitable for measuring conversion gain and QE than the sensor read noise.

If the sensor does not have a built-in gradient, a gradient can be created by having a weak light source illuminating one side of the sensor more than the other. Another way to achieve a near uniform distribution is to have an ADC with a reference that can be moved relative to the signal in a controlled manner. The reference is then changed between each measurement.
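Whether a given gradient actually spreads the fractional parts evenly can be checked with a simple histogram. The sketch below is a hypothetical illustration: the bin count, the helper names, and the synthetic gradient of 3 LSB across 1536 columns are all assumptions chosen for the example, not values taken from the measurements.

```python
import math
from typing import List, Sequence


def fractional_part_histogram(means: Sequence[float], bins: int = 10) -> List[int]:
    """Histogram of the fractional parts of the mean signal levels (in LSB)."""
    counts = [0] * bins
    for m in means:
        frac = m - math.floor(m)
        counts[min(int(frac * bins), bins - 1)] += 1
    return counts


def uniformity_deviation(counts: Sequence[int]) -> float:
    """Largest relative deviation of any bin from a perfectly flat histogram."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) for c in counts) / expected


# Synthetic horizontal offset gradient: 3 LSB across 1536 columns (assumed values).
gradient = [7.0 + 3.0 * col / 1536 for col in range(1536)]
# No gradient at all: every column has the same mean.
flat = [7.2] * 1536

gradient_deviation = uniformity_deviation(fractional_part_histogram(gradient))
flat_deviation = uniformity_deviation(fractional_part_histogram(flat))
```

In this synthetic case the gradient gives a nearly flat histogram, whereas the flat offset puts every column in the same bin, which is the situation where the method would need an external gradient or a movable ADC reference.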
5 Experimental Results
Extensive measurements have been carried out using a custom designed CMOS image sensor previously presented in [5]. The sensor has 1536 × 512 standard three-transistor active pixels with a pitch of 9.5 µm, and it has an 8-bit on-chip ADC with 8-bit programmable gain and offset. The transfer characteristics of the ADC are very good, with a DNL of 0.06 LSB RMS and 0.50 LSB max, and an INL of 0.08 LSB RMS and 0.45 LSB max. The sensor features several alternative modes for the column-based CDS circuits, some of which are designed to cause a horizontal offset gradient. Fig. 4 shows the measured column mean in darkness for three of these modes. In the figure the columns have been grouped together, in groups of 24, to better show the gradients. Furthermore, a dithering scheme using ten ADC offset settings was used in this measurement to give high accuracy conversions. One of the curves shows a gradient that stems from the pixel reset. Another curve shows a gradient that stems from the sampling of a horizontally routed reference voltage in the CDS circuits. The third curve is the intended readout mode for the sensor and it gives no gradient at all.
[Figure 4: column mean (LSB) versus column number for three readout modes, labeled "No gradient", "CDS gradient", and "Pixel gradient".]

Fig. 4: Column mean measured in dark for three readout modes. The ADC gain was here set to 3.89 mV/LSB.

Five different ADC gain settings have been used for the temporal noise measurements, ranging from 2.20 mV/LSB to 6.37 mV/LSB. An integrating sphere and a stabilized DC light source provided uniform illumination of the sensor. To get different amounts of noise at the ADC input, 13 different integration times were used. Longer integration time means more incident photons, resulting in higher photon shot noise. For each of the integration times 20 different offset settings have been used for the ADC, each step moving the ADC reference 1.45 mV. 300 images were captured for each offset setting and 60 rows were observed in the images. The temporal standard deviation was measured individually for all the 92160 observed pixels. With the 20 offset settings this yields more than 1.8 million standard deviations at the ADC output for each gain setting at each integration time.

Fig. 5 shows the RMS value of those standard deviations, σ_Y,rms, expressed as voltage versus mean output signal. From this figure it is clear that the magnitude of the quantization steps of the ADC affects the measured noise; smaller steps result in lower measured noise.

[Figure 5: σ_Y,rms (mV) versus mean output signal µ_Y (V) for the ADC gain settings 2.20, 3.02, 3.89, 5.08, and 6.37 mV/LSB.]

Fig. 5: Measured σ_Y,rms for 5 ADC gain settings at 13 integration times.

To verify the relative behavior of σ_Y,rms and the mean σ_Y, both of these have been measured and the ratio of σ_Y,rms to the mean σ_Y has been calculated. This has been done for all 13 integration times and all five gain settings, and the resulting 65 values have been marked out in Fig. 6. The theoretical ratio, i.e. (6) divided by (5), has also been calculated and the curve is also plotted in the figure. The measured data are right on, or very close to, the theoretical curve.

[Figure 6: ratio of σ_Y,rms to the mean σ_Y, plotted against the mean σ_Y (LSB), for the five ADC gain settings together with the calculated curve.]

Fig. 6: Measured and theoretical ratio of σ_Y,rms to the mean σ_Y.

Fig. 7 shows the measured noise when referred back to the ADC input using the method described in the previous sections. The figure shows much smaller differences between the five gain settings compared to Fig. 5. E.g., for the shortest integration time used, i.e. the smallest value of µ_Y, a temporal noise of 948 µV to 1.607 mV was measured for σ_Y,rms, see Fig. 5, versus 696 µV to 711 µV when referred back to the ADC input, see Fig. 7. For the 6.37 mV/LSB gain setting this corresponds to a σ_X of 0.11 LSB and a suggested accuracy of 0.0024 LSB ((0.711 - 0.696)/6.37). This is about 60 times higher accuracy than if σ_Y,rms was used as a direct estimate of σ_X, and 30 times higher than if the mean σ_Y was used.

[Figure 7: σ_X (mV) versus mean output signal µ_Y (V) for the ADC gain settings 2.20, 3.02, 3.89, 5.08, and 6.37 mV/LSB.]

Fig. 7: Measured temporal noise referred back to the input of the ADC.
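The figures quoted above can be retraced with a few lines of arithmetic. This is a sketch under two assumptions that are not stated explicitly in the text: that the largest directly measured value (1.607 mV) belongs to the coarsest gain setting (6.37 mV/LSB), and that the "60 times" figure compares the error of that direct estimate against the 0.0024 LSB spread. The variable names are illustrative only.

```python
# Values taken from the text.
LSB_MV = 6.37                     # coarsest ADC gain setting, mV per LSB
SIGMA_X_MIN_MV = 0.696            # input-referred noise, lowest of the five gain settings
SIGMA_X_MAX_MV = 0.711            # input-referred noise, highest of the five gain settings
SIGMA_Y_RMS_MAX_MV = 1.607        # largest direct measurement (assumed: 6.37 mV/LSB setting)

# Number of per-pixel standard deviations per gain setting and integration time.
observed_pixels = 1536 * 60       # 1536 columns, 60 observed rows
std_count = observed_pixels * 20  # 20 ADC offset settings

# Input-referred noise and its spread, expressed in LSB of the coarsest setting.
sigma_x_mid_mv = 0.5 * (SIGMA_X_MIN_MV + SIGMA_X_MAX_MV)
sigma_x_lsb = sigma_x_mid_mv / LSB_MV
spread_lsb = (SIGMA_X_MAX_MV - SIGMA_X_MIN_MV) / LSB_MV

# Error made if the direct RMS measurement were used as the input noise estimate.
direct_error_lsb = (SIGMA_Y_RMS_MAX_MV - sigma_x_mid_mv) / LSB_MV
improvement = direct_error_lsb / spread_lsb

print(observed_pixels, std_count)
print(round(sigma_x_lsb, 2), round(spread_lsb, 4), round(improvement))
```

Under these assumptions the improvement factor comes out at roughly 60, in line with the statement above.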
6 Conclusion
With the presented method it is possible to accurately measure temporal noise with a low-resolution ADC. For a CMOS image sensor this means it is possible to accurately determine conversion gain, quantum efficiency, and ADC input referred read noise, even if the temporal noise is much lower than the resolution of the on-chip ADC.
7 Acknowledgment
The author wishes to acknowledge Johan Melander, Robert Johansson, and Björn Möller for helpful discussions. IVP Integrated Vision Products AB and Electronic Devices, Linköpings universitet, are acknowledged for partial funding.
References
[1] O. Yadid-Pecht, B. Mansoorian, E. R. Fossum, and B. Pain, "Optimization of Noise and Responsivity in CMOS Active Pixel Sensors for Detection of Ultra Low Light Levels," Proc. SPIE, Vol. 3019, 1997, pp. 125-136.
[2] B. Fowler, A. El Gamal, D. Yang, and H. Tian, "A method for estimating quantum efficiency for CMOS image sensors," Proc. SPIE, Vol. 3301, January 1998, pp. 178-185.
[3] D. Yang, A. El Gamal, B. Fowler, and H. Tian, "A 640 × 512 CMOS Image Sensor with Ultrawide Dynamic Range Floating-Point Pixel-Level ADC," IEEE J. Solid-State Circuits, Vol. 34, No. 12, December 1999, pp. 1821-1834.
[4] ISO/FDIS 15739:2002(E), "Photography - Electronic still picture imaging - Noise measurements."
[5] R. Johansson, L. Lindgren, J. Melander, and B. Möller, "A Multi-Resolution 100 GOPS 4 Gpixels/s Programmable CMOS Image Sensor for Machine Vision," in 2003 IEEE Workshop on CCDs and Advanced Image Sensors, Elmau, Germany, May 2003.
Paper III
A New Simultaneous Multislope ADC Architecture for Array Implementations

L. Lindgren

Leif Lindgren, Electronic Devices, Linköpings universitet, SE-581 83, Sweden, lei@isy.liu.se
Abstract This paper presents a new simultaneous multislope ADC architecture suitable for array implementations in, e.g., CMOS image sensors (CISs). The simplest implementation is almost twice as fast as a conventional slope ADC, while it requires only a small amount of extra circuitry. Measurements have been performed on a custom made CIS which implements parts of the proposed ADC. The measurements show good linearity and verify the concept of the new architecture.
Index Terms ADC, A/D-conversion, CMOS image sensors, simultaneous multislope, single slope, SMS.
1 Introduction
In general there exist three architectures for on-chip ADCs for CMOS image sensors (CISs): one single ADC for all pixels, a column parallel ADC, and one ADC per pixel. Some derivatives of these also exist. A drawback of the single ADC is the speed required for high-speed imaging. The pixel level ADC can handle very high speed; however, it drastically decreases the fill factor. For high-speed applications the column parallel architecture is often chosen, and it is the architecture studied in this paper.

Three different types of column parallel ADCs have often been used for CISs: the cyclic ADC, the successive approximation (SA) ADC using switched capacitor charge sharing, and the single slope ADC.

Drawbacks with the cyclic converters are size, speed, resolution, and problems with charge injection from switches. The cyclic converters use at least one operational transconductance amplifier (OTA), one comparator, and two linear capacitors per column [1]. One converter can, typically, not fit into the pixel pitch; e.g., in [2] each converter was four pixel columns wide. Furthermore, a central voltage generator needs to charge capacitors in each column, and the OTA and comparators need to settle each cycle. For sensors with many columns this severely limits the clock frequency and, thereby, the conversion rate. Fully differential solutions can be used to reduce the problems with charge injection, and an accuracy of about 9 bits was then reported in [2], but this adds circuitry and power.

CISs with column parallel SA ADCs using charge redistribution have been successfully demonstrated several times. Such an ADC requires a comparator, n bits of memory, and n binary scaled capacitors per ADC channel, where n is the number of bits. To keep column-to-column fixed pattern noise (FPN) low, a calibration circuit is also needed in each column. In [3] a 10-bit SA ADC is used and the calibration circuit is implemented as a 7-bit capacitor bank. The main advantage of this type of ADC is speed. Drawbacks are size and the need for matched capacitor ratios, which can limit the accuracy. To get good linearity the layout of the capacitors is crucial, and an ADC typically requires the width of two columns and becomes several mm high.

Single slope converters have often been used for CISs, partly because they only need one comparator and n bits of memory per channel. This makes it easy to fit one channel into one column; e.g., in [4] each column was only 5.6 µm wide in a 0.6 µm process. This is very important for smart vision sensors having a column parallel SIMD processor on-chip [5], [6]. Another reason is that it is fairly simple to get good accuracy. High linearity is realized by having a good global voltage slope generator, typically implemented as a counter and a DAC. The comparators should then have low temporal noise, be offset compensated, and have moderate delay variations to keep FPN low; e.g., in [4] FPN was less than 0.1 mV. For CISs the possibility of high resolution is becoming more important due to the increased dynamic range offered by buried photodiodes [7]. Furthermore, the single slope ADC permits multiple thresholding, an operation important in many machine vision applications. A drawback is speed, since one conversion requires 2^n clock cycles. It is, however, possible to trade accuracy or signal swing for higher speed simply by controlling the counter feeding the DAC. The clock frequency can also be much higher than for the cyclic and SA ADCs; in [6] a 33 MHz clock was used. This is because nothing needs to settle each cycle, since the delay from the DAC and the comparators only adds a DC offset which, if needed, can easily be compensated for.

This paper introduces a new column parallel ADC architecture, named simultaneous multislope (SMS), suitable for, e.g., CISs. It is a further development of the single slope ADC, but contrary to the single slope ADC it uses several slopes simultaneously, thus greatly increasing the conversion rate. Advantages of the single slope ADC, like small size, high accuracy, and multiple thresholding, are kept.
2 Architecture
The proposed SMS ADC architecture works like a column parallel single slope ADC except that it has one comparison phase and one slope phase. Furthermore, several slopes are used in parallel during the slope phase, where each slope covers a part of the total signal swing. Compared to the single slope ADC, some control circuitry and an analog multiplexer at the comparator input are added in each column. All slopes are distributed to the analog multiplexers, enabling each column to select which slope to connect to the comparator input.

A conversion starts with the comparison phase, where m - 1 consecutive threshold operations are performed, m being the number of slopes used. The slopes are set to their initial values and then the analog multiplexers are set so that the signal in each column is compared to slopes 2 through m in turn. These threshold operations determine which of the slopes each column is to use during the following slope phase. During the slope phase the actual slope operation is performed, where the output of a counter is fed to the slope generators and to the columns. The counter value is latched in a column when the comparator changes state. The digital result in each column is then given by a combination of the threshold result bits and the latched counter value.

The simplest example is the case when two slopes are used, which is depicted in Fig. 1. Here a traditional single slope converter would use an extended (dotted) slope1 and use 2^n counter steps to cover the entire voltage range, while the new SMS converter uses slope1 for the lower voltage range and slope2 for the upper voltage range and needs only 2^(n-1) counter steps.

[Figure 1: slope1, slope2, and slope3 versus counter value, with counter axis marks at 0, 2^(n-1) - 1, and 2^n - 1.]

Fig. 1: Slopes for a two slope SMS ADC (1 and 2), and two slope SMS ADC with a CS DAC (1 and 3).

For an n-bit ADC using two slopes, one of the slopes covers the signal range corresponding to the digital values [0, 1, ..., 2^(n-1) - 1] while the other slope covers [2^(n-1), 2^(n-1) + 1, ..., 2^n - 1]. The threshold operation is performed by comparing the input signals to the lowest value of the second slope (or the max value of the first slope, depending on implementation). If the input signal in a column is higher than the DAC signal, a 1 is stored in the MSB of that column and the second slope will be used during the slope operation. If the input signal instead is lower than the DAC signal, a 0 is stored in the MSB and the first slope will be used during the slope operation. The slope operation then determines the rest of the bits.
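The two slope case can be sketched as an idealized behavioral model. The code below is an illustration only, not the circuit described in this paper: it assumes a normalized input in [0, 1), ideal comparators without delay, and that the single threshold operation costs one clock cycle; the function names are hypothetical.

```python
def single_slope(v, n):
    """Ideal n-bit single slope conversion of v in [0, 1); returns (code, clock cycles)."""
    steps = 2 ** n
    code = steps - 1
    for count in range(steps):
        # The ramp is modeled at the upper edge of each counter step.
        if (count + 1) / steps > v:
            code = count
            break
    return code, steps  # the counter always runs through all 2^n steps


def sms_two_slope(v, n):
    """Ideal two slope SMS conversion of v in [0, 1); returns (code, clock cycles)."""
    steps = 2 ** n
    half = steps // 2
    # Comparison phase: one threshold operation against the start of slope2.
    msb = 1 if v >= 0.5 else 0
    base = 0.5 * msb  # starting voltage of the slope selected by this column
    # Slope phase: both slopes run in parallel for 2^(n-1) counter steps.
    latched = half - 1
    for count in range(half):
        if base + (count + 1) / steps > v:
            latched = count
            break
    # The result combines the threshold bit (MSB) with the latched counter value.
    return (msb << (n - 1)) | latched, 1 + half


if __name__ == "__main__":
    print(single_slope(0.7, 8), sms_two_slope(0.7, 8))
```

In this model both converters return the same code for every input, while the SMS conversion uses 2^(n-1) slope steps plus the threshold operation instead of 2^n steps.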
2.1 Using Current Steering DACs

An efficient way to implement the proposed architecture is to use a current steering (CS) DAC for the slope generation. In a CS DAC there exist a number of current sources: 2^n in a thermometer coded DAC, n in a binary weighted DAC, and something in between in the case of a segmented DAC. A voltage is generated at the output via a load resistor or by using an opamp with resistive feedback. For good dynamic performance the current sources not connected to the output are connected to ground or to a dummy resistor. By using this dummy output, a slope with the opposite sign to the original output is created at virtually no cost. Thereby, two slopes are generated by one CS DAC. This is shown by slope1 and slope3 in Fig. 1, where the voltage range is covered after 2^(n-1) counter steps.
When using both outputs from a CS DAC, the bits obtained from the slope operation need to be handled in a new way, since the slopes have different signs. When the negative slope is used, the latched counter bits need to be transformed to their corresponding ADC output value. This is simply done by inverting these bits. For the two slope SMS example this means that the MSB, which is obtained in the threshold operation, determines whether the rest of the bits are to be inverted or not. If the MSB was 1, the negative slope was used and the rest of the bits should be inverted. This operation is performed using a simple XOR operation for each latched counter bit. For sensors with a column parallel SIMD processor on-chip this is easily handled in each column. For other sensors this only needs to be performed when reading out the digital data, which is typically done one column at a time. This means only one central unit with n - 1 XOR gates is needed for this operation. Compared to a single slope ADC using a CS DAC, this two slope SMS ADC is almost twice as fast, and only a small amount of extra circuitry is needed.
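The bit inversion can be illustrated with a small behavioral model. This is a sketch under assumptions: a normalized input in [0, 1), an ideal rising slope1 from 0 and an ideal falling slope3 from full scale, and no comparator delay. The names `decode_cs_sms` and `convert_cs_sms` are hypothetical.

```python
def decode_cs_sms(msb, latched, n):
    """Combine the threshold bit and the latched counter bits into an n-bit result."""
    mask = (1 << (n - 1)) - 1
    # XOR with the MSB: the n - 1 latched bits are inverted only for the negative slope.
    low = latched ^ mask if msb else latched
    return (msb << (n - 1)) | low


def convert_cs_sms(v, n):
    """Ideal two slope SMS conversion using a rising slope1 and a falling slope3."""
    steps = 2 ** n
    half = steps // 2
    msb = 1 if v >= 0.5 else 0  # threshold operation
    latched = half - 1
    for count in range(half):
        if msb == 0:
            tripped = (count + 1) / steps > v         # rising slope1 passes the input
        else:
            tripped = 1.0 - (count + 1) / steps <= v  # falling slope3 reaches the input
        if tripped:
            latched = count
            break
    return decode_cs_sms(msb, latched, n)
```

For an input near full scale the falling slope trips almost immediately, so the latched counter value is small and the inversion maps it to a large output code.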
2.2 Fast Pseudo Conversion

Noise in a CIS is higher for highly illuminated parts of the image. This is because photon shot noise grows as √k, where k is the number of electrons, and because conversion gain variations among the pixels cause gain FPN. This means the requirement for low quantization noise decreases as the signal increases. This can be utilized by using larger quantization steps for the upper part of the signal range. In [8] a 10-bit pseudo conversion was realized in 397 clock cycles by using 10-bit steps for the lower part of the signal range and then increasing the step size to 8-bit steps. In [6] this technique was referred to as Fast Pseudo Conversion (FPC), and the counter step size was doubled after reaching the value 64, resulting in 38% shorter conversion time.

It is possible to build SMS ADCs that also use FPC by having different gains on different slopes. The simplest case is when two slopes are used and the second slope has, e.g., twice or four times as high gain as the first. Having the gain of the second slope twice as high means the conversion only needs 2^n/3 counter steps, and having it four times as high means 2^n/5 counter steps. One way to implement this is to have a CS DAC with a load resistor with twice as high resistance for the second slope. The digital result from a column using the second slope is then obtained by inverting the latched counter bits, hard-wire shifting them one step, and then adding 2^n/3 + 1.
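The 38% figure quoted for [6] can be checked with a short calculation. The sketch below assumes an 8-bit single slope conversion in which the counter advances one code per clock cycle up to the switch point and `step_factor` codes per cycle afterwards; the function name is hypothetical.

```python
import math


def fpc_single_slope_cycles(n_bits, switch_code, step_factor):
    """Clock cycles for a single slope conversion whose step size grows after switch_code."""
    full_scale = 2 ** n_bits
    fine_cycles = switch_code  # one code per cycle below the switch point
    coarse_cycles = math.ceil((full_scale - switch_code) / step_factor)
    return fine_cycles + coarse_cycles


cycles = fpc_single_slope_cycles(8, 64, 2)
saving = 1.0 - cycles / 2 ** 8
print(cycles, saving)
```

Under these assumptions the conversion takes 160 instead of 256 cycles, a saving of 37.5%, which agrees with the roughly 38% stated above.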
Table 1: Example configurations for an ADC using two or four slopes.

Slope1  Slope2  Slope3  Slope4  Cycles    10b   12b
n       n       -       -       2^n/2     512   2048
n       n-1     -       -       2^n/3     341   1365
n       n-2     -       -       2^n/5     204   819
n       n       n       n       2^n/4     256   1024
n       n       n-1     n-1     2^n/6     170   682
n       n       n-1     n-2     2^n/8     128   512
n       n       n-2     n-2     2^n/10    102   409
n       n-1     n-2     n-3     2^n/15    68    273
n       n       n-2     n-4     2^n/22    46    186
n       n-2     n-2     n-4     2^n/25    40    163
Table 1 describes some possible configurations for SMS FPC converters using two or four slopes. The first columns describe the resolution of the different slopes where, e.g., n-1 means the slope has twice as high gain as the first slope. The last three columns describe how many clock cycles are needed for one conversion. E.g., the 8th row has n, n-1, n-2, and n-3 for the different slopes. For a 12-bit FPC converter this would mean 2^12/(1+2+4+8) = 273 clock cycles to perform the conversion, and the step sizes of the four different slopes correspond to the step sizes of a 12-bit, 11-bit, 10-bit, and 9-bit converter, respectively.

There is a wide variety of ways to implement SMS FPC converters. Fig. 2 shows an example of a block diagram for the generation of four slopes. The reference voltages v1, v2, v3, and v4 can be used to move the offset coarsely, and can simply be generated with, e.g., a resistor string and bypass capacitors. For fine tuning of the offsets each slope has its own bank of current sources (though this may not be needed for the first slope). For high accuracy the size of these current sources should be smaller than the corresponding slope sources. To get very low differential non-linearity (DNL) where the slopes meet, the inherent delay of the slopes and comparators also needs to be taken into account when using both positive and negative slopes. This is done by slightly increasing the starting point of a positive slope and decreasing the starting point of a negative slope, using the offset programmability, after a static fine tuning has been performed. The number of columns connected to a certain slope during the slope operation depends on the signals in the row that is currently converted. This means the actual load on a slope depends on the image content. For this trimming to work well, the load dependent delay from the slopes should therefore be kept small.

[Figure 2: block diagram with a control block and an encoder feeding the current source banks (offset sources 1 to 4, slope sources 1 2, slope sources 3 4); reference voltages v1 to v4 and load resistors R1 to R4 produce slope1 to slope4.]

Fig. 2: Block diagram of circuit generating four slopes.

All current source banks in Fig. 2 share a common digital encoder. A control block inputs the desired setting to this encoder and then clocks latches in the current source bank that is to use this new setting. There are many different current steering coding architectures available, e.g., binary weighted, thermometer coded, segmented, and decomposed. For the 12-bit FPC example a 7-bit segmented approach would mean that the 7 MSBs in the code word from the controller are thermometer coded, while the 2 LSBs are either binary coded or thermometer coded. For each bank this would mean 68 current sources set by the 7 MSBs, and these sources would be four times larger than the unit current source controlled by the LSB. Having the 2 LSBs thermometer coded would mean a total of 71 current sources per bank and a total of 355 latches for all six banks together. The 12-bit FPC converter in the example could be realized by having R2=2R1, R3=4R1, and R4=8R1 in Fig. 2. Another way would be to have R2=2R1, R3=R1, and R4=2R1 and then increase the current sources in the bank "slope sources 3 4" to be four times the size of the ones in the bank "slope sources 1 2".
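The cycle counts of Table 1 follow from dividing the full range by the sum of the relative step sizes of the slopes. A minimal sketch, assuming that a slope with resolution n-k has a step size of 2^k and that the result is rounded down as the tabulated values suggest; the function name and argument convention are hypothetical.

```python
def sms_fpc_cycles(n_bits, resolution_drops):
    """Counter steps for an SMS FPC conversion.

    resolution_drops holds one entry k per slope, meaning that slope has
    resolution n - k and therefore a step size of 2**k relative to slope1.
    """
    total_step = sum(2 ** k for k in resolution_drops)
    return 2 ** n_bits // total_step  # rounded down, as the values in Table 1


if __name__ == "__main__":
    # The 12-bit example with slopes n, n-1, n-2, n-3.
    print(sms_fpc_cycles(12, [0, 1, 2, 3]))
```

The count excludes the m - 1 threshold operations of the comparison phase.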
3 ADC Characterization
An ADC suffers from both dynamic and static errors. For a CIS ADC the dynamic errors are characterized as temporal noise, while the static errors include integral non-linearity (INL) and DNL. The static performance is often measured as max INL and max DNL. For an ADC in a CIS with active pixels these are, however, rather bad measures. This is because the requirements on the ADC decrease with increased signal, since the sensor noise increases with the signal. This means, e.g., that a high DNL peak occurring in the high part of the input signal range, like for the middle curve in Fig. 3, would mean a high max DNL, but it would not have any big effect on the total error since the temporal noise and the gain FPN are already quite high in this signal range. Another reason for these measures to be bad is that a very low INL of the ADC is not needed, since the charge to voltage conversion is slightly non-linear. This is due to the voltage dependent capacitance of the photodiode, and because the gain of the source follower in the pixel is also slightly voltage dependent. This non-linearity in the sensor signal chain means that low-frequency behavior of the measured INL vector is not a big concern.

When characterizing ADCs for CISs the entire INL and DNL vectors should be studied. If the measured INL vector contains low-frequency components, these could, for evaluation purposes, simply be removed by high-pass filtering the vector. An RMS value of the DNL vector, or of a high-pass filtered INL vector, can be calculated and seen as extra quantization noise. This makes it easier to compare the accuracy of ADCs with different resolution, a possibility desired in, e.g., [9]. Furthermore, when characterizing a multi-channel ADC, like a column parallel ADC, each ADC channel has its own static transfer curve. Deviations between the channels are characterized as FPN.
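The vector-based measures can be sketched as follows. This is an illustration under assumptions not taken from this paper: DNL is computed from measured code transition levels with the average step as the LSB, INL is the running sum of DNL, and the high-pass filter is realized by subtracting a centered moving average. All function names and the window length are hypothetical.

```python
import math


def dnl_inl(transitions):
    """DNL and INL vectors (in LSB) from a list of code transition levels."""
    widths = [b - a for a, b in zip(transitions, transitions[1:])]
    lsb = sum(widths) / len(widths)  # average step width defines one LSB
    dnl = [w / lsb - 1.0 for w in widths]
    inl = []
    acc = 0.0
    for d in dnl:
        acc += d  # INL as the running sum of DNL
        inl.append(acc)
    return dnl, inl


def rms(vec):
    """Root mean square of a vector."""
    return math.sqrt(sum(x * x for x in vec) / len(vec))


def high_pass(vec, window=15):
    """Remove low-frequency content by subtracting a centered moving average."""
    half = window // 2
    out = []
    for i, x in enumerate(vec):
        segment = vec[max(0, i - half): i + half + 1]
        out.append(x - sum(segment) / len(segment))
    return out
```

A typical use would be `dnl, inl = dnl_inl(levels)` followed by `rms(dnl)` and `rms(high_pass(inl))`, which gives two numbers that can be compared across converters of different resolution.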
4 Experimental Test
4.1 Setup and Calculations
Measurements have been carried out using a custom made CIS previously presented in [6]. The sensor has 1536 × 512 standard three-transistor active pixels with a pitch of 9.5 µm, and was implemented in a 0.35 µm process. The sensor has a column parallel slope ADC with programmable resolution, gain, and offset. Furthermore, it has a multiplexer at each comparator input, making it possible to perform A/D conversion using either a positive or a negative slope. However, the slopes cannot be used simultaneously because the selection of the slopes is done globally for all columns. The new SMS ADC architecture can, however, be mimicked on-chip, making it possible to measure parameters like INL, DNL, and FPN.

The SMS ADC is mimicked by first performing a threshold operation that determines which of the two slopes should be used by each column. To get low DNL at the midpoint during the dynamic slope operation, the offset programmability is then used to compensate for the delay in the DAC and the comparators. Then a conversion with the positive slope is performed and the results are stored in the on-chip column memory. Another conversion is then performed with the negative slope. Since both slopes here cover the entire signal swing, the results need to be limited. This is done by the on-chip SIMD processor by setting the results from the first conversion that exceed 127 to 127, and setting the results from the second conversion that are below 128 to 128. The on-chip SIMD processor then uses the threshold bit obtained in each column to multiplex the results from the two conversions.

It was possible to characterize the ADC thanks to the possibility of feeding an analog value from an external 14-bit DAC to the input of the internal ADC. To reduce the temporal noise, 64 conversions of the above type are made for each input voltage level. The results are first accumulated in each column by the SIMD processor and then sent to a PC.
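The limiting and multiplexing steps can be sketched as plain list operations. This is only an illustration of the procedure described above, not the SIMD program used on the chip; the function names are hypothetical and the 8-bit mode is assumed as default.

```python
def mimic_sms(pos_codes, neg_codes, threshold_bits, n_bits=8):
    """Combine a positive-slope and a negative-slope conversion column by column."""
    half = 1 << (n_bits - 1)
    result = []
    for pos, neg, bit in zip(pos_codes, neg_codes, threshold_bits):
        pos_limited = min(pos, half - 1)  # first conversion: values above 127 become 127
        neg_limited = max(neg, half)      # second conversion: values below 128 become 128
        # The threshold bit selects which conversion the column would have used.
        result.append(neg_limited if bit else pos_limited)
    return result


def accumulate(conversions):
    """Column-wise sum of repeated conversions, e.g. 64 per input voltage level."""
    return [sum(column) for column in zip(*conversions)]


if __name__ == "__main__":
    print(mimic_sms([100, 130, 127], [99, 131, 128], [0, 1, 1]))
```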
4.2 Results
The internal DAC was characterized in [6] with good results. In 8-bit mode with a swing of 1 V, a DNL RMS of 0.04 LSB, a DNL max of 0.12 LSB, an INL RMS of 0.04 LSB, and an INL max of 0.13 LSB were measured. The ADC DNL and INL were measured individually for all 1536 columns using 8-bit mode and a swing of 1 V, and the results for all columns were very similar. Fig. 3 shows the measured DNL vector for the ADC in one of the columns for a single slope conversion using the positive slope, a single slope conversion using the negative slope, and a mimicked SMS conversion.

[Figure 3: three panels, DNLp, DNLn, and DNLsms (LSB), versus ADC output (LSB, 0 to 250).]

Fig. 3: Measured DNL for positive (top), negative, and SMS (bottom) conversion.

The DNL measurements show that the very first few steps of an A/D conversion were smaller than the rest of the steps. It is believed that this is due to the delay of the internal DAC and the comparators becoming settled after a few counter steps [6]. A simple solution would, in that case, be to have a few dummy steps in the beginning of the slope operation. In reality the smaller steps actually give higher resolution for the first gray levels, resulting in higher dynamic range but lower linearity. The bottom curve in Fig. 3 shows that the SMS operation works as intended and that no large DNL errors occur at the midpoint. Furthermore, Table 2 shows the measured DNL for the SMS conversion to be close to what was measured for the single slope conversion, and the last two columns (SSc and SMSc) show that very good linearity is achieved when a few samples in the beginning and at the end are disregarded.

Fig. 4 shows the measured INL vector for the ADC in one of the columns for a single slope conversion using the positive slope, a single slope conversion using the negative slope, and an SMS conversion. The INL vector for the SMS conversion looks as expected. These INL plots are greatly affected by the smaller ADC steps described above. Since the solution to this problem seems simple, Fig. 5 shows the measured INL vectors when disregarding a few samples in the beginning and at the end. INL then becomes almost as low as for the DAC, see Table 2, and the SMS conversion does not degrade the linearity compared to the single slope conversions.
Table 2: Measured DNL/INL (average of all columns) and FPN in LSB for single slope (SS) and SMS conversions.

Parameter          SS      SMS     SSc     SMSc
DNL RMS            0.06    0.06    0.04    0.04
DNL max            0.50    0.52    0.12    0.13
INL RMS            0.08    0.08    0.05    0.05
INL max            0.45    0.48    0.13    0.13
FPN (from mean)    0.1317  0.1322  0.1313  0.1318
FPN (from RMS)     0.1328  0.1335  0.1320  0.1328
[Figure 4: three panels, INLp, INLn, and INLsms (LSB, axis range ±0.5), versus ADC output (LSB, 0 to 250).]

Fig. 4: Measured INL for positive (top), negative, and SMS (bottom) conversion.

[Figure 5: three panels, INLp, INLn, and INLsms (LSB, axis range ±0.1), versus ADC output (LSB, 0 to 250).]

Fig. 5: Measured INL with disregarded samples at the beginning and end for positive (top), negative, and SMS (bottom) conversion.
ADC input referred FPN should not be affected by the SMS architecture. However, any DNL errors introduced in the crossover between two slopes would affect the quantization transformation of the ADC input referred FPN. For each counter value of the external DAC, ADC FPN was measured as the standard deviation among the ADC channels, see Fig. 6. From the sinusoidal shape of the middle FPN curve in Fig. 6 it is evident that the limited resolution of the quantization greatly affects the calculated FPN [10]. This is because the mean output value from each ADC depends on the mean input value and the temporal noise referred to the input, see [10]. It is possible to accurately refer the FPN back to the input of the ADC by modifying the method in [10]. This is done by first rounding the digital result in each column. FPN is then calculated on these rounded results, see the bottom plot of Fig. 6. As expected, the peaks of the FPN curve now reach 0.5 LSB. After first making sure there are no gradients among the columns, the distribution is assumed to be Gaussian. This makes it possible to apply the equations in [10] to the mean and RMS value of the new FPN vector and, thereby, get the ADC input referred FPN. The second last row in Table 2 shows input referred FPN calculated using the mean value, and the last row when using the RMS value.
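The rounding step can be illustrated with a short sketch. It assumes that the per-column result is the mean of the accumulated conversions, that rounding is to the nearest integer code, and that FPN is taken as the population standard deviation among the columns; the function names are hypothetical, and the subsequent application of the equations in [10] is not reproduced here.

```python
import math
import statistics


def fpn(column_means):
    """FPN as the standard deviation among the ADC channels for one input level."""
    return statistics.pstdev(column_means)


def fpn_rounded(column_means):
    """FPN calculated after rounding each column result to the nearest code."""
    rounded = [math.floor(m + 0.5) for m in column_means]
    return statistics.pstdev(rounded)
```

When the input level sits on a code transition, about half of the columns round down and half round up, so the rounded FPN approaches 0.5 LSB; when it sits on a code center, nearly all columns round to the same code and the rounded FPN approaches zero. This is the peak behavior seen in the bottom plot of Fig. 6.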
[Figure 6: three panels versus external DAC counter value: FPN (LSB) over the full counter range, FPN (LSB) zoomed to counter values 1940 to 2020, and FPNround (LSB) over the same zoomed range.]

Fig. 6: Measured FPN vs. counter value for the external DAC for SMS conversion.
5 Conclusion and Discussion
This paper presented a new simultaneous multislope ADC architecture suitable for array implementations in, e.g., CISs. It is a further development of the single slope ADC and it increases the conversion rate significantly at a low cost in extra hardware. Measurements on a custom made CIS verified the concept of the new architecture.
Consider a sensor with the HDTV resolution of 1920 × 1080 pixels, 2/3" optics, a pixel pitch of around 5 µm, and a 0.18 µm process. For such a process and sensor size, a 66 MHz ADC clock frequency is feasible. The example 12-bit pseudo converter using four slopes would then realize almost 200 frames/s. The architecture also permits having two converters per column, placed at the top and bottom similar to [9], which would mean almost 400 frames/s.
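The frame rate estimate can be retraced with a simple upper bound. The sketch assumes one conversion per row, the 273 slope steps of the 12-bit four slope example, and no time spent on the comparison phase, sampling, or readout; such overhead is presumably why the figures above are stated as "almost" 200 and 400 frames/s. The function name is hypothetical.

```python
def frame_rate_upper_bound(clock_hz, cycles_per_conversion, rows, converters_per_column=1):
    """Frames per second if the slope phase were the only time cost per row."""
    row_time = cycles_per_conversion / clock_hz
    rows_per_converter = rows / converters_per_column
    return 1.0 / (rows_per_converter * row_time)


one_adc = frame_rate_upper_bound(66e6, 273, 1080)
two_adcs = frame_rate_upper_bound(66e6, 273, 1080, converters_per_column=2)
print(round(one_adc, 1), round(two_adcs, 1))
```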
References
[1] K. Chen and C. Svensson, "A Parallel A/D Converter Array Structure with Common Reference Processing Unit," IEEE Trans. Circuits Syst., vol. 36, no. 8, pp. 1116-1119, Aug. 1989.
[2] S. Decker, R. D. McGrath, K. Brehmer, and C. G. Sodini, "A 256 × 256 CMOS Imaging Array with Wide Dynamic Range Pixels and Column-Parallel Digital Output," IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 2081-2091, Dec. 1998.
[3] A. I. Krymski, N. Bock, N. Tu, D. Van Blerkom, and E. Fossum, "A High-Speed, 240-Frames/s, 4.1-Mpixel CMOS Sensor," IEEE Trans. Electron Devices, vol. 50, no. 1, pp. 130-135, Jan. 2003.
[4] T. Sugiki et al., "A 60mW 10b CMOS Image Sensor with Column-to-Column FPN Reduction," in IEEE ISSCC Dig. Tech. Papers, pp. 108-109, 2000.
[5] K. Chen, M. Afghahi, P-E. Danielsson, and C. Svensson, "PASIC: A Processor-A/D converter-Sensor Integrated Circuit," Proc. IEEE Int. Symp. Circuits and Systems, vol. 3, pp. 1705-1708, May 1990.
[6] L. Lindgren, J. Melander, R. Johansson, and B. Möller, "A Multiresolution 100-GOPS 4-Gpixels/s Programmable Smart Vision Sensor for Multisense Imaging," IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1350-1359, June 2005.
[7] I. Inoue et al., "Low-Leakage-Current and Low-Operating-Voltage Buried Photodiode for a CMOS Imager," IEEE Trans. Electron Devices, vol. 50, no. 1, pp. 43-47, Jan. 2003.
[8] W. Yang, O-B. Kwon, J-I. Lee, and G-T. Hwang, "An Integrated 800x600 CMOS Imaging System," in IEEE ISSCC Dig. Tech. Papers, pp. 304-305, 1999.
[9] A. I. Krymski and N. Tu, "A 9-V/Lux-s 5000-Frames/s 512 × 512 CMOS sensor," IEEE Trans. Electron Devices, vol. 50, no. 1, pp. 136-143, Jan. 2003.
[10] L. Lindgren, "Elimination of Quantization Effects in Measured Temporal Noise," Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, pp. 392-395, May 2004.


Chapter 2. CMOS Image Sensor Circuitry

Fig. 2.2: Different types of photodetectors in a CMOS process.

2.2 Pixel and Readout Circuits

2.2.1 Passive Pixels

2.2.2 Active Pixels

2.2.3 Logarithmic Pixels

2.2.4 Global Shutter

Fig. 2.6: 4-transistor shuttered pixel.

Fig. 2.7: 5-transistor shuttered pixel.

Fig. 2: Chip block diagram.

3.2 Multiresolution Sensor Area and Analogue Readout

3.3 A/D Conversion

3.4 Processor and Registers

4.1 Mixed-Signal Aspects

4.2 Large Distance Signalling

4.3 GLU, COUNT, and GOR

4.4 Pixels and Analogue Readout

4.5 A/D Conversion

Fig. 6: Measured DNL and INL for the internal DAC.

Fig. 7: Measured DNL and INL for the internal ADC.

[Plot: temporal RMS noise (mV) versus mean ADC input signal (V).]

Fig. 12: Zoomed in BGA 3-D scan.