
A Real-Time Adaptive Learning Method for Driver Eye Detection

Zhang Guang-yuan, Cheng Bo*, Feng Rui-jia, Zhang Xi-bo
State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing, China

Abstract
This paper presents an adaptive learning method for real-time driver eye detection, which can be used for monitoring a driver's vigilance level while he/she is operating a vehicle on the road. To adapt to the variances in the eye shape and size of different individuals, a learning mode is introduced at the early stage of eye positioning to build the sample learning library. Face detection is first performed to narrow the search region. Then contour detection and heuristic rules are used to identify the Region Of Interest (ROI) of the eye. By learning the eye region, a set of images that satisfy the pre-set rules is obtained to form the eye templates. When the number of successful learning samples exceeds a pre-set threshold, the algorithm switches to the non-learning mode, in which template matching and a distance factor are used for eye detection. By extracting the skeleton curve and the corners of the eye, the closing level of the eye is calculated, which can be used as an indicator of the driver's vigilance level. The validation results show that the method achieves a high level of accuracy.


1. Introduction
Driver drowsiness is one of the major causes of traffic accidents on the road. Monitoring the driver's vigilance level and issuing an alert when he/she is not paying enough attention to the road is a promising way to reduce accidents caused by driver factors, and it could become an important part of the development of the advanced safety vehicle. The driver's facial information, especially the eye status, is often believed to give clues to his/her drowsiness level. Driver eye detection methods based on computer vision use a camera to obtain the driver's facial information and extract the parameters that are believed to be related to the driver's drowsiness level. Many researchers use Percent of Eyelid Closure (PERCLOS) as an indicator to detect drowsiness [1-4]. Ishii et al. built a system using the driver's facial expression to reflect the mental status [5]. Ueno et al. described a method using the eye open level to detect drowsiness [6]. Other commonly used parameters include the blinking interval, pupil position, etc. [7-9]. In developing such driver monitoring systems, a reliable real-time driver eye detection method is one of the essential parts.

In developing the driver eye detection method, we used a driving simulator and a CCD camera mounted behind the steering wheel to capture the driver's facial image. The driver was asked to drive normally on a circular course at a constant speed of 80 km/h. The driving simulator and the scenario in the test are shown in Figure 1.

Figure 1. Driving Simulator and CCD Camera for Capturing the Driver's Facial Image

To adapt to the variances in eye shape and size of different individuals, the algorithm for eye positioning is composed of a learning mode and a non-learning mode. The system starts in the learning mode, in which face detection is first performed to narrow the search region. Then contour detection and heuristic rules are used to identify the Region Of Interest (ROI) of the eye. By learning the eye region, a set of images that satisfy the pre-set rules is obtained to form the eye templates. When the number of successful learning samples exceeds a pre-set threshold, the algorithm switches to the non-learning mode, in which the eye templates obtained from the previous stage are used to get the eye position. If the eye position does not meet certain rules,

the method switches back to the learning mode to re-learn the eye region. Figure 2 shows the diagram of the eye detection process.

2. Eye Positioning
2.1 Learning Mode
2.1.1 Face Detection. In order to reduce the noise and increase the calculation speed, face detection is first used to narrow the search region.
Figure 2. Diagram of the eye detection process

In a driving situation, the head position of the driver always moves slightly relative to the image background. This can be used to extract the contour of the driver's face movement by accumulating the image background. Figure 3 illustrates the process of face detection. Figure 3 (a) is the current image. Figure 3 (b) is the accumulated background image, which is obtained by applying different weights to the previous images, as shown in Equation (1):

$$acc(x, y) = (1 - \alpha) \cdot acc(x, y) + \alpha \cdot image(x, y) \qquad (1)$$

where $acc(x, y)$ is the accumulated background image, $image(x, y)$ is the current image, and $\alpha$ adjusts the update rate of the accumulation. Figure 3 (c) is the difference between Figure 3 (a) and Figure 3 (b), as indicated in Equation (2):

$$diff(x, y) = \left| image(x, y) - acc(x, y) \right| \qquad (2)$$

Binarization is used to enhance the movement contour of the face, as shown in Figure 3 (d). An adaptive threshold is used, as shown in Equation (3):

$$dst(x, y) = \begin{cases} max\_value, & src(x, y) > T(x, y) \\ 0, & \text{others} \end{cases} \qquad (3)$$

where $T(x, y)$ is the threshold calculated for every single pixel, as shown in Equation (4):

$$T(x, y) = Mean(k, (x, y)) - param \qquad (4)$$

where $Mean(k, (x, y))$ is the average pixel value of the $k \times k$ neighborhood centered at $(x, y)$, $k$ is an odd number, and $param$ is a pre-set constant. After the binarization, the center position of the face contour can be obtained by calculating the gravity center of the image, using Equation (5):

$$x_0 = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} x \cdot f(x, y) \Big/ \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x, y), \qquad y_0 = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} y \cdot f(x, y) \Big/ \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x, y) \qquad (5)$$

When the number of effective pixels in the square centered at this point exceeds a pre-set threshold, the square area is regarded as the place where the face is, as shown in Figure 3 (e). Figure 3 (f) is the calculated face position in the original image.

Figure 3. Demonstration of the face detection
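As an illustration of Equations (1)-(5), the following Python sketch implements the background accumulation, frame differencing, adaptive binarization, and gravity-center steps with OpenCV. It is a minimal sketch under stated assumptions, not the authors' implementation; the function name and the parameter values (`alpha`, `k`, `param`) are placeholders chosen for demonstration.

```python
import cv2
import numpy as np

def face_center(gray, acc, alpha=0.05, k=15, param=10):
    """One face-detection step on a grayscale frame.

    gray: current frame (uint8); acc: float32 accumulated background.
    Returns the gravity center (x0, y0) of the movement contour, or None.
    """
    # Equation (1): weighted accumulation of the background image.
    cv2.accumulateWeighted(gray, acc, alpha)
    # Equation (2): absolute difference between current image and background.
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(acc))
    # Equations (3)-(4): adaptive binarization; the per-pixel threshold is
    # the k x k neighborhood mean minus the constant `param`.
    binary = cv2.adaptiveThreshold(diff, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, k, param)
    # Equation (5): gravity center of the binarized movement contour.
    m = cv2.moments(binary, binaryImage=True)
    if m["m00"] == 0:
        return None                      # no movement detected in this frame
    return m["m10"] / m["m00"], m["m01"] / m["m00"]
```

Here `acc` would typically be initialized from the first frame as `np.float32`, and the returned center would then be validated against the effective-pixel threshold described above.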

2.1.2 Division of Eye Search Region. When the face position is obtained, we need to initialize the region for eye searching. The vertical range is the upper 3/5 of the face region, and the horizontal range covers the face region from side to side. The initial search region is shown in Figure 4, where Figure 4 (a) is the setting result of the first image and Figure 4 (b) is the setting result of the 21st image. If the eye position has been identified in the previous image, the search region is set with the previous eye position (the red dot) as the center, as shown in Figure 4 (c).

Figure 4. Division of Eye Search Region
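A sketch of the search-region computation under the proportions stated above; the `(x, y, w, h)` box convention, the function names, and the tracked half-size are illustrative assumptions.

```python
def initial_eye_search_region(face):
    """Initial eye search region from a detected face box.

    face: (x, y, w, h) with the origin at the top-left corner.
    Vertical range: the upper 3/5 of the face; horizontal range: full width.
    """
    x, y, w, h = face
    return (x, y, w, int(h * 3 / 5))

def tracked_eye_search_region(prev_eye, half_size=20):
    """Search region centered on the eye position found in the previous frame."""
    ex, ey = prev_eye
    return (ex - half_size, ey - half_size, 2 * half_size, 2 * half_size)
```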
2.1.3 Selection of Eye Candidates. Again, binarization is needed to get the valid Region Of Interest (ROI) which contains the eye. Here the same adaptive method described in Equation (3) is used to adapt to different lighting conditions and backgrounds. The original and binarized images are shown in Figure 5 (a) and (b), respectively. Some noise still exists in the image after the binarization. A median filter is used to reduce this noise and smooth the image. The result is shown in Figure 5 (c).

Figure 5. Image Binarization and Smoothing

After the pre-processing, all the pixels are scanned. By identifying all the connected domains in the image, all the contours are obtained. Figure 6 (a) is the image after contour extraction. As Figure 6 (a) shows, the image has many connected domains after the adaptive binarization, so a further filter is needed to reduce the number of contours. Here, heuristic rules on eye size and length-width ratio are used to filter the eye candidates. Figure 6 (b) shows the eye candidates after applying the heuristic rules. Figure 6 (c) is the result.

Figure 6. Contour Extraction and Heuristic Rule Filtering
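The following sketch shows one plausible form of this candidate selection with OpenCV contours; the specific area and aspect-ratio bounds are illustrative assumptions, since the paper does not list its rule values.

```python
import cv2

def eye_candidates(binary, min_area=50, max_area=1500,
                   min_ratio=1.5, max_ratio=5.0):
    """Filter connected domains by heuristic rules on size and length-width ratio.

    binary: binarized, median-filtered search region (uint8).
    Returns the bounding boxes (x, y, w, h) of the remaining candidates.
    """
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        area = cv2.contourArea(c)
        ratio = w / h if h else 0          # eyes are wider than they are tall
        if min_area <= area <= max_area and min_ratio <= ratio <= max_ratio:
            boxes.append((x, y, w, h))
    return boxes
```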

2.1.4 Obtain the Eye Template. To ensure the accuracy of the eye positioning, in the learning mode we need to calculate the distance between the eye position in the current image and the position in the previous image. If the eye position is stable within a certain period of time, we consider the position correct, and thus take the image as one of the learning templates.

Figure 7. Examples of Eye Templates Obtained in the Learning Mode

After the eye positioning, the image is normalized and accumulated into the eye template. The accumulation method is the same as for the background image discussed above, as indicated in Equation (1).

2.2 Non-learning Mode


After 300 frames have been learned successfully, the algorithm is set to the non-learning mode. In the non-learning mode, template matching is performed between all the eye candidates and the eye templates in the sample learning library. Before the matching, the eye region needs to be normalized to the same size as the eye template. The matching is conducted using Equation (6):

$$R(x, y) = \frac{\sum_{x', y'} \left[ T(x', y') - I(x + x', y + y') \right]^2}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}} \qquad (6)$$

where $I$ is the image, $T$ is the template, and $R$ is the matching degree (the smaller the value, the better the match). Besides the template matching, we also consider the distance of the corresponding eyes between neighboring images. The final matching coefficient $R'$ is calculated using Equation (7):

$$R' = R + \beta \cdot d \qquad (7)$$

where $R$ is the template matching coefficient, $d$ is the distance of the corresponding eye between the previous image and the current image, and $\beta$ is a weight pre-set as a constant value; here $\beta = 0.3$. In the non-learning mode, the matching coefficient $R'$ is calculated separately for both eyes, and the candidate with the smaller value is taken as the eye region. If no eye candidate exists in the detecting region, if all the candidates fail ($R'$ is larger than the threshold), or if the eye position is unstable relative to the position in the previous image ($d$ is larger than the threshold), we consider it a failure. In that case, the method switches back to the learning mode.
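Equations (6)-(7) correspond to a normalized squared-difference match plus a motion penalty, which OpenCV provides as `TM_SQDIFF_NORMED`. The sketch below is an assumed implementation for illustration; `beta = 0.3` follows the paper, while the failure thresholds `r_max` and `d_max` are hypothetical.

```python
import cv2
import numpy as np

def match_eye(candidates, template, prev_pos, beta=0.3,
              r_max=0.5, d_max=30.0):
    """Pick the best eye candidate by template matching plus a distance factor.

    candidates: list of (patch, (cx, cy)) pairs, with patches resized to the
    template size; prev_pos: eye center in the previous frame.
    Returns the winning center, or None (caller switches back to learning mode).
    """
    best, best_score = None, float("inf")
    for patch, (cx, cy) in candidates:
        # Equation (6): normalized squared difference (smaller is better).
        r = float(cv2.matchTemplate(patch, template,
                                    cv2.TM_SQDIFF_NORMED).min())
        # Distance to the eye position found in the previous image.
        d = float(np.hypot(cx - prev_pos[0], cy - prev_pos[1]))
        if r > r_max or d > d_max:
            continue                      # candidate fails the rules
        score = r + beta * d              # Equation (7)
        if score < best_score:
            best, best_score = (cx, cy), score
    return best
```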

3. Eye Status Inference


In order to assess the driver's drowsiness level, we mainly focus on the closing level of the eyes.

3.1 Extraction of Eye Skeleton


After the eye positioning, a binarized image of the eye contour is obtained, as shown in Figure 9 (b). We use the distance between the upper and lower eyelids to indicate the eye closing level. In the test, we found that directly using the upper edge of the contour as the upper boundary might introduce large noise. So we first apply a refinement (thinning) algorithm to the binarized image, and after trimming, the eye skeleton is used as the boundary of the upper eyelid.

3.1.1 Refinement Algorithm. The refinement algorithm can be described as follows. Assume $p$ is the pixel to be detected and $f(p)$ is the grey value of pixel $p$ in the binarized image. Let set $I = \{1\}$ be the sub-set to be refined, set $N = \{0\}$ be the sub-set of the background pixels, and set $R$ be the set of pixels discarded in the process of refinement. Let $n_i$ ($i = 1, 2, \ldots, 8$) denote the 8-neighborhood pixels of $p$, with $n_9 = n_1$. The conditions of image refinement are:

$$f(p) \in I \qquad (8)$$

$$U(p) \geq 1, \text{ where } U(p) = a_1 + a_3 + a_5 + a_7 \qquad (9)$$

Here, $a_i$ is defined in Equation (10):

$$a_i = \begin{cases} 1, & f(n_i) \in N \\ 0, & \text{others} \end{cases} \qquad (10)$$

$$V(p) \geq 2, \text{ where } V(p) = \sum_{i=1}^{8} (1 - a_i) \qquad (11)$$

$$W(p) \geq 1, \text{ where } W(p) = \sum_{i=1}^{8} c_i \qquad (12)$$

Here, $c_i$ is defined in Equation (13):

$$c_i = \begin{cases} 1, & f(n_i) \in I \\ 0, & \text{others} \end{cases} \qquad (13)$$

$$x(p) = 1, \text{ where } x(p) = \sum_{i=1}^{4} b_i \qquad (14)$$

Here, $b_i$ is defined in Equation (15):

$$b_i = \begin{cases} 1, & f(n_{2i-1}) \in N \text{ and } \left( f(n_{2i}) \in I \cup R \text{ or } f(n_{2i+1}) \in I \cup R \right) \\ 0, & \text{others} \end{cases} \qquad (15)$$

$$f(n_i) \notin R \text{ or } x_i(p) = 1, \quad (i = 3, 5) \qquad (16)$$

where $x_i(p)$ is $x(p)$ evaluated for the $i$-th neighboring pixel. A pixel that satisfies all of the conditions above is removed (moved into $R$), and the process is iterated until no more pixels can be removed. The result of the refinement is shown in Figure 8 (c).

3.1.2 Trimming. From the result of the refinement, it can be observed that some unnecessary branches are introduced because of edge jitter. Here we use a trimming method to reduce these branches: we calculate the length of every branch, keep the one with the maximum length, and delete the others. Figure 9 (d) shows the result after the trimming.
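As a concrete reading of Equations (14)-(15), the sketch below computes the connectivity number $x(p)$ from the eight neighbors of a pixel. The neighbor-ordering convention is an assumption; the paper does not specify it.

```python
def connectivity_number(nbrs):
    """Connectivity number x(p) of Equations (14)-(15).

    nbrs: the 8 neighbor values n1..n8 of pixel p, ordered around p so that
    consecutive entries are adjacent (n9 wraps around to n1). A value of 0
    means the neighbor is background (set N); nonzero means foreground
    (set I or the discarded set R).
    """
    n = list(nbrs) + [nbrs[0]]            # n9 = n1
    x = 0
    for i in (1, 2, 3, 4):
        # Equation (15): b_i = 1 when n_{2i-1} is background and
        # n_{2i} or n_{2i+1} is foreground.
        if n[2 * i - 2] == 0 and (n[2 * i - 1] != 0 or n[2 * i] != 0):
            x += 1                        # one background-to-foreground transition
    return x                              # x(p) = 1: p can be removed without
                                          # breaking the skeleton's connectivity
```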

3.2 Calculation of Eye Closing Level


After obtaining the eye skeleton, we take the left-most and right-most points as the two eye corner points, and the line connecting these two points as the lower eyelid. Then the distance between each point on the skeleton and the lower eyelid line is calculated. The maximum value is considered the distance between the upper and lower eyelids, as shown in Figure 8.

Figure 8. The Extraction of Eye Status
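The following numpy sketch computes the closing-level measure described above: the maximum perpendicular distance from skeleton points to the corner-to-corner line. It is an illustrative sketch; the skeleton is assumed to be given as pixel coordinates, and the function name is hypothetical.

```python
import numpy as np

def eye_opening(skeleton_pts):
    """Maximum distance from the eye skeleton to the eye-corner line.

    skeleton_pts: array of (x, y) skeleton pixel coordinates.
    The left-most and right-most points are taken as the eye corners, and
    the line through them as the lower eyelid (Section 3.2).
    """
    pts = np.asarray(skeleton_pts, dtype=float)
    left = pts[pts[:, 0].argmin()]        # left eye corner
    right = pts[pts[:, 0].argmax()]       # right eye corner
    line = right - left
    norm = np.linalg.norm(line)
    if norm == 0:
        return 0.0
    # Perpendicular distance of every skeleton point to the corner line,
    # via the 2-D cross product |line x (p - left)| / |line|.
    rel = pts - left
    dist = np.abs(line[0] * rel[:, 1] - line[1] * rel[:, 0]) / norm
    return float(dist.max())              # upper-to-lower eyelid distance
```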

4. Validation
To validate our method, we used test images from the experiments in which the subjects drove in the driving simulator introduced above. The size of the images is 360×288 pixels, and 25 images were captured per second. A series of 13000 images from one test was used for the validation. In these images, 189 images were selected by the algorithm in the learning mode. We selected 1 image out of every 50 as a sample; in total, we chose 245 valid non-learning images, 5 invalid images, and 10 learning images. The invalid images, defined as the unstable images that were discarded in the learning mode, account for 2.0% of the total sample, and the learning images account for 3.2% of the total. In the 189 valid learning images, counting both the left and right eyes for each image, a total of 359 successful eye positionings was achieved, a successful detection rate of 95.0%. The average processing time is 45 ms per frame on a Pentium 4 3.0 GHz PC. Figure 9 shows the images in the learning mode.

Figure 9. Images in the Learning Mode

In the 240 valid non-learning images, also counting both eyes for each image, a total of 477 successful eye positionings was achieved, a success rate of 99.3%. Figure 10 shows the images in the non-learning mode. The average processing time is 31 ms per frame on the same PC.

Figure 10. Images in the Non-learning Mode: (a) Frame No. 450, (b) Frame No. 4500, (c) Frame No. 4750, (d) Frame No. 5200, (e) Frame No. 7750, (f) Frame No. 8100

5. Results and Future Work

The validation results show that by introducing the learning mode and the adaptive algorithm, the method achieved an accuracy of higher than 95% (99% for the non-learning mode) successful detection on the test images. Currently the algorithm is still somewhat slow in the learning mode, but it already meets real-time requirements once switched to the non-learning mode. In the above test, the driving simulator was placed in an indoor environment with unchanged lighting conditions. Under less satisfactory conditions, such as complicated lighting or great changes in facial expression or movement, failure of eye positioning might occur. In these situations, once the image quality returns to normal, the detection can again achieve a high level of accuracy after a new learning process. Future work includes improving the algorithm for higher robustness and fault tolerance, and enhancing the recognition and extraction of facial characteristics in order to detect the face direction and understand the facial expression, especially as associated with drowsiness.

Acknowledgements

The authors would like to thank the National High Technology Research and Development Program of China (2006AA11Z213) and the Post-doctor Science Funding (20070420361) for funding this project.

References

[1] B. Roman, S. Pavel, P. Miroslav, V. Petr, and P. Lubomir. Fatigue Indicators of Drowsy Drivers Based on Analysis of Physiological Signals. Medical Data Analysis, pp. 62-68, 2001.
[2] Z. Jianhui, W. Liyan, F. Li, and L. Yanlei. An Effective Approach for the Noise Removal of Mud Pulse Telemetry System. 8th International Conference on Electronic Measurement and Instruments (ICEMI), pp. 1-971-1-974, 2007.
[3] R. Grace, V. E. Byrne, and D. M. Bierman. A Drowsy Driver Detection System for Heavy Vehicles. 17th Digital Avionics Systems Conference (DASC), The AIAA/IEEE/SAE, vol. 2, pp. I36/1-I36/8, 1998.
[4] R. Grace, A. Guzman, and J. Staszewski. The Carnegie Mellon TruckSim: A Tool to Improve Driving Safety. 17th Digital Avionics Systems Conference (DASC), vol. 2, pp. I35/1-I35/8, 1998.
[5] T. Ishii, M. Hirose, and H. Iwata. Automatic Recognition of Driver's Facial Expression by Image Analysis. J. Soc. Automotive Eng., vol. 41, pp. 1398-1403, 1987.
[6] H. Ueno, M. Kaneda, and M. Tsukino. Development of Drowsiness Detection System. Vehicle Navigation and Information Systems Conf., Yokohama, Japan, pp. 15-20, 1994.
[7] T. D'Orazio, M. Leo, C. Guaragnella, and A. Distante. A Visual Approach for Driver Inattention Detection. Pattern Recognition, vol. 40, pp. 2341-2355, 2007.
[8] Z. Zutao and Z. Jiashu. A New Real-Time Eye Tracking for Driver Fatigue Detection. 6th International Conference on ITS Telecommunications, pp. 8-11, 2006.
[9] W. Qiong, Y. Jingyu, R. Mingwu, and Y. Zheng. Driver Fatigue Detection: A Survey. The Sixth World Congress on Intelligent Control and Automation (WCICA 2006), pp. 8587-8591, 2006.