
A VIDEO INDEXING SYSTEM USING CHARACTER RECOGNITION
Eun Y. Kim*, Kwang In Kim, Keechul Jung, and Hang Joon Kim
Dept. of Computer Engineering, Kyungpook National Univ., Taegu, South Korea

ABSTRACT

We design and implement a video indexing system. For automatic indexing of video images, character recognition information is utilized. The proposed system consists of three components: keyframe extraction, text extraction, and text recognition. It extracts text regions using a neural network that operates as a set of texture discrimination filters, followed by profile analysis with some heuristics. To ensure accurate segmentation of touching characters, character segmentation and recognition are performed in turn. The proposed system has been tested on 390 Korean news archives.

INTRODUCTION

Visual database systems require efficient indexing to facilitate fast access to the images and video sequences in the database [1,2]. Recently, several content-based indexing methods have been presented in the literature [2]. Image properties such as color and texture are used to retrieve images; however, if one wants to retrieve the data by subject, these properties are of little use. On the other hand, text appearing in an image often carries important information describing the content of the image [2,3], so there has been much research on using character recognition for video indexing [1-3]. In this paper, we design and implement a video indexing system based on character recognition. The implemented system mainly consists of keyframe extraction, text extraction, and character recognition. First, keyframes are selected using an intensity feature. Then, text regions are extracted from the keyframes, and the extracted text is recognized. Text in video frames has a distinctive texture [1,2]; to extract text regions, we use a neural network that operates as a set of texture discrimination filters. To ensure accurate segmentation of touching characters, character segmentation and recognition are performed in turn. We have tested the proposed system on 390 Korean news archives.

SYSTEM ARCHITECTURE

The proposed system consists of three components: keyframe extraction, text extraction, and character recognition. Keyframes are first selected; text regions are then extracted from them, and the extracted regions are recognized.

Keyframe extraction

Keyframes are still images that represent the content of a video slot in an abstract manner. In the proposed system, a simple keyframe extraction technique is used. An M×N frame is subsampled by a factor of 16 in each dimension (M/16 × N/16), yielding a 20×15 sub-matrix for 320×240 frames. We calculate the pixel-by-pixel difference of the reduced images of two successive frames; if the difference is above a predefined threshold, the frame is declared a keyframe.
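As a concrete illustration, the frame-differencing step might look like the following minimal Python sketch. The block-averaging reduction, the use of the mean absolute difference, and the threshold value are our assumptions, since the paper does not specify how the reduced pixels are computed or what threshold is used:

```python
import numpy as np

def reduce_frame(frame: np.ndarray, factor: int = 16) -> np.ndarray:
    """Subsample an M x N grayscale frame by `factor` in each dimension
    (e.g. 240x320 -> 15x20) by averaging each factor x factor block."""
    m, n = frame.shape[0] // factor, frame.shape[1] // factor
    blocks = frame[:m * factor, :n * factor].reshape(m, factor, n, factor)
    return blocks.mean(axis=(1, 3))

def is_keyframe(prev: np.ndarray, curr: np.ndarray, threshold: float = 20.0) -> bool:
    """Declare `curr` a keyframe when the mean pixel-by-pixel difference
    of the reduced images of two successive frames exceeds a threshold.
    The threshold value here is illustrative, not the paper's."""
    diff = np.abs(reduce_frame(curr.astype(float)) - reduce_frame(prev.astype(float)))
    return float(diff.mean()) > threshold
```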

Text extraction

In the proposed system, a neural network is used to classify the pixels of news video frames into text pixels and non-text pixels. The neural network has 81 input nodes, 50 hidden nodes, and 2 output nodes, as shown in Figure 1.

Figure 1. Neural network architecture

The input layer receives the intensity values of the pixels at predefined positions inside an M×M window over an input frame.
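To make the classification step concrete, a minimal NumPy sketch of the 81-50-2 forward pass is given below. The 9×9 window size (matching the 81 input nodes), the sigmoid activations, the intensity normalization, and the interpretation of the two outputs are all our assumptions, as the paper leaves these details unspecified:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

class TextPixelClassifier:
    """81-50-2 MLP that labels the centre pixel of a 9x9 intensity window
    as text or non-text. Weights and biases are assumed to come from a
    prior back-propagation training session
    (shapes: (81, 50), (50,), (50, 2), (2,))."""

    def __init__(self, w_hid, b_hid, w_out, b_out):
        self.w_hid, self.b_hid = w_hid, b_hid
        self.w_out, self.b_out = w_out, b_out

    def is_text(self, window: np.ndarray) -> bool:
        x = window.reshape(-1) / 255.0               # 9x9 window -> 81 inputs
        h = sigmoid(x @ self.w_hid + self.b_hid)     # hidden layer, 50 nodes
        y = sigmoid(h @ self.w_out + self.b_out)     # output layer, 2 nodes
        return bool(y[0] > y[1])                     # index 0 = "text" (assumed)
```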


We select the window size experimentally so that the textual information is captured as clearly as possible. The weights are adjusted by training with a back-propagation algorithm to minimize the sum of squared errors during the training session. As a result of classification, a binary image with black text pixels and white non-text pixels is obtained. After smoothing this image with a median filter, it is projected along the y-axis. A consecutive vertical zone in which the projection values are greater than a threshold is defined as a text zone. Each text zone is then projected along the x-axis to obtain text segments, where a text segment is a consecutive horizontal zone in which the projection values are greater than a threshold. The final text regions are the segments remaining after eliminating irrelevant text segments.
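The projection-profile analysis can be sketched as follows. This is a minimal sketch operating on the binary text map after median filtering; the threshold values t_row and t_col are illustrative, and the heuristic elimination of irrelevant segments is omitted:

```python
import numpy as np

def find_runs(profile: np.ndarray, threshold: float):
    """Return (start, end) index pairs of consecutive positions whose
    projection value exceeds `threshold`."""
    runs, start = [], None
    for i, above in enumerate(profile > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def extract_text_regions(text_map: np.ndarray, t_row: float = 5, t_col: float = 2):
    """text_map: binary image, 1 = text pixel. First find vertical text
    zones from the y-axis projection, then split each zone into segments
    using the x-axis projection. Returns (x0, y0, x1, y1) boxes."""
    regions = []
    for y0, y1 in find_runs(text_map.sum(axis=1), t_row):   # row profile
        zone = text_map[y0:y1]
        for x0, x1 in find_runs(zone.sum(axis=0), t_col):   # column profile
            regions.append((x0, y0, x1, y1))
    return regions
```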

Text recognition

For automatic character recognition, we use a recognition-based segmentation method. In the segmentation process, several candidate cutting points are generated by an MLP-based segmenter. The character recognizer receives the candidates, selects two cutting points by testing the candidates, and then recognizes the character between them. The detailed process is described in [4].
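A simplified sketch of such a recognition-based segmentation loop is shown below. The greedy strategy of accepting the right cut that yields the most confident character is our assumption for illustration only; the actual candidate generation and selection follow [4]:

```python
from typing import Callable, List, Tuple
import numpy as np

def recognize_line(line_img: np.ndarray,
                   candidate_cuts: List[int],
                   recognize: Callable[[np.ndarray], Tuple[str, float]]) -> str:
    """Recognition-based segmentation sketch. `candidate_cuts` are column
    positions proposed by a segmenter (an MLP in the paper); `recognize`
    returns a (character, confidence) pair for an image slice. For each
    left cut, the right cut giving the most confident character is
    accepted, and scanning resumes from that cut."""
    text, cuts = [], sorted(set(candidate_cuts))
    i = 0
    while i < len(cuts) - 1:
        best_j, best_char, best_conf = None, None, -1.0
        for j in range(i + 1, len(cuts)):
            char, conf = recognize(line_img[:, cuts[i]:cuts[j]])
            if conf > best_conf:
                best_j, best_char, best_conf = j, char, conf
        text.append(best_char)
        i = best_j
    return "".join(text)
```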

EXPERIMENTAL RESULTS

We have tested the proposed system on 390 Korean news archives. The news archives have running times of 2-3 minutes and include broadcast news from the Munhwa Broadcasting Corporation (MBC) and the Korean Broadcasting System (KBS). A total of 2,000 keyframes with a size of 320×240 were selected using the proposed technique; despite its simplicity, this approach produced satisfactory results for the proposed application. Figure 2 shows an example of the text extraction process, where a white rectangle represents a text region. The performance criteria for text extraction were the detection rate and the false alarm rate: the detection rate was 95.3% and the false alarm rate was 3.5%. The text recognizer used in this experiment has a recognition rate of approximately 71%. Examples of character recognition are shown in Figure 3.

Figure 2. Examples of text extraction

Figure 3. Examples of text recognition

Figure 4 shows examples of mis-recognition. In Figure 4, the left image is the raw data of the text portion, the middle image is the (negative) binarized image, and the right image is printed using the recognized character codes. Most mis-recognition appears to be due to low resolution and binarization errors.

Figure 4. Examples of mis-recognition


CONCLUSION

In this paper, we implemented a video indexing system. For content-based video indexing, automatic text extraction and OCR techniques using a neural network were used. The proposed method located 95.3 percent of the text regions in a set of 2,000 test images with a false alarm rate of only 3.5%, and achieved a 71% recognition rate. The proposed method still has some problems: because the classifier cannot perfectly discriminate text from non-text, the final results include some errors, namely false dismissals and false alarms, and segmentation errors also occur.

REFERENCES

[1] F. Idris and S. Panchanathan, "Review of Image and Video Indexing Techniques," Journal of Visual Communication and Image Representation, vol. 8, no. 2, pp. 146-166, 1997.
[2] Hae-Kwang Kim, "Efficient Automatic Text Location Method and Content-Based Indexing and Structuring of Video Database," Journal of Visual Communication and Image Representation, vol. 7, no. 4, pp. 336-344, 1996.
[3] E. Y. Kim, K. C. Jung, K. Y. Jeong, and H. J. Kim, "OCR of Image on the WWW," ICEIC '98, pp. 287-290, 1998.
[4] J. H. Bae, K. Jung, J. W. Kim, and H. J. Kim, "Segmentation of Touching Characters Using an MLP," Pattern Recognition Letters, vol. 19, pp. 1147-1160, 1996.



