INTRODUCTION

The explosive increase in computing power, network bandwidth and storage capacity has largely facilitated the production, transmission and storage of multimedia data. Compared to alphanumeric databases, non-text media such as audio, image and video are different in that they are unstructured by nature, and although they contain rich information, they are not quite as expressive from the viewpoint of a contemporary computer. As a consequence, an overwhelming amount of data is created and then left unstructured and inaccessible, boosting the desire for efficient content management of these data. This has become a driving force of multimedia research and development, and has led to a new field termed multimedia data mining. While text mining is relatively mature, mining information from non-text media is still in its infancy, but holds much promise for the future.

In general, data mining is the process of applying analytical approaches to large data sets to discover implicit, previously unknown, and potentially useful information. This process often involves three steps: data preprocessing, data mining and postprocessing (Tan, Steinbach, & Kumar, 2005). The first step is to transform the raw data into a more suitable format for subsequent data mining. The second step conducts the actual mining, while the last one is implemented to validate and interpret the mining results.

Data preprocessing is a broad area and is the part of data mining where essential techniques are highly dependent on data types. Unlike textual data, which is typically based on a written language, image, video and some audio data are inherently non-linguistic. Speech as a spoken language lies in between and often provides valuable information about the subjects, topics and concepts of multimedia content (Lee & Chen, 2005). The language nature of speech makes information extraction from speech less complicated yet more precise and accurate than from image and video. This fact motivates content-based speech analysis for multimedia data mining and retrieval, where audio and speech processing is a key enabling technology (Ohtsuki, Bessho, Matsuo, Matsunaga, & Kayashi, 2006). Progress in this area can impact numerous business and government applications (Gilbert, Moore, & Zweig, 2005). Examples are discovering patterns and generating alarms for intelligence organizations as well as for call centers, analyzing customer preferences, and searching through vast audio warehouses.

BACKGROUND

With the enormous, ever-increasing amount of audio data (including speech), the challenge now and in the future becomes the exploration of new methods for accessing and mining these data. Due to the non-structured nature of audio, audio files must be annotated with structured metadata to facilitate the practice of data mining. Although manually labeled metadata to some extent assist in such activities as categorizing audio files, they are insufficient on their own when it comes to more sophisticated applications like data mining. Manual transcription is also expensive and in many cases outright impossible. Consequently, automatic metadata generation relying on advanced processing technologies is required so that more thorough annotation and transcription can be provided. Technologies for this purpose include audio diarization and automatic speech recognition. Audio diarization aims at annotating audio data through segmentation, classification and clustering, while speech recognition is deployed to transcribe speech. In addition to these, there is event detection, for example applause detection in sports recordings. After audio is transformed into various symbolic streams, data mining techniques can be applied to the streams to find patterns and associations, and information retrieval techniques can be applied for the purposes of indexing, search and retrieval. The procedure is analogous to video data mining and retrieval (Zhu, Wu, Elmagarmid, Feng, & Wu, 2005; Oh, Lee, & Hwang, 2005).
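
To make the idea of mining and indexing a symbolic stream more concrete, the following is a minimal Python sketch and not a method taken from the cited work. It assumes that a diarization or event-detection front end has already produced labeled time segments. The segment values, the label names (speaker_A, applause) and the helper functions label_bigrams, build_index and total_duration are all hypothetical and chosen only for illustration. The sketch counts which labels tend to follow one another (a very simple pattern-finding step) and builds an inverted index from label to time intervals (a very simple retrieval step).

```python
from collections import Counter, defaultdict

# Hypothetical symbolic stream: (start_sec, end_sec, label) tuples of the kind
# a diarization or event-detection system might emit. The values are made up.
segments = [
    (0.0, 4.2, "speaker_A"),
    (4.2, 6.0, "applause"),
    (6.0, 11.5, "speaker_B"),
    (11.5, 13.0, "applause"),
    (13.0, 20.0, "speaker_A"),
    (20.0, 21.5, "applause"),
]


def label_bigrams(segs):
    """Count how often one label is immediately followed by another."""
    labels = [label for _, _, label in segs]
    return Counter(zip(labels, labels[1:]))


def build_index(segs):
    """Build an inverted index: label -> list of (start, end) intervals."""
    index = defaultdict(list)
    for start, end, label in segs:
        index[label].append((start, end))
    return dict(index)


def total_duration(index, label):
    """Sum the lengths of all intervals stored for a label (0 if absent)."""
    return sum(end - start for start, end in index.get(label, []))


bigrams = label_bigrams(segments)
index = build_index(segments)

# Most frequent label transition in this toy stream.
print(bigrams.most_common(1))
# All intervals in which the hypothetical "applause" event was detected.
print(index["applause"])
```

In a real system the labels would come from the automatic annotation technologies described above, and the counting step would typically be replaced by a proper association or sequence mining algorithm; the sketch only shows the shape of the data flow from symbolic stream to patterns and to a searchable index.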