
Audio-Based Multimedia Indexing and Retrieval Framework in MUVIS
System Overview & Applications
by Serkan KIRANYAZ.
Tampere Univ. of Tech.
MUVIS Overview
[Figure: block diagram of the MUVIS system. Labels recovered from the diagram:]
Applications:
- AVDatabase: capturing, encoding, recording; AV database creation
- DbsEditor: database creation and editing; appending and deleting image and MM files; image and MM file type conversions; FeX-AFeX management
- MBrowser: query, browsing, video summarization, display; encoding-decoding-rendering
Inputs:
- Still images: *.jpg, *.gif, *.bmp, *.pct, *.pcx, *.png, *.pgm, *.wmf, *.eps, *.tga, *.jp2, *.yuv
- Real-time video-audio
- Stored MM (video-audio)
Query items: an image, a frame, a video-audio clip
Database types: Image Database, AV Database, Hybrid Database
Indexing and retrieval: FeX modules and AFeX modules, attached through the FeX & AFeX API
MUVIS Multimedia
MUVIS Audio
  File formats: MP3, AAC, AVI, MP4
  Codecs: MP3, AAC, G721, G723, PCM
  Sampling frequencies: 8, 11.025, 12, 16, 22.050, 24, 32, 44.1 kHz
  Channels: Mono, Stereo
MUVIS Video
  File formats: AVI, MP4
  Codecs: H263+, MPEG-4, YUV 4:2:0, RGB 24
  Frame rate: 1..25 fps
  Frame size: Any
MUVIS Images
  JPEG, JPEG 2K, BMP, TIFF, PNG, PCX, GIF, PCT, TGA, EPS, WMF, PGM
Audio-Based Multimedia Indexing and Retrieval Framework for MUVIS
- A global framework implementation that aims at a robust and generic solution for audio-based multimedia indexing and retrieval, specifically:
  - Generic support for audio codecs
  - Generic support for file formats
  - Generic support for audio capturing and encoding parameters
  - Generic support for AFeX framework parameters
- The main objective is content-based retrieval of audio (by speaker, by subject, by "sounds like" similarity) that agrees with human judgment and aural perception.
Audio Indexing Scheme in MUVIS
[Figure: audio indexing pipeline, from the audio stream to the indexed key-frame (KF) feature vectors]
1. Classification & segmentation per granule/frame (Silence, Music, Speech, NotClassified)
2. Audio framing in valid classes, with classification conversion (Uncertain, Speech, Music, NotClassified)
3. AFeX operation per frame (AFeX module)
4. KF extraction via MST clustering, producing the KF feature vectors used for audio indexing (Speech, Music, NotClassified)
2. Audio Framing with
Classification Conversion
[Figure: the classification per granule/frame is converted into a final classification per audio frame (Music, Speech, Uncertain)]
Granule labels: M M M M M M M M S S S S S S S S S X X X X X
M: Music
S: Speech
X: Silence
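The slide does not state the rule that turns granule labels into frame labels. The sketch below is only one plausible reading, under these assumptions: a frame spans a fixed number of granules (`granules_per_frame` is a hypothetical parameter), silent granules are ignored, a frame whose remaining granules agree takes that class, and a frame that mixes music and speech becomes Uncertain.

```python
def frame_labels(granules, granules_per_frame):
    """Map per-granule labels ('M', 'S', 'X') to one label per audio frame.

    Assumed rule (not taken from the slides): ignore silence, keep the class
    when all remaining granules agree, otherwise mark the frame Uncertain.
    """
    names = {"M": "Music", "S": "Speech"}
    labels = []
    for start in range(0, len(granules), granules_per_frame):
        window = granules[start:start + granules_per_frame]
        voiced = [g for g in window if g != "X"]
        if not voiced:
            labels.append("Silence")      # nothing to index in this frame
        elif len(set(voiced)) == 1:
            labels.append(names[voiced[0]])
        else:
            labels.append("Uncertain")    # mixed music and speech granules
    return labels


# Granule sequence shown on the slide.
granules = "M M M M M M M M S S S S S S S S S X X X X X".split()
print(frame_labels(granules, 3))
```

With three granules per frame, the frame that straddles the music-to-speech boundary is the only one labeled Uncertain, which matches the spirit of the figure.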
Audio Feature Extraction
(AFeX) Framework
- Independent AFeX modules can be integrated into the MUVIS framework for audio-based indexing and retrieval.
[Figure: DBSEditor and MBrowser access each AFex_*.DLL module through AFex_API.h]
Functions listed for AFex_API.h and for each AFex_*.DLL module:
AFex_Bind()
AFex_Init()
AFex_Extract()
AFex_GetDistance()
AFex_Exit()
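The slides name the five entry points but give neither their signatures nor their exact roles. The Python class below is therefore a hypothetical analogue, not the real AFex API: the method arguments, the toy RMS-energy feature, and the Euclidean distance are all assumptions chosen only to illustrate a bind, init, extract, exit life cycle with a separate distance function for retrieval.

```python
import math


class ToyAFexModule:
    """Hypothetical Python analogue of an AFex_*.DLL module (not the real API)."""

    def bind(self):
        # Assumed role: tell the host application what the module offers.
        return {"name": "toy-energy", "dim": 1}

    def init(self, sampling_rate):
        # Assumed role: receive stream parameters before extraction starts.
        self.sampling_rate = sampling_rate

    def extract(self, frame):
        # One feature vector per audio frame; here just the RMS energy.
        return [math.sqrt(sum(x * x for x in frame) / len(frame))]

    def get_distance(self, fv_a, fv_b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(fv_a, fv_b)))

    def exit(self):
        # Assumed role: release whatever init() set up.
        self.sampling_rate = None


def index_frames(module, frames, sampling_rate):
    """Run the assumed life cycle over a list of audio frames."""
    module.bind()
    module.init(sampling_rate)
    try:
        return [module.extract(frame) for frame in frames]
    finally:
        module.exit()


module = ToyAFexModule()
feature_vectors = index_frames(module, [[3.0, 4.0], [0.0, 0.0]], 16000)
print(feature_vectors)
```

Keeping the distance function inside the module means the host application never needs to know how a module's feature vectors are laid out.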
Key-Framing via MST Clustering
[Figure: MST clustering of the audio frames of the utterance "Speech Lab". The tree nodes are numbered and its branches are labeled with the sounds 'S', 'p', 'ee', 'ch', 'L', 'a', 'b'.]
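The slides do not describe how key frames are derived from the MST. As a rough sketch of the general idea, the code below builds a minimum spanning tree over frame feature vectors with Prim's algorithm, cuts the edges longer than a threshold, and keeps one representative frame per resulting cluster. The threshold `cut`, the Euclidean distance, and the choice of the frame nearest the cluster centroid are all assumptions, not the MUVIS method.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete graph of Euclidean distances."""
    # best[v] = (distance to the tree, tree node that realizes it)
    best = {v: (math.dist(points[0], points[v]), 0) for v in range(1, len(points))}
    edges = []
    while best:
        v = min(best, key=lambda k: best[k][0])
        d, u = best.pop(v)
        edges.append((d, u, v))
        for w in best:
            dw = math.dist(points[v], points[w])
            if dw < best[w][0]:
                best[w] = (dw, v)
    return edges


def key_frames(points, cut):
    """Cut MST edges longer than `cut`; return one key-frame index per cluster."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d, u, v in mst_edges(points):
        if d <= cut:
            parent[find(u)] = find(v)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)

    keys = []
    for members in clusters.values():
        dim = len(points[0])
        centroid = [sum(points[i][k] for i in members) / len(members) for k in range(dim)]
        # Representative: the member closest to the cluster centroid.
        keys.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return sorted(keys)


frames = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
print(key_frames(frames, cut=3.0))
```

Only the key-frame feature vectors would then need to be stored, which shrinks the index while still covering each distinct sound.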
A Sample AFeX Module Imp.: MFCC
- The MFCC (Mel-Frequency Cepstral Coefficients) AFeX module provides generic feature vectors that are independent of the following parameters:
  - Sampling frequency
  - Number of audio channels (mono/stereo)
  - Audio volume level
$$c_i = \left(\frac{2}{P}\right)^{1/2} \sum_{j=1}^{P} \log m_j \, \cos\!\left(\frac{\pi i}{N}\,(j - 0.5)\right)$$
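The cepstral formula can be sketched in a few lines of Python. The slide does not define its symbols, so the code assumes the usual reading: m_j are the mel filterbank energies, P is the number of filterbank channels, and the divisor inside the cosine equals P (the standard DCT-II form). The function name and its arguments are illustrative only.

```python
import math


def cepstral_coefficients(filterbank_energies, num_coeffs):
    """c_i = sqrt(2/P) * sum_{j=1..P} log(m_j) * cos(pi * i / P * (j - 0.5)).

    Assumes positive filterbank energies and returns c_1 .. c_num_coeffs.
    """
    P = len(filterbank_energies)
    log_m = [math.log(m) for m in filterbank_energies]
    return [
        math.sqrt(2.0 / P)
        * sum(log_m[j - 1] * math.cos(math.pi * i / P * (j - 0.5)) for j in range(1, P + 1))
        for i in range(1, num_coeffs + 1)
    ]


energies = [1.0, 2.0, 4.0, 3.0, 0.5, 1.5]
quiet = cepstral_coefficients(energies, 4)
loud = cepstral_coefficients([10.0 * m for m in energies], 4)
print(quiet)
print(loud)
```

Scaling every energy by the same factor adds a constant to each log m_j, and the cosine terms sum to zero for 1 <= i < P, so both calls give the same coefficients up to rounding. This is one way such features can stay insensitive to the volume level.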

Audio Retrieval in MUVIS
[Figure: the sub-feature vectors FV(0) ... FV(i) of the query clip are compared with the sub-feature vectors FV(0) ... FV(i) of a database clip]
Feature vectors are kept per class type, and frames are compared between matching class types.
For each frame, a search is done to find the matching frame that gives the minimum distance.
Audio Retrieval in MUVIS (cont.)
- To perform an audio-based query within MUVIS, an audio clip is chosen from a multimedia database and queried against that database, provided the database includes at least one audio feature.
- Let NoS be the number of feature sets for a database and let NoF(s) be the number of sub-features of feature set s. Sub-features are obtained by changing the AFeX module parameters or the audio frame size during the audio feature extraction process.
$$D_i(s,f) = \begin{cases} \min_{j \in C_i} SD\left(QFV_i^{C_i}(s,f),\, DFV_j^{C_i}(s,f)\right) & \text{if } j \in C_i \\ 0 & \text{if } j \notin C_i \end{cases}$$

$$D(s,f) = \sum_{i \in C_i} D_i(s,f)$$

$$QD = \sum_{s}^{NoS} \sum_{f}^{NoF(s)} W(s,f)\, D(s,f)$$
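The query distance can be illustrated with a small Python sketch. It rests on an assumed reading of the symbols: QFV and DFV are the query and database feature vectors, i and j index their frames, C_i is the class type of query frame i, SD is a distance between two feature vectors (Euclidean here, whereas a real module would supply its own), and W(s, f) is a per-sub-feature weight. The nested-list data layout is hypothetical.

```python
import math


def frame_distance(query_fv, query_class, db_frames):
    """D_i: minimum distance to a database frame of the same class, else 0."""
    candidates = [math.dist(query_fv, fv) for fv, cls in db_frames if cls == query_class]
    return min(candidates) if candidates else 0.0


def sub_feature_distance(query_frames, db_frames):
    """D(s, f): sum of D_i over the query frames for one sub-feature."""
    return sum(frame_distance(fv, cls, db_frames) for fv, cls in query_frames)


def query_distance(query, database_clip, weights):
    """QD: weighted sum of D(s, f) over feature sets s and sub-features f."""
    return sum(
        weights[s][f] * sub_feature_distance(query[s][f], database_clip[s][f])
        for s in range(len(query))
        for f in range(len(query[s]))
    )


# One feature set with one sub-feature; each frame is (feature vector, class).
query = [[[((0.0, 0.0), "Speech"), ((1.0, 1.0), "Music")]]]
db_clip = [[[((3.0, 4.0), "Speech"), ((0.0, 1.0), "Speech")]]]
weights = [[2.0]]
print(query_distance(query, db_clip, weights))
```

In this example the speech frame of the query finds its nearest speech frame at distance 1, while the music frame has no same-class frame in the database clip and contributes 0, as in the second case of the formula.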

Conclusions & Remarks


- Audio is important: it sometimes carries more semantic and content information than video.
- Preliminary results show that audio-based retrieval is effective compared with visual retrieval, giving similar or better results.
- The classification and segmentation algorithm has recently been improved. A new approach has been developed, based on fuzzy regions and semantic-rule-based classification with intra-segment boundary detection.