ABSTRACT
.
Keywords: Video Browsing, Multimedia Abstraction, Fast Data Browsing.
1. INTRODUCTION
In recent years the use of digital information (video, audio, hypertext) has increased dramatically. In the ASL
(Advanced Software Lab) we are currently investigating several mechanisms for quickly manipulating large amounts of
digital information, specifically video content.
Everybody is familiar with the concept of leafing through the pages of a magazine to preview the contents and perform a
coarse-grained search. Mentally, the magazine contents are divided into different levels of detail (or info details): pictures,
titles, subtitles, paragraphs, box text, etc. While leafing, the brain uses these different degrees of detail to approximate
and navigate through the content.
The same general concept can be applied to the “leafing” of other media: video, speech and plain text.
In the following we present some of the basic general concepts (Section 2) and apply them to speech/text (Section 3)
and video content (Section 4).
[Figure: flowchart of leafing through a magazine in three steps]
- Step 1, Coarse Grained Leafing: "The magazine is leafed at fast speed and only the coarse contents are extracted (pictures and titles)". Decision: a page is selected?
- Step 2, Finer Grained Scanning: "The content around the picture or title is scanned for getting more details". Decision: details?
- Step 3, Fine Grained Reading: "The content is read in detail".
This basic mechanism can be applied to other kinds of content. We can distinguish the following steps:
- Media decomposition: the media (video, speech, text) are decomposed into more elementary, coarse-grained
information components (images, words, keywords);
- Degree of detail (or mental focus): each of these elementary information channels can be abstracted further to
reduce the degree of redundancy; for example, text can be structured into logical parts (titles, subtitles) or semantic
parts (keywords);
- Abstract representation: the information can often be represented using simpler abstractions or an alternative medium;
for example, music, speech and silence [1][2] inside a movie can be mapped to different colors in a time bar, or the
speech can be represented by text.
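The time-bar example in the last step can be sketched as follows. This is only an illustration under stated assumptions: the segment list, the function name `time_bar` and the color assignment are hypothetical, and the audio classification itself (music, speech, silence) is taken as already given.

```python
from typing import List, Tuple

# Hypothetical color assignment; the text only says "different colors".
COLORS = {"music": "blue", "speech": "green", "silence": "grey"}

# One classified audio segment: (start_sec, end_sec, label).
Segment = Tuple[float, float, str]


def time_bar(segments: List[Segment], duration: float, cells: int) -> List[str]:
    """Return one color name per cell of a time bar with `cells` cells."""
    bar = []
    for i in range(cells):
        # Sample the audio class at the midpoint of the cell.
        t = (i + 0.5) * duration / cells
        label = next(
            (lab for start, end, lab in segments if start <= t < end),
            "silence",  # assumption: unclassified time is shown as silence
        )
        bar.append(COLORS[label])
    return bar


# Illustrative 10-second clip: music, then speech, then one second of silence.
segments = [(0.0, 4.0, "music"), (4.0, 9.0, "speech"), (9.0, 10.0, "silence")]
bar = time_bar(segments, duration=10.0, cells=5)
```

A coarser bar (fewer cells) gives a more abstract view of the same audio track, which is the trade-off between degree of detail and speed of presentation discussed in this section.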
[Figure: abstractions arranged along two opposed scales: DEGREE OF DETAIL from LOW to HIGH, and SPEED OF PRESENTATION from HIGH to LOW.]
Media decomposition can take different forms depending on the nature of the content: the more sophisticated the content,
the more ways there are to decompose it.
Different degrees of detail can also be reached with different techniques; for example, video information can be shown at
different levels of abstraction. In Fig 3 we propose a few examples of possible picture abstractions,
from the simplest ones (B/W, scaling) to the more complex (MPEG4).
Figure 5 – Some examples of different degrees of detail for picture representation.
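The two simplest picture abstractions named above (B/W and scaling) can be sketched in a few lines. This is a minimal illustration, not the method used for the figure: the image is assumed to be a nested list of RGB tuples, "B/W" is interpreted here as a grey-level image obtained by averaging the channels, and scaling is done by plain subsampling. The function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def to_gray(image: Image) -> List[List[int]]:
    """B/W abstraction (assumed): replace each RGB pixel by its integer mean."""
    return [[(r + g + b) // 3 for r, g, b in row] for row in image]


def downscale(gray: List[List[int]], factor: int) -> List[List[int]]:
    """Scaling abstraction: keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in gray[::factor]]


# Illustrative 4x4 test image with a horizontal brightness ramp.
image = [[(x * 10, x * 10, x * 10) for x in range(4)] for _ in range(4)]
gray = to_gray(image)
thumb = downscale(gray, 2)   # 2x2 picture with a quarter of the pixels
```

Each step removes information (color, then resolution), so the resulting picture can be presented faster at the cost of detail.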
Speech and text can be seen as two alternative information channels. While speech has a mainly temporal/auditory
dimension, a text page can be browsed spatially and visually.
A traditional speech recording has very few uses other than “being listened to”. One possibility might be to
navigate through the phonemes of the speech sample; of course this method has some limitations owing to the nature of
fast audio playback. Alternatively, speech can be mapped to text using extraction algorithms or using a digital format that
transports both audio and text (metadata). The text can be abstracted further using keywords, and so on.
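The last abstraction step, from text to keywords, can be illustrated with a short sketch. The paper does not say how keywords are obtained; the word-frequency ranking below is a stand-in technique chosen only for illustration, and the stop-word list and the function name `keywords` are assumptions.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "can", "be"}


def keywords(text: str, top_n: int = 3) -> List[str]:
    """Return the `top_n` most frequent non-stop-words of `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Illustrative transcript of a speech sample.
transcript = "Speech can be mapped to text. The speech text is for browsing speech."
top = keywords(transcript, top_n=2)
```

When the audio format already carries keyword metadata, as suggested above, this extraction step is unnecessary and the keywords can be read directly.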
This simple mechanism can be applied to fast browsing and navigation of speech, especially for speech recordings such as
audio books, educational material, etc. Figure 3 shows a basic text bar that can be used for fast positioning in a speech
audio sample. In a very simple small radio player, the keywords might be shown on a one-line LCD screen and the user can
advance by pressing a fast forward button.
Figure 3 – An example of a text-based navigation bar for speech on a small device (time marks at 10 sec and 20 sec): the
keywords are scrolled in the one-line LCD for fast positioning and search.
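The navigation bar of Figure 3 can be sketched as a list of time-stamped keywords plus a fast forward operation. This is a hypothetical model, not the device's implementation: the class name `KeywordBar`, the 16-character display width and the sample keywords are all assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple


class KeywordBar:
    """Keyword track for a speech sample, shown on a one-line display."""

    def __init__(self, track: List[Tuple[float, str]], width: int = 16):
        self.track = sorted(track)               # (time_sec, keyword) pairs
        self.times = [t for t, _ in self.track]
        self.width = width                       # assumed LCD width in characters
        self.position = 0.0                      # playback position in seconds

    def fast_forward(self) -> float:
        """Jump to the next keyword after the current position, if there is one."""
        i = bisect_right(self.times, self.position)
        if i < len(self.times):
            self.position = self.times[i]
        return self.position

    def display(self) -> str:
        """Text for the LCD: the latest keyword at or before the position."""
        i = bisect_right(self.times, self.position) - 1
        word = self.track[i][1] if i >= 0 else ""
        return word[: self.width].ljust(self.width)


# Illustrative keyword track with marks at 0, 10 and 20 seconds.
nav = KeywordBar([(0.0, "intro"), (10.0, "spider"), (20.0, "rescue")])
first = nav.display().strip()
nav.fast_forward()
second = nav.display().strip()
```

Each press of the fast forward button moves playback to the next keyword, so the user positions by reading rather than by listening.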
Text navigation can also be applied to a normal text page, for example a web page (see Fig.4). This
technique might be useful for e-books: the e-book text can embed “key words” metadata or structure, and the
user could operate it in different modes, one for each degree of detail: keywords, only pictures and titles, etc.
Figure 5 – A text can be reduced to different levels of detail (keywords) to speed up navigation.
This technique has a very interesting application in speeding up navigation/browsing through a large number of
pages. For example, imagine browsing through 10 web pages: if it were possible to set the level of granularity or detail, you
could get a fast overview of the pages with keywords (see figure 5) and occasionally, when interested, “zoom” into
a more detailed level. Fast forward and rewind are already available as a feature in a web browser [3], but that feature
navigates quickly between full web pages that a heuristic model considers more important.
This simple example shows how the mechanism of decomposing media content into elementary information channels
works. This information can be further abstracted and presented to the user depending on the speed of browsing.
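The dependence of the presented detail on the browsing speed can be sketched as follows. This is a minimal illustration: the three levels follow the modes mentioned above (keywords, titles, full text), while the speed thresholds, the `Page` structure and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    title: str
    keywords: List[str]
    body: str


def level_for_speed(pages_per_second: float) -> str:
    """Pick a degree of detail; both thresholds are illustrative assumptions."""
    if pages_per_second >= 2.0:
        return "keywords"
    if pages_per_second >= 0.5:
        return "titles"
    return "full"


def render(page: Page, level: str) -> str:
    """Return the view of `page` at the requested degree of detail."""
    if level == "keywords":
        return ", ".join(page.keywords)
    if level == "titles":
        return page.title
    return page.title + "\n" + page.body


page = Page("Video leafing", ["video", "leafing"], "Full article text.")
fast_view = render(page, level_for_speed(3.0))   # keyword view while leafing fast
slow_view = render(page, level_for_speed(0.1))   # full text once the user "zooms" in
```

Slowing down therefore acts as the "zoom": the same page is shown with progressively more detail as the browsing speed drops.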
[Figure: example page reduced to titles and keywords: "NY Tribune", "I catched the Spider", "Save the baby !!", "Tokio", "New York", "Bomb", "Peter arrest"]
5. CONCLUSION
Some basic ideas about new ways of consuming multimedia content were presented. A basic concept for finding a trade-off
between speed of presentation and degree of detail was explained: “leafing multimedia contents”. In detail, we described
fast speech/text browsing and the video “leafing” concept.
ACKNOWLEDGMENTS
We gratefully acknowledge all the members of Sony Corporate Lab Europe (SCLE) for their continuous feedback
across the different areas of engineering and science.
REFERENCES
1. C.G.M. Snoek, M. Worring, MULTIMODAL VIDEO INDEXING: A REVIEW OF THE STATE-OF-THE-ART, Department
of Computer Science (University of Amsterdam), 2003.
2. Ying Li, Wei Ming and Jay Kuo, SEMANTIC VIDEO CONTENT ABSTRACTION BASED ON MULTIPLE CUES,
Department of Electrical Engineering (University of Southern California), 2001.
3. Opera Browser 7.10 – Fast forward and Rewind.
4. S. Pfeiffer, R. Lienhart, S. Fischer, and W. Effelsberg, ABSTRACTING DIGITAL MOVIES AUTOMATICALLY,
University of Mannheim, 1996.
5. S. X. Ju, Michael J. Black, S. Minneman, and D. Kimber, SUMMARIZATION OF VIDEOTAPED
PRESENTATIONS: AUTOMATIC ANALYSIS OF MOTION AND GESTURE, IEEE Trans. on Circuits and Systems for
Video Technology (vol. 8, no. 5), 1998.
6. A. Barletta, B. Moser, M. Mayer, TECHNOLOGY INVESTIGATION REPORT (Draft – Internal Use), Advanced
Software Lab (Sony Corporate Lab Stuttgart), 2003.
7. A. Barletta, B. Moser, M. Mayer, Presentation: VIDEO LEAFING (Draft – Internal Use), Advanced Software Lab
(Sony Corporate Lab Stuttgart), 2003.