
Multimedia BSc Exam 2000 SOLUTIONS
Setter: ADM  Checker: ACJ
Additional Material: SMIL Language Description Sheet
Answer 3 Questions out of 4

1. (a) What is meant by the terms Multimedia and Hypermedia? Distinguish between these two concepts.

Multimedia ---- An application which uses a collection of multiple media sources, e.g. text, graphics, images, sound/audio, animation and/or video.

Hypermedia --- An application which uses associative relationships among information contained within multiple media data for the purpose of facilitating access to, and manipulation of, the information encapsulated by the data.

2 MARKS ---- BOOKWORK

(b) What is meant by the terms static media and dynamic media? Give two examples of each type of media.

Static media do not change over time, e.g. text, graphics.
Dynamic media are time dependent (temporal), e.g. video, sound, animation.

4 MARKS --- BOOKWORK

(c) What issues of functionality need to be provided in order to effectively use a wide variety of media in Multimedia applications? Your answer should briefly address how such functionality can be facilitated in general Multimedia applications.

The following functionality should be provided:

- Digital representation of media --- many formats for many media
- Capture: digitisation of media --- special hardware/software
- Creation and editing --- assemble media and alter it
- Storage requirements --- significant for multimedia
- Compression --- related to the above and below, i.e. can save on storage but can hinder retrieval
- Structuring and retrieval methods of media --- simple to advanced database storage

- Display or playback methods --- effect of retrieval, must view data
- Media synchronisation --- display multimedia as it is intended

9 MARKS --- BOOKWORK

(d) Different types of media will require different types of supporting operations to provide adequate levels of functionality. For the examples of static and dynamic media given in your answer to part 1(b) briefly discuss what operations are needed to support a wide range of multimedia applications.

A selection of the items below is required for good marks, NOT ALL. Other solutions possible. Typical range of operations required for common media:

Text: Editing, Formatting, Sorting, Indexing, Searching, Encrypting
ABOVE REQUIRE: Character Manipulation, String Manipulation

Audio: Audio Editing, Synchronisation, Conversion/Translation, Filtering/Sound Enhancing Operators, Compression, Searching, Indexing
ABOVE REQUIRE: Sample Manipulation, Waveform Manipulation

Graphics: Primitive Editing, Shading, Mapping, Lighting, Viewing, Rendering, Searching, Indexing
ABOVE REQUIRE: Primitive Manipulation, Structural/Group Manipulation

Image: Pixel Operations, Geometric Operations, Filtering, Conversion, Indexing, Compression, Searching

Animation: Primitive/Group Editing, Structural Editing, Rendering, Synchronisation, Searching, Indexing

Video: Pixel Operations, Frame Operations, Editing, Synchronisation, Conversion, Mixing, Indexing, Searching, Video Effects/Filtering

12 MARKS --- UNSEEN

2. (a) Why is file or data compression necessary for Multimedia activities?

Multimedia files are very large, therefore for storage, file transfer etc. file sizes need to be reduced. Text and other files may also be encoded/compressed for email and other applications.

2 MARKS --- BOOKWORK

(b) Briefly explain, clearly identifying the differences between them, how entropy coding and transform coding techniques work for data compression. Illustrate your answer with a simple example of each type.

Compression can be categorised in two broad ways:

Lossless compression -- where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are also referred to as bit-preserving or reversible compression systems.

Lossy compression -- where the aim is to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate to achieve a given fidelity measure. Video and audio compression techniques are most suited to this form of compression.

Lossless compression frequently involves some form of entropy encoding and is based on information-theoretic techniques. Lossy compression uses source encoding techniques that may involve transform encoding, differential encoding or vector quantisation.

ENTROPY METHODS:

The entropy of an information source S is defined as:

    H(S) = SUM_i p_i log2(1/p_i)

where p_i is the probability that symbol S_i in S will occur, and log2(1/p_i) indicates the amount of information contained in S_i, i.e. the number of bits needed to code S_i.

Encoding for the Shannon-Fano algorithm (a top-down approach):

1. Sort symbols according to their frequencies/probabilities, e.g. ABCDE.
2. Recursively divide into two parts, each with approximately the same number of counts.

(The Huffman algorithm, indicated below, is also valid.)

A simple transform coding example

A simple transform encoding procedure may be described by the following steps for a 2x2 block of monochrome pixels A, B, C, D:

1. Take the top left pixel as the base value for the block, pixel A.
2. Calculate three other transformed values by taking the difference between these (respective) pixels and pixel A, i.e. B-A, C-A, D-A.
3. Store the base pixel and the differences as the values of the transform.

Given the above we easily obtain the forward transform:

    X0 = A, X1 = B - A, X2 = C - A, X3 = D - A

and the inverse transform is trivial:

    A = X0, B = X1 + X0, C = X2 + X0, D = X3 + X0

The above transform scheme may be used to compress data by exploiting redundancy in the data: any redundancy in the data has been transformed to small difference values Xi, so we can compress the data by using fewer bits to represent the differences. I.e. if we use 8 bits per pixel then the 2x2 block uses 32 bits; if we keep 8 bits for the base pixel, X0, and assign 4 bits for each difference then we only use 20 bits, an average of 5 bits/pixel instead of 8.

8 MARKS --- BOOKWORK
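To make the arithmetic concrete, here is a minimal Java sketch of the 2x2 difference transform just described; the class name, the sample pixel values and the 4-bit packing comment are illustrative assumptions, not part of the model answer.

public class BlockTransform {

    // Forward transform: X0 = A (base pixel), X1 = B-A, X2 = C-A, X3 = D-A
    static int[] forward(int a, int b, int c, int d) {
        return new int[] { a, b - a, c - a, d - a };
    }

    // Inverse transform: recover the four pixels from the base value and differences
    static int[] inverse(int[] x) {
        return new int[] { x[0], x[0] + x[1], x[0] + x[2], x[0] + x[3] };
    }

    public static void main(String[] args) {
        int[] coded = forward(120, 124, 119, 122);  // 8-bit pixels A, B, C, D
        int[] pixels = inverse(coded);              // back to {120, 124, 119, 122}
        // Storing X0 in 8 bits and each small difference in 4 bits gives
        // 8 + 3*4 = 20 bits for the block: an average of 5 bits/pixel.
        System.out.println(java.util.Arrays.toString(coded));
        System.out.println(java.util.Arrays.toString(pixels));
    }
}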

(c) (i) Show how you would use Huffman coding to encode the following set of tokens:

BABACACADADABBCBABEBEDDABEEEBB
How is this message transmitted when encoded?

The Huffman algorithm is briefly summarised as follows:

1. Initialization: put all nodes in an OPEN list, kept sorted at all times (e.g. ABCDE).

2. Repeat until the OPEN list has only one node left:
   (a) From OPEN pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
   (b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.
   (c) Assign codes 0, 1 to the two branches of the tree, and delete the children from OPEN.

Symbol:  A   B   C   D   E   Total
Count:   8   10  3   4   5   30

Merge steps (indicate the merged node with the other node's total in the column):
(1) C + D -> 7
(2) 7 + E -> 12
(3) A + B -> 18
(4) 18 + 12 -> 30

(Figure: finished Huffman tree, not reproduced here.)

Symbol:  A   B   C    D    E
Code:    01  00  101  100  11

How is this message transmitted when encoded? Send the code book and then the bit code for each symbol.

7 Marks --- UNSEEN
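A compact Java sketch of the merge loop for this example follows (a hypothetical helper class, not part of the model answer; tie-breaking among equal counts is implementation-dependent, so the exact bit patterns may differ from the table above while the code lengths remain optimal).

import java.util.*;

public class HuffmanSketch {
    static class Node {
        int count; char sym; Node left, right;
        Node(int c, char s) { count = c; sym = s; }
        Node(Node l, Node r) { count = l.count + r.count; left = l; right = r; }
    }

    static void collectCodes(Node n, String prefix, Map<Character, String> out) {
        if (n.left == null) { out.put(n.sym, prefix); return; }  // leaf node
        collectCodes(n.left, prefix + "0", out);
        collectCodes(n.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        int[] counts = { 8, 10, 3, 4, 5 };  // A B C D E from the worked example
        PriorityQueue<Node> open = new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.count));
        for (int i = 0; i < counts.length; i++)
            open.add(new Node(counts[i], (char) ('A' + i)));
        while (open.size() > 1)  // merge the two lowest-count nodes until one root remains
            open.add(new Node(open.poll(), open.poll()));
        Map<Character, String> codes = new TreeMap<>();
        collectCodes(open.poll(), "", codes);
        System.out.println(codes);  // code lengths: A=2, B=2, E=2, C=3, D=3 bits
    }
}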

(ii) How many bits are needed to transfer this coded message, and what is its entropy?

Symbol:             A   B   C   D   E
Count:              8   10  3   4   5
Subtotal # of bits: 16  20  9   12  10

Total number of bits (excluding code book) = 67
Entropy (average bits per symbol) = 67/30 = 2.2333

4 MARKS --- UNSEEN

(iii) What amendments are required to this coding technique if data is generated live or is otherwise not wholly available? Show how you could use this modified scheme by adding the tokens ADADA to the above message.

An adaptive method is needed. Basic idea (encoding):

Initialize_model();
while ((c = getc(input)) != EOF) {
    encode(c, output);   /* encode with the current model */
    update_model(c);     /* then update the model's counts */
}

So encode the new tokens with the existing codes: A = 01, D = 100.
Added stream for ADADA: 011000110001

Modify tree:

Symbol:  A   B   C   D   E
Count:   11  10  3   6   5

Merge steps:
(1) C + E -> 8
(2) 8 + D -> 14
(3) A + B -> 21
(4) 21 + 14 -> 35

The Huffman tree is drawn as before, but is different.

6 Marks --- UNSEEN
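The adaptive step can be sketched in Java as follows: encode with the current code table, then update the model, exactly as in the encode(c); update_model(c) loop above. The fixed table stands in for the tree that a full adaptive coder would rebuild incrementally; class and variable names are illustrative.

import java.util.*;

public class AdaptiveStep {
    public static void main(String[] args) {
        Map<Character, String> code = new HashMap<>(Map.of(
                'A', "01", 'B', "00", 'C', "101", 'D', "100", 'E', "11"));
        Map<Character, Integer> count = new HashMap<>(Map.of(
                'A', 8, 'B', 10, 'C', 3, 'D', 4, 'E', 5));
        StringBuilder stream = new StringBuilder();
        for (char c : "ADADA".toCharArray()) {
            stream.append(code.get(c));       // encode with the current tree
            count.merge(c, 1, Integer::sum);  // update_model(c)
        }
        System.out.println(stream);           // 011000110001
        System.out.println(count);            // A=11, B=10, C=3, D=6, E=5
        // A full adaptive coder would now rebuild/adjust the Huffman tree
        // from these counts before encoding further symbols.
    }
}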

3. (a) What are the major factors to be taken into account when considering what storage requirements are necessary for Multimedia Systems?

Major factors:

- Large volume of data
- Real-time delivery
- Data format
- Storage medium
- Retrieval mechanisms

4 MARKS --- UNSEEN/APPLIED BOOKWORK

(b) What is RAID technology and what advantages does it offer as a medium for the storage and delivery of large data?

RAID --- Redundant Array of Inexpensive Disks.

Offers:
- An affordable alternative to mass storage
- High throughput and reliability

RAID system:
- A set of disk drives viewed by the user as one or more logical drives
- Data may be distributed across drives
- Redundancy is added in order to allow for disk failure

4 MARKS --- BOOKWORK

(c) Briefly explain the eight levels of RAID functionality.

- Level 0: Disk Striping --- distributing data across multiple drives
- Level 1: Disk Mirroring --- fault tolerance
- Level 2: Bit Interleaving and HEC Parity
- Level 3: Bit Interleaving with XOR Parity
- Level 4: Block Interleaving with XOR Parity
- Level 5: Block Interleaving with Parity Distribution
- Level 6: Fault Tolerant System --- error recovery
- Level 7: Heterogeneous System --- fast access across the whole system

8 MARKS --- BOOKWORK

(d) A digital video file is 40 Mb in size. The disk subsystem has four drives and the controller is designed to support read and write onto each drive, concurrently. The digital video is stored using the disk striping concept. A block size of 8 Kb is used for each I/O operation.

(i) What is the performance improvement in sequentially reading the complete file when compared to a single drive subsystem, in terms of the number of operations performed?

We have 5120 segments to read from the RAID disks. Given 4 disks we have 1280 actual I/O cycles to perform. On 1 drive we clearly have 5120 operations to perform.

(ii) What is the percentage performance improvement expressed as the number of physical I/O operations to be executed on the RAID and single drive systems?

The improvement is (5120 - 1280)/1280 * 100 = 300%. Obvious given 4 concurrent drives and RAID!

11 MARKS --- UNSEEN
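The I/O arithmetic can be checked with a few lines of Java (a sketch; it follows the question's convention that 40 Mb means 40 megabytes and 8 Kb means 8 kilobytes):

public class RaidOps {
    public static void main(String[] args) {
        long fileBytes  = 40L * 1024 * 1024;       // 40 MB video file
        long blockBytes = 8L * 1024;               // 8 KB per I/O operation
        long segments   = fileBytes / blockBytes;  // 5120 block reads on one drive
        long raidOps    = segments / 4;            // 1280 concurrent I/O cycles on 4 drives
        double gain = 100.0 * (segments - raidOps) / raidOps;  // 300%
        System.out.println(segments + " vs " + raidOps + " ops, " + gain + "% improvement");
    }
}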

4. (a) Give a definition of a Multimedia Authoring System. What key features should such a system provide?

An authoring system is a program which has pre-programmed elements for the development of interactive multimedia software titles. Authoring systems vary widely in orientation, capabilities, and learning curve. There is no such thing (at this time) as a completely point-and-click automated authoring system; some knowledge of heuristic thinking and algorithm design is necessary. Authoring is basically just a speeded-up form of programming --- VISUAL PROGRAMMING; you don't need to know the intricacies of a programming language, or worse, an API, but you do need to understand how programs work.

2 MARKS ---- BOOKWORK

(b) What Multimedia Authoring paradigms exist? Describe each paradigm briefly.

There are various paradigms, including:

Scripting Language

The Scripting paradigm is the authoring method closest in form to traditional programming. The paradigm is that of a programming language, which specifies (by filename) multimedia elements, sequencing, hotspots, synchronization, etc. A powerful, object-oriented scripting language is usually the centerpiece of such a system; in-program editing of elements (still graphics, video, audio, etc.) tends to be minimal or non-existent. Scripting languages do vary; check how much the language is object-based or object-oriented. The scripting paradigm tends to be longer in development time (it takes longer to code an individual interaction), but generally more powerful interactivity is possible. Since most scripting languages are interpreted, rather than compiled, the runtime speed gains over other authoring methods are minimal. The media handling can vary widely; check out your system with your contributing package formats carefully. Apple's HyperTalk for HyperCard, Asymetrix's OpenScript for ToolBook and the Lingo scripting language of Macromedia Director are examples of multimedia scripting languages. Here is an example Lingo script to jump to a frame:

global gNavSprite

on exitFrame
  go the frame
  play sprite gNavSprite
end

Iconic/Flow Control

This tends to be the speediest (in development time) authoring style; it is best suited for rapid prototyping and short-development-time projects. Many of these tools are also optimized for developing Computer-Based Training (CBT). The core of the paradigm is the Icon Palette, containing the possible functions/interactions of a program, and the Flow Line, which shows the actual links between the icons. These programs tend to have the slowest runtimes, because each interaction carries with it all of its possible permutations; the higher-end packages, such as Authorware or IconAuthor, are extremely powerful and suffer least from runtime speed problems.

Frame

The Frame paradigm is similar to the Iconic/Flow Control paradigm in that it usually incorporates an icon palette; however, the links drawn between icons are conceptual and do not always represent the actual flow of the program. This is a very fast development system, but requires a good auto-debugging function, as it is visually un-debuggable. The best of these have bundled compiled-language scripting, such as Quest (whose scripting language is C) or Apple Media Kit.

Card/Scripting

The Card/Scripting paradigm provides a great deal of power (via the incorporated scripting language) but suffers from the index-card structure. It is excellently suited for hypertext applications, and supremely suited for navigation-intensive applications (a la Cyan's MYST game). Such programs are easily extensible via XCMDs and DLLs; they are widely used for shareware applications. The best applications allow all objects (including individual graphic elements) to be scripted; many entertainment applications are prototyped in a card/scripting system prior to compiled-language coding.

Cast/Score/Scripting

The Cast/Score/Scripting paradigm uses a music score as its primary authoring metaphor; the synchronous elements are shown in various horizontal tracks, with simultaneity shown via the vertical columns. The true power of this metaphor lies in the ability to script the behavior of each of the cast members. The most popular member of this paradigm is Director, which is used in the creation of many commercial applications. These programs are best suited for animation-intensive or synchronized media applications; they are easily extensible to handle other functions (such as hypertext) via XOBJs, XCMDs, and DLLs. Macromedia Director uses this paradigm.

Hierarchical Object

The Hierarchical Object paradigm uses an object metaphor (like OOP) which is visually represented by embedded objects and iconic properties. Although the learning curve is non-trivial, the visual representation of objects can make very complicated constructions possible.

Hypermedia Linkage

The Hypermedia Linkage paradigm is similar to the Frame paradigm in that it shows conceptual links between elements; however, it lacks the Frame paradigm's visual linkage metaphor.

Tagging

The Tagging paradigm uses tags in text files (for instance, SGML/HTML, SMIL (Synchronised Media Integration Language), VRML, 3DML and WinHelp) to link pages, provide interactivity and integrate multimedia elements.

8 Marks --- BOOKWORK

(c) You have been asked to provide a Multimedia presentation that can support media in both English and French. You may assume that you have been given a sequence of 10 images and a single 50 second digitised audio soundtrack in both languages. Each image should be mapped over consecutive 5 second fragments of the audio. All images are of the same 500x500 pixel dimension. Describe, giving suitable code fragments, how you would assemble such a presentation using SMIL. Your solution should cover all aspects of the SMIL presentation.

<smil>
  <head>
    <layout>
      <root-layout height="500" width="500"
          background-color="#000000" title="MultiLingual"/>
      <region id="image1" width="500" height="500" top="0" left="0"
          background-color="#000000" z-index="1"/>
      <region id="image2" width="500" height="500" top="0" left="0"
          background-color="#000000" z-index="1"/>
      ...
    </layout>
  </head>
  <body>
    <par>
      <switch>
        <!-- English only -->
        <audio system-language="en" src="english.au"/>
        <!-- French only -->
        <audio system-language="fr" src="francais.au"/>
      </switch>
      <seq>
        <img src="image1.jpg" region="image1" dur="5.00s"/>
        <img src="image2.jpg" region="image2" dur="5.00s"/>
        ...
      </seq>
    </par>
  </body>
</smil>

The <switch> selects the audio track by system language, the <par> plays the audio and the image sequence together, and each image in the <seq> has dur="5.00s", so the ten images occupy consecutive 5 second fragments of the 50 second audio.

17 Marks ---- UNSEEN

Multimedia BSc Exam Questions Jan 2001 SOLUTIONS

Exam paper format:
- Time allowed: 2 hours
- Answer 3 questions out of 4
- Each question carries 27 marks

1. (a) What is meant by the terms Multimedia and Hypermedia? Distinguish between these two concepts.

Multimedia ---- An application which uses a collection of multiple media sources, e.g. text, graphics, images, sound/audio, animation and/or video.

Hypermedia --- An application which uses associative relationships among information contained within multiple media data for the purpose of facilitating access to, and manipulation of, the information encapsulated by the data.

2 MARKS ---- BOOKWORK

(b) What is meant by the terms static media and dynamic media? Give two examples of each type of media.

Static media do not change over time, e.g. text, graphics.
Dynamic media are time dependent (temporal), e.g. video, sound, animation.

4 MARKS --- BOOKWORK

(c) What are the main facilities that must be provided in a system designed to support the integration of multimedia into a multimedia presentation?

The following functionality should be provided:

- Digital representation of media --- many formats for many media
- Capture: digitisation of media --- special hardware/software
- Creation and editing --- assemble media and alter it
- Storage requirements --- significant for multimedia
- Compression --- related to the above and below, i.e. can save on storage but can hinder retrieval
- Structuring and retrieval methods of media --- simple to advanced database storage
- Display or playback methods --- effect of retrieval, must view data
- Media synchronisation --- display multimedia as it is intended

8 MARKS --- BOOKWORK

(d) Describe, giving suitable code fragments, how you would effectively combine and start at the same time a video clip and an audio clip in an MHEG application, and start a subtitle text display 19000 milliseconds into the video clip. You may assume that both clips are of the same duration and must start at the same instant.

The MHEG code listing below illustrates the solution:

{:Application ("SYNC_APP.mh5" 0)
  :OnStartUp ( // sequence of initialization actions
  )
}
{:Scene ("main_scene.mh5" 0)
  :OnStartUp ( // sequence of initialization actions
    preload (2)  // the connection to the source of the video clip is set up
    preload (3)  // the connection to the source of the audio clip is set up
    ...
    setCounterTrigger (2 3 19000)  // book a time code event at 19000 msec
    ...
  )
  :Items ( // both presentable ingredients and links
    {:Bitmap 1  // background bitmap -- NOT IMPORTANT FOR SOLN
      :InitiallyActive true
      :CHook 3  // JPEG
      :OrigContent :ContentRef ("background.jpg")
      :OrigBoxSize 800 600
      :OrigPosition 0 0
    }
    {:Stream 2  // video clip
      :InitiallyActive false
      :CHook 101  // MPEG-1
      :OrigContent :ContentRef ("video.mpg")
      :Multiplex (
        {:Audio 3  // audio component of the video clip
          :ComponentTag 1  // refers to audio elementary stream
          :InitiallyActive true
        }
      )
    }
    ... // possibly more presentable ingredients
    {:Link 49  // the video clip crosses a pre-defined time code position
      :EventSource (2)  // video clip
      :EventType CounterTrigger
      :EventData 3  // booked at startup by setCounterTrigger (2 3 19000)
      :LinkEffect (
        :SetData (5  // text subtitle is set to a new string, that is
          :NewRefContent ("subtitle.txt"))  // "Subtitle text"
        :SetHighlightStatus (20 true)  // hotspot 20 is highlighted
      )
    }
    ... // more links
  )
  :SceneCS 800 600  // size of the scene's presentation space
}

Key points:
- Preloading both clips is essential to start streaming
- Need to book the 19000 msec event for the subtitles
- Content is loaded, and video and audio are multiplexed
- Set a link transition for the subtitle

13 MARKS --- UNSEEN

2. (a) Why is file or data compression necessary for Multimedia activities?

Multimedia files are very large, therefore for storage, file transfer etc. file sizes need to be reduced, often as part of the file format. Text and other files may also be encoded/compressed for email and other applications.

3 Marks --- Bookwork

(b) Briefly explain how the LZW transform operates. What common compression methods utilise this transform?

Suppose we want to encode the Oxford Concise English Dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?

Problems:

* Too many bits
* Everyone needs a dictionary
* Only works for English text

Solution: find a way to build the dictionary adaptively. The original methods are due to Ziv and Lempel in 1977 and 1978. Terry Welch improved the scheme in 1984 (called LZW compression). It is used in UNIX compress and GIF compression.

The LZW compression algorithm can be summarised as follows:

w = NIL;
while ( read a character k ) {
    if wk exists in the dictionary {
        w = wk;
    } else {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}

The original LZW implementation used a dictionary with 4K entries; the first 256 (0-255) are ASCII codes.
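A runnable Java sketch of this loop follows (a string-keyed dictionary stands in for the original 4K code table; the test string is a classic teaching example, not from the paper):

import java.util.*;

public class LzwSketch {
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put("" + (char) i, i);  // seed with ASCII codes
        List<Integer> out = new ArrayList<>();
        String w = "";
        for (char k : input.toCharArray()) {
            String wk = w + k;
            if (dict.containsKey(wk)) {
                w = wk;                      // extend the current match
            } else {
                out.add(dict.get(w));        // output the code for w
                dict.put(wk, dict.size());   // add wk to the dictionary
                w = "" + k;
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));  // flush the final match
        return out;
    }

    public static void main(String[] args) {
        System.out.println(compress("WEDWEWEEWEBWET"));
    }
}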

10 MARKS --- BOOKWORK

(c) Show how the LZW transform would be used to encode the following 2D array of image data; you should use 2x2 window elements for the characters:

(2D image array and worked encoding: SEE HANDWRITTEN SOLUTION attached.)

14 Marks --- UNSEEN

3. (a) What key features of QuickTime have led to its adoption as an international multimedia format?

QuickTime is the most widely used cross-platform multimedia technology available today. QuickTime developed out of a multimedia extension for Apple's (proprietary) Macintosh System 7 operating system. It is now an international standard for multimedia interchange and is available for many platforms and as Web browser plug-ins.

The main features are:

- Versatile support for web-based media
- Sophisticated playback capabilities
- Easy content authoring and editing
- QuickTime is an open standard -- it embraces other standards and incorporates them into its environment. It supports almost every major multimedia file format.

4 Marks --- BOOKWORK

(b) Briefly outline the QuickTime architecture and its key components.

The QuickTime architecture:

QuickTime comprises two managers: the Movie Toolbox and the Image Compression Manager. QuickTime also relies on the Component Manager, as well as a set of predefined components. The figure (omitted here) shows the relationships of these managers and an application that is playing a movie.

The Movie Toolbox --

Your application gains access to the capabilities of QuickTime by calling functions in the Movie Toolbox. The Movie Toolbox allows you to store, retrieve, and manipulate time-based data that is stored in QuickTime movies. A single movie may contain several types of data. For example, a movie that contains video information might include both video data and the sound data that accompanies the video. The Movie Toolbox also provides functions for editing movies. For example, there are editing functions for shortening a movie by removing portions of the video and sound tracks, and there are functions for extending it with the addition of new data from other QuickTime movies.

The Image Compression Manager --

The Image Compression Manager comprises a set of functions that compress and decompress images or sequences of graphic images. It provides a device-independent and driver-independent means of compressing and decompressing images and sequences of images. It also contains a simple interface for implementing software and hardware image-compression algorithms. It provides system integration functions for storing compressed images as part of PICT files, and it offers the ability to automatically decompress compressed PICT files on any QuickTime-capable Macintosh computer. In most cases, applications use the Image Compression Manager indirectly, by calling Movie Toolbox functions or by displaying a compressed picture. However, if your application compresses images or makes movies with compressed images, it will call Image Compression Manager functions.

The Component Manager --

Applications gain access to components by calling the Component Manager. The Component Manager allows you to define and register types of components and communicate with components using a standard interface. A component is a code resource that is registered by the Component Manager. The component's code can be stored in a systemwide resource or in a resource that is local to a particular application. Once an application has connected to a component, it calls that component directly. If you create your own component class, you define the function-level interface for the component type that you have defined, and all components of that type must support the interface and adhere to those definitions. In this manner, an application can freely choose among components of a given type with absolute confidence that each will work.

QuickTime components:

- movie controller components, which allow applications to play movies using a standard user interface
- standard image compression dialog components, which allow the user to specify the parameters for a compression operation by supplying a dialog box or a similar mechanism
- image compressor components, which compress and decompress image data
- sequence grabber components, which allow applications to preview and record video and sound data as QuickTime movies
- video digitizer components, which allow applications to control video digitization by an external device
- media data-exchange components, which allow applications to move various types of data in and out of a QuickTime movie
- derived media handler components, which allow QuickTime to support new types of data in QuickTime movies
- clock components, which provide timing services defined for QuickTime applications
- preview components, which are used by the Movie Toolbox's standard file preview functions to display and create visual previews for files
- sequence grabber components, which allow applications to obtain digitized data from sources that are external to a Macintosh computer
- sequence grabber channel components, which manipulate captured data for a sequence grabber component
- sequence grabber panel components, which allow sequence grabber components to obtain configuration information from the user for a particular sequence grabber channel component

10 Marks --- Bookwork

(c) QuickTime provides many basic built-in visual effect procedures. By using fragments of Java code, show how a cross-fade effect between two images can be created. Your solution should concentrate only on the Java code specific to producing the QuickTime effect. You may assume that the images are already imported into the application and are referred to as sourceImage and destImage. You should not consider any graphical interfacing aspects of the coding.

This code shows how a cross-fade transition effect could be built. NOT ALL THE INTERFACING STUFF INCLUDED BELOW IS REQUIRED -- SEE THE COMMENTS AFTER FOR THE IMPORTANT PARTS THAT NEED ADDRESSING.

/*
 * QuickTime for Java Transition Sample Code
 */
import java.awt.*;
import java.awt.event.*;
import java.io.*;

import quicktime.*;
import quicktime.io.*;
import quicktime.qd.*;
import quicktime.std.StdQTConstants;
import quicktime.std.image.*;
import quicktime.std.movies.*;
import quicktime.util.*;

import quicktime.app.QTFactory;
import quicktime.app.actions.*;
import quicktime.app.anim.*;
import quicktime.app.display.*;
import quicktime.app.image.*;
import quicktime.app.players.*;
import quicktime.app.spaces.*;
import quicktime.app.time.*;

public class TransitionEffect extends Frame implements StdQTConstants, QDConstants {

    public static void main(String args[]) {
        try {
            QTSession.open();
            TransitionEffect te = new TransitionEffect("Transition Effect");
            te.pack();
            te.show();
            te.toFront();
        } catch (Exception e) {
            e.printStackTrace();
            QTSession.close();
        }
    }

    TransitionEffect(String title) throws Exception {
        super(title);
        QTCanvas myQTCanvas = new QTCanvas(QTCanvas.kInitialSize, 0.5f, 0.5f);
        add("Center", myQTCanvas);
        Dimension d = new Dimension(300, 300);
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                QTSession.close();
                dispose();
            }
            public void windowClosed(WindowEvent e) {
                System.exit(0);
            }
        });

        QDGraphics gw = new QDGraphics(new QDRect(d));
        Compositor comp = new Compositor(gw, QDColor.black, 20, 1);

        ImagePresenter idb = makeImagePresenter(
                new QTFile(QTFactory.findAbsolutePath("pics/stars.jpg")),
                new QDRect(300, 220));
        idb.setLocation(0, 0);
        comp.addMember(idb, 2);

        ImagePresenter id = makeImagePresenter(
                new QTFile(QTFactory.findAbsolutePath("pics/water.pct")),
                new QDRect(300, 80));
        id.setLocation(0, 220);
        comp.addMember(id, 4);

        CompositableEffect ce = new CompositableEffect();
        ce.setTime(800); // TIME OF EFFECT
        ce.setSourceImage(sourceImage);
        ce.setDestinationImage(destImage);
        ce.setEffect(createSMPTEEffect(kEffectCrossFade, kRandomCrossFadeTransitionType));
        ce.setDisplayBounds(new QDRect(0, 220, 300, 80));
        comp.addMember(ce, 3);

        Fader fader = new Fader();
        QTEffectPresenter efp = fader.makePresenter();
        efp.setGraphicsMode(new GraphicsMode(blend, QDColor.gray));
        efp.setLocation(80, 80);
        comp.addMember(efp, 1);
        comp.addController(new TransitionControl(20, 1, fader.getTransition()));

        myQTCanvas.setClient(comp, true);
        comp.getTimer().setRate(1);
    }

    private ImagePresenter makeImagePresenter(QTFile file, QDRect size) throws Exception {
        GraphicsImporterDrawer if1 = new GraphicsImporterDrawer(file);
        if1.setDisplayBounds(size);
        return ImagePresenter.fromGraphicsImporterDrawer(if1);
    }
}

FULL CODE NOT REQUIRED. The important bits:

- Set up an atom container for the SMPTE effect using createSMPTEEffect() as above.
- Set up a transition with the IMPORTANT parameters:

    ce.setTime(800); // TIME OF EFFECT
    ce.setSourceImage(sourceImage);
    ce.setDestinationImage(destImage);
    ce.setEffect(createSMPTEEffect(kEffectCrossFade, kRandomCrossFadeTransitionType));

- A doTransition() or doAction() method performs the transition.

13 Marks --- UNSEEN

4. (a) What is MIDI? How is a basic MIDI message structured?

MIDI: a protocol that enables computers, synthesizers, keyboards, and other musical or (even) multimedia devices to communicate with each other.

MIDI MESSAGE: a MIDI message includes a status byte and up to two data bytes.

- Status byte: the most significant bit of the status byte is set to 1; the 4 low-order bits identify which channel the message belongs to (four bits produce 16 possible channels); the 3 remaining bits identify the message.
- Data byte: the most significant bit of a data byte is set to 0.

Marks --- Bookwork
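As an illustration (not part of the model answer), a Note On message can be packed in Java exactly as the byte layout above describes: a status byte with MSB 1, message bits 001 and a 4-bit channel, followed by two data bytes with MSB 0.

public class NoteOn {
    public static void main(String[] args) {
        int channel  = 0;                        // low 4 bits of the status byte
        int status   = 0x90 | (channel & 0x0F);  // 1001cccc = Note On status byte
        int note     = 60 & 0x7F;                // data byte: middle C, MSB clear
        int velocity = 100 & 0x7F;               // data byte: key velocity, MSB clear
        byte[] msg = { (byte) status, (byte) note, (byte) velocity };
        System.out.printf("%02X %02X %02X%n",
                msg[0] & 0xFF, msg[1] & 0xFF, msg[2] & 0xFF);  // 90 3C 64
    }
}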

(b) In what ways can MIDI be used effectively in Multimedia Applications, as opposed to strictly musical applications?

Many applications:

- Low bandwidth/(low quality?) music on the Web; QuickTime etc. support the MIDI musical instrument set
- Sound effects --- a low bandwidth alternative to audio samples; a sound effects set is part of the GM (General MIDI) sound set
- Control of external devices --- e.g. synchronisation of video and audio (SMPTE), MIDI System Exclusive, audio recorders, samplers
- Control of synthesis --- envelope control etc.
- MPEG-4 compression control --- see part (c)

8 Marks --- Applied Bookwork: discussion of information mentioned in notes/lectures.

(c) How can MIDI be used with modern data compression techniques? Briefly describe how such compression techniques may be implemented.

We have seen the need for compression already in digital audio -- large data files. The basic ideas of compression (see next chapter) are used as an integral part of audio formats -- MP3, RealAudio etc.

MPEG-4 audio actually combines compression, synthesis and MIDI to have a massive impact on compression. MIDI and synthesis encode what note to play and how to play it with a small number of parameters -- a much greater reduction than simply having some encoded bits of audio. Responsibility for creating the audio is delegated to the generation side.

MPEG-4 comprises six Structured Audio tools:

- SAOL, the Structured Audio Orchestra Language
- SASL, the Structured Audio Score Language
- SASBF, the Structured Audio Sample Bank Format
- a set of MIDI semantics, which describes how to control SAOL with MIDI
- a scheduler, which describes how to take the above parts and create sound
- AudioBIFS, part of BIFS, which lets you make audio soundtracks in MPEG-4 using a variety of tools and effects-processing techniques

MIDI IS the control language for the synthesis part: as well as controlling synthesis with SASL scripts, it can be controlled with MIDI files and scores in MPEG-4. MIDI is today's most commonly used representation for music score data, and many sophisticated authoring tools (such as sequencers) work with MIDI. The MIDI syntax is external to the MPEG-4 Structured Audio standard; only references to the MIDI Manufacturers Association's definition appear in the standard. But in order to make the MIDI controls work right in the MPEG context, some semantics (what the instructions "mean") have been redefined in MPEG-4. The new semantics are carefully defined as part of the MPEG-4 specification.

13 Marks --- UNSEEN, but basic ideas mentioned in lectures and earmarked for further reading. Detailed application of lecture notes material.

CM0340

CARDIFF UNIVERSITY EXAMINATION PAPER

SOLUTIONS

Academic Year: 2001-2002
Examination Period: Autumn 2001
Examination Paper Number: CM0340
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator.

Structure of Examination Paper:
There are three pages. There are four questions in total. There are no appendices.
The maximum mark for the examination paper is 100% and the mark obtainable for a question or part of a question is shown in brackets alongside the question.

Students to be provided with:
The following items of stationery are to be provided: one answer book.

Instructions to Students:
Answer THREE questions.
The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

1. (a) Give a definition of multimedia and a multimedia system.

Multimedia is the field concerned with the computer-controlled integration of text, graphics, drawings, still and moving images (video), animation, audio, and any other media where every type of information can be represented, stored, transmitted and processed digitally.

A Multimedia System is a system capable of processing multimedia data and applications.

2 Marks --- BOOKWORK

(b) What are the key distinctions between multimedia data and more conventional types of media?

Multimedia systems deal with the generation, manipulation, storage, presentation, and communication of information in digital form. The data may be in a variety of formats: text, graphics, images, audio, video. A majority of this data is large, and the different media may need synchronisation -- the data may have temporal relationships as an integral property. Some media are time independent, or static or discrete media: normal data, text, single images and graphics are examples. Video, animation and audio are examples of continuous media.

4 Marks --- Bookwork

(c) What key issues or problems does a multimedia system have to deal with when handling multimedia data?

A multimedia system has four basic characteristics:

- Multimedia systems must be computer controlled.
- Multimedia systems are integrated.
- The information they handle must be represented digitally.
- The interface to the final presentation of media is usually interactive.

Multimedia systems may have to render a variety of media at the same instant -- a distinction from normal applications. There is a temporal relationship between many forms of media (e.g. video and audio). There are two forms of problem here:

- Sequencing within the media -- playing frames in the correct order/time frame in video.
- Synchronisation -- inter-media scheduling (e.g. video and audio). Lip synchronisation is clearly important for humans to watch playback of video and audio, and even animation and audio. Ever tried watching an out-of-(lip)-sync film for a long time?

The key issues multimedia systems need to deal with here are:

- How to represent and store temporal information.
- How to strictly maintain the temporal relationships on playback/retrieval.
- What processes are involved in the above.

Data has to be represented digitally, so any initial analog source of data needs to be digitised -- translated from an analog source to a digital representation. This will involve scanning (graphics, still images) and sampling (audio/video), although digital cameras now exist for direct scene-to-digital capture of images and video.

The data is large -- easily several Mb for audio and video -- therefore storage, transfer (bandwidth) and processing overheads are high. Data compression techniques are very common.

7 Marks --- BOOKWORK

(d) An analog signal has a bandwidth that ranges from 15 Hz to 10 kHz. What is the sampling rate and the bandwidth of the bandlimiting filter required if:

(i) the signal is to be stored within computer memory?

The Nyquist sampling theorem says that sampling must be at least twice the highest frequency component of the signal or transmission channel.

The highest frequency is 10 kHz, so the sampling rate = 20 kHz, or 20,000 samples per second. (2 Marks)
Bandwidth of bandlimiting filter = 0 -- 10 kHz. (1 Mark)

(ii) the signal is to be transmitted over a network which has a bandwidth from 200 Hz to 3.4 kHz?

The channel has a lower rate than the maximum in the signal, so we must choose this as the limiting high frequency:
Sampling rate = 6.8 kHz, or 6,800 samples per second. (2 Marks)
Bandwidth of bandlimiting filter = 0 -- 3.4 kHz. (2 Marks)

7 Marks TOTAL: ALL UNSEEN

(e) Assuming that each signal is sampled at 8 bits per sample, what is the difference in the quantisation noise and signal-to-noise ratio expected for the transmission of the signals in (i) and (ii)?

Quantisation noise = Vmax / 2^(n-1)
SNR = 20 log10(Vmax/Vmin)

So for (i):
Quantisation noise = 10,000 / 2^7 = 78.125
SNR = 20 log10(10,000/15) = 56.48 dB (3 Marks)

And for (ii):
Quantisation noise = 3,400 / 2^7 = 26.56
SNR = 20 log10(3,400/15) = 47.11 dB (4 Marks)

7 Marks TOTAL: ALL UNSEEN

2. (a) Why is data compression necessary for Multimedia activities?

Audio, video and images take up too much memory, disk space or bandwidth uncompressed.

3 Marks --- Bookwork

(b) What is the distinction between lossless and lossy compression? What broad types of multimedia data are each most suited to?

Lossless compression -- where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are also referred to as bit-preserving or reversible compression systems.

Lossy compression -- where the aim is to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate to achieve a given fidelity measure. Video and audio compression techniques are most suited to this form of compression.

Type suitability:
- Lossless: computer data files; graphics and graphical images (GIF/LZW)
- Lossy: audio (MP3); photographic images (JPEG); video (MPEG)

5 Marks --- Bookwork

(c) Briefly explain the compression techniques of zero length suppression and run length encoding. Give one example of a real world application of each compression technique.

Simple repetition suppression (suppression of zeros, the simplest case, is zero length suppression): if a series of n successive identical tokens appears in a sequence, we can replace these with a single token and a count of the number of occurrences. We usually need a special code to denote when the repeated token appears.

For example:

    89400000000000000000000000000000000

can be replaced with

    894f32

where f is the flag code for zero.

Applications: silence in audio data, pauses in conversation, bitmaps, blanks in text or program source files, backgrounds in images.

Run-length encoding: this encoding method is frequently applied to images (or pixels in a scan line), and is a small compression component used in JPEG compression. In this instance, sequences of image elements (x1, x2, ..., xn) are mapped to pairs (c1, l1), (c2, l2), ..., (cn, ln), where ci represents the image intensity or colour and li the length of the ith run of pixels (not dissimilar to zero length suppression above).

For example, the original sequence:

    111122233333311112222

can be encoded as:

    (1,4),(2,3),(3,6),(1,4),(2,4)

The savings are dependent on the data. In the worst case (random noise) the encoding is heavier than the original file: 2 integers rather than 1 integer if the data is represented as integers.

Applications: simple audio, graphics, images.

7 Marks --- Bookwork
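Both schemes are a few lines of Java each; this sketch (illustrative class and method names) reproduces the two worked examples above, with 'f' marking a zero run as per the convention in the answer:

public class SimpleCompress {
    static String zeroSuppress(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            if (s.charAt(i) == '0') {
                int j = i;
                while (j < s.length() && s.charAt(j) == '0') j++;
                out.append('f').append(j - i);  // flag + run length
                i = j;
            } else out.append(s.charAt(i++));
        }
        return out.toString();
    }

    static String runLength(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int j = i;
            while (j < s.length() && s.charAt(j) == s.charAt(i)) j++;
            out.append(s.charAt(i)).append(j - i);  // (token, run length) pair
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(zeroSuppress("89400000000000000000000000000000000"));  // 894f32
        System.out.println(runLength("111122233333311112222"));  // 1423361424, i.e. (1,4)(2,3)(3,6)(1,4)(2,4)
    }
}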

(d) Show how you would encode the following token stream using zero length suppression and run length encoding:

    ABC000AAB00000000DEFAB00000

Total length of token stream = 27.

Zero length suppression code: ABCf3AABf8DEFABf5 (17 tokens, where f is the code for 0)
Run length encoding: A1B1C103A2B108D1E1F1A1B105 (26 tokens)

(3 Marks each for a correct encoding)

(i) What is the compression ratio for each method when applied to the above token stream?

Compression ratios:
Zero length suppression = 17/27
Run length encoding = 26/27

(2 Marks for each ratio)

10 Marks Total

(ii) Explain why one has a better compression ratio than the other. What properties of the data lead to this result?

The data has only one repeated token, the 0. So coding is wasted on the rapidly changing remainder of the data in run length encoding, where every token's frequency count needs recording.

2 Marks

12 Marks for all of part (d). ALL WORK UNSEEN

3. (a) Briefly outline the basic principles of inter-frame coding in video compression.

Essentially each frame is JPEG-encoded, with motion-compensated differences between frames.

Macroblocks are 16x16 pixel areas on the Y plane of the original image. A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block.

Quantization is by a constant value for all DCT coefficients (i.e., no quantization table as in JPEG).

The macroblock is coded as follows:

- Many macroblocks will be exact matches (or close enough), so send the address of each block in the image -> Addr
- Sometimes no good match can be found, so send an INTRA block -> Type
- We will want to vary the quantization to fine-tune compression, so send the quantization value -> Quant
- Motion vector -> vector
- Some blocks in the macroblock will match well, others poorly, so send a bitmask indicating which blocks are present (Coded Block Pattern, or CBP)
- Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG

8 Marks --- BOOKWORK

(b) What is the key difference between I-frames, P-frames and B-frames? Why are I-frames inserted into the compressed output stream relatively frequently?

I-frame --- the basic reference frame for each group of pictures; essentially a JPEG-compressed image.
P-frame --- a coded forward difference frame, predicted with respect to the last I or P frame.
B-frame --- a coded difference frame interpolated bidirectionally, i.e. predicted from the preceding and following I or P frames.

I-frames are needed regularly, as difference coding cannot cope with drifting too far from the reference frame. If they are not present regularly, poor image quality results.

6 Marks --- BOOKWORK

(c) A multimedia presentation must be delivered over a network at a rate of 1.5 Mbits per second. The presentation consists of digitized audio and video. The audio has an average bit rate of 300 Kbits per second. The digitised video is in PAL format and is to be compressed using the MPEG-1 standard. Assuming a frame sequence of IBBPBBPBBPBBI... and average compression ratios of 10:1 and 20:1 for the I-frames and P-frames, what compression ratio is required for the B-frames to ensure the desired delivery rate? You may assume that for PAL the luminance signal is sampled at the spatial resolution of 352x288 and that the two chrominance signals are sampled at half this resolution. The refresh rate for PAL is 25 Hz. You should also allow 15% overheads for the multiplexing and packetisation of the MPEG-1 video.

Desired rate = 1.5 Mbits/sec
Desired video rate = rate - audio rate = 1.5 - 0.3 = 1.2 Mbits/sec
Physical rate = video rate less headroom = 1.2 / 1.15 = 1.044 Mbits/sec

Each group of pictures has 12 frames: 1 I, 3 P and 8 B frames.

So, with x the B-frame compression fraction, the average compression fraction per frame
= (0.1 + 3*0.05 + 8x)/12 = (0.25 + 8x)/12

Each frame has 352*288*8 + 2*(176*144*8) = 1,216,512 bits (uncompressed).

So the average compressed bits per frame (averaged over the 12-frame GoP)
= 1216512*(0.25 + 8x)/12

Therefore the bits per second at a 25 frames/sec rate = 25*1216512*(0.25 + 8x)/12

We require:

    25*1216512*(0.25 + 8x)/12 = 1044000
    2534400*(0.25 + 8x) = 1044000
    0.25 + 8x = 0.412
    8x = 0.162
    x = 0.02

So the compression ratio is approximately 50:1 for the B-frames.

13 MARKS UNSEEN
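The algebra can be replayed in a few lines of Java (a sketch of the model answer's arithmetic; the exact value differs slightly from the rounded 0.412 used above):

public class BFrameRatio {
    public static void main(String[] args) {
        double videoRate = (1.5e6 - 0.3e6) / 1.15;  // rate less audio, less 15% overhead
        double frameBits = 352 * 288 * 8 + 2 * (176 * 144 * 8);  // 1,216,512 bits uncompressed
        double f = videoRate / (25 * frameBits);    // average compressed fraction per frame
        double x = (12 * f - (1.0 / 10 + 3.0 / 20)) / 8;  // GoP: 1 I @ 1/10, 3 P @ 1/20, 8 B @ x
        System.out.println("B-frame fraction = " + x + ", ratio = " + (1 / x) + ":1");
        // Prints a fraction of about 0.02, i.e. a compression ratio of about 50:1
    }
}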


4. (a) What key features of QuickTime have led to its adoption and acceptance as an international multimedia format?

QuickTime is the most widely used cross-platform multimedia technology available today. QuickTime developed out of a multimedia extension for Apple's (proprietary) Macintosh System 7 operating system. It is now an international standard for multimedia interchange and is available for many platforms and as Web browser plug-ins.

The main features are:

- Versatile support for web-based media
- Sophisticated playback capabilities
- Easy content authoring and editing
- QuickTime is an open standard -- it embraces other standards and incorporates them into its environment. It supports almost every major multimedia file format.

4 Marks --- BOOKWORK

(b) Briefly outline the QuickTime architecture and its key components.

The QuickTime architecture:

QuickTime comprises two managers: the Movie Toolbox and the Image Compression Manager. QuickTime also relies on the Component Manager, as well as a set of predefined components. The figure (omitted here) shows the relationships of these managers and an application that is playing a movie.

The Movie Toolbox --

Your application gains access to the capabilities of QuickTime by calling functions in the Movie Toolbox. The Movie Toolbox allows you to store, retrieve, and manipulate time-based data that is stored in QuickTime movies. A single movie may contain several types of data. For example, a movie that contains video information might include both video data and the sound data that accompanies the video. The Movie Toolbox also provides functions for editing movies. For example, there are editing functions for shortening a movie by removing portions of the video and sound tracks, and there are functions for extending it with the addition of new data from other QuickTime movies.

The Image Compression Manager --

The Image Compression Manager comprises a set of functions that compress and decompress images or sequences of graphic images. It provides a device-independent and driver-independent means of compressing and decompressing images and sequences of images. It also contains a simple interface for implementing software and hardware image-compression algorithms. It provides system integration functions for storing compressed images as part of PICT files, and it offers the ability to automatically decompress compressed PICT files on any QuickTime-capable Macintosh computer. In most cases, applications use the Image Compression Manager indirectly, by calling Movie Toolbox functions or by displaying a compressed picture. However, if your application compresses images or makes movies with compressed images, it will call Image Compression Manager functions.

The Component Manager --

Applications gain access to components by calling the Component Manager. The Component Manager allows you to define and register types of components and communicate with components using a standard interface. A component is a code resource that is registered by the Component Manager. The component's code can be stored in a systemwide resource or in a resource that is local to a particular application. Once an application has connected to a component, it calls that component directly. If you create your own component class, you define the function-level interface for the component type that you have defined, and all components of that type must support the interface and adhere to those definitions. In this manner, an application can freely choose among components of a given type with absolute confidence that each will work.

QuickTime components:

- movie controller components, which allow applications to play movies using a standard user interface
- standard image compression dialog components, which allow the user to specify the parameters for a compression operation by supplying a dialog box or a similar mechanism
- image compressor components, which compress and decompress image data
- sequence grabber components, which allow applications to preview and record video and sound data as QuickTime movies
- video digitizer components, which allow applications to control video digitization by an external device
- media data-exchange components, which allow applications to move various types of data in and out of a QuickTime movie
- derived media handler components, which allow QuickTime to support new types of data in QuickTime movies
- clock components, which provide timing services defined for QuickTime applications
- preview components, which are used by the Movie Toolbox's standard file preview functions to display and create visual previews for files
- sequence grabber components, which allow applications to obtain digitized data from sources that are external to a Macintosh computer
- sequence grabber channel components, which manipulate captured data for a sequence grabber component
- sequence grabber panel components, which allow sequence grabber components to obtain configuration information from the user for a particular sequence grabber channel component

10 Marks BookWork

(c) JPEG2000 is a new image compression standard. Outline how this new standard might be incorporated into the QuickTime architecture. Your answer need not consider the details of the actual compression methods used in JPEG2000; instead it should focus on how, given the compression format, you could extend QuickTime to support it.

Sketch of the ideas required by the solution (builds on the QuickTime architecture knowledge above). JPEG2000 is a still image format, so we need to add functionality to the following:

- Media data structures --- add knowledge of the data structure of the new format
- Component Manager --- register a new component with the Component Manager
- Image Compression Manager --- add compression and decompression routines to the Image Compression Manager

13 MARKS UNSEEN


CM0340

CARDIFF UNIVERSITY EXAMINATION PAPER

SOLUTIONS

Academic Year: 2002-2003
Examination Period: Autumn 2002
Examination Paper Number: CM0340
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator.

Structure of Examination Paper:
There are four pages. There are four questions in total. There are no appendices.
The maximum mark for the examination paper is 100% and the mark obtainable for a question or part of a question is shown in brackets alongside the question.

Students to be provided with:
The following items of stationery are to be provided: one answer book.

Instructions to Students:
Answer THREE questions.
The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

1. (a) What is MIDI?

Definition of MIDI: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.

2 Marks --- Basic Bookwork

(b) How is a basic MIDI message structured?

Structure of MIDI messages: a MIDI message includes a status byte and up to two data bytes.

- Status byte: the most significant bit of the status byte is set to 1; the 4 low-order bits identify which channel the message belongs to (four bits produce 16 possible channels); the 3 remaining bits identify the message.
- Data byte: the most significant bit of a data byte is set to 0.

4 Marks --- Basic Bookwork

(c) A piece of music that lasts 3 minutes is to be transmitted over a network. The piece of music has 4 constituent instruments: drums, bass, piano and trumpet. The music has been recorded at CD quality (44.1 kHz, 16 bit, stereo) and also as MIDI information, where on average the drums play 180 notes per minute, the bass 140 notes per minute, the piano 600 notes per minute and the trumpet 80 notes per minute.

(i) Estimate the number of bytes required for the storage of a full performance at CD quality audio, and the number of bytes for the MIDI performance. You should assume that the General MIDI set of instruments is available for any performance of the recorded MIDI data.

CD AUDIO SIZE: 2 channels * 44,100 samples/sec * 2 bytes (16 bits) * 3*60 secs (3 mins) = 31,752,000 bytes = 30.3 Mb

MIDI: 3 bytes per MIDI message. KEY THINGS TO NOTE:
- Need to send 4 program change messages (to set up the General MIDI instruments) = 12 bytes (2 marks)
- Need to send Note On and Note Off messages to play each note properly (4 marks)
- Then send 3 mins * 3 (MIDI bytes) * 2 (Note On and Off) * (180 + 140 + 600 + 80) = 18,000 bytes = 17.58 Kb

8 Marks --- Unseen (2 for CD audio, 6 for MIDI)

(ii) Estimate the time it would take to transmit each performance over a network at 64 kbps.

CD AUDIO time = 31,752,000*8 (bits) / (64*1024) = 3,876 seconds = 1.077 hours
MIDI time = 18,000*8 / (64*1024) = 2.197 seconds

2 Marks --- Unseen

(iii) Briefly comment on the merits and drawbacks of each method of transmission of the performance.

Audio:
- Pro: exact reproduction of the source sounds
- Con: high bandwidth/long file transfer for high quality audio

MIDI:
- Pro: very low bandwidth
- Con: no control over the quality of playback of the MIDI sounds

4 Marks --- Unseen, but extended discussion of lecture material

(d) Suppose vocals (where actual lyrics were to be sung) were required to be added to each performance in (c) above. How might each performance be broadcast over a network?

KEY POINT: vocals cannot utilise MIDI.

Audio: need to overdub the vocal audio on the background audio track. Need some audio editing package, and then mix the combined tracks for stereo audio. Assuming no change in sample rate or bit size, the new mixed track will have exactly the same file size as the previous audio track, so transmission is the same as in (c).

MIDI: MIDI alone is now no longer sufficient, so how to proceed? For best bandwidth, keep the backing tracks as MIDI and send the vocal track as audio. To achieve such a mix, some specialist music production software will be needed to allow a file to be saved with synchronised MIDI and audio. How to deliver over a network? Need to use a multimedia standard that supports MIDI and digital audio. QuickTime files support both (as do some Macromedia Director/Flash(?) files), so save the mixed MIDI/audio file in this format. The size of the file will be significantly increased due to the single channel of audio: if this is not compressed, and assuming a mono audio file, the file size will increase by around 15 Mb, so the transmission time will increase drastically.

7 Marks --- Unseen

2. (a) What is meant by the terms frequency and temporal masking of two or more audio signals? Briefly, what is the cause of this masking?

Frequency masking: when an audio signal consists of multiple frequencies, the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close, and the amplitude of one is less than that of the other close frequency, then the second frequency may not be heard. The range of closeness for frequency masking (the critical bands) depends on the frequencies and relative amplitudes.

Temporal masking: after the ear hears a loud sound, it takes a further short while before it can hear a quieter sound.

The cause of both types of masking is that within the human ear there are tiny hair cells that are excited by air pressure variations. Different hair cells respond to different ranges of frequencies. Frequency masking occurs because, after excitation by one frequency, further excitation of the same group of cells by a less strong similar frequency is not possible. Temporal masking occurs because the hairs take time to settle after excitation before they can respond again.

8 Marks --- BookWork


(b) How does MPEG audio compression exploit such phenomena? Give a schematic diagram of the MPEG audio perceptual encoder.
MPEG uses several perceptual coding concepts:
The bandwidth is divided into frequency subbands using a bank of analysis (critical band) filters.
Each analysis filter uses a scaling factor derived from the subband's maximum amplitude for psychoacoustic modelling.
An FFT (DFT) is used, and signal-to-mask ratios determine which frequencies fall below the audible threshold.
[Schematic expected here: input -> analysis filter bank -> quantiser driven by the FFT-based psychoacoustic model -> bitstream formatting.]

8 Marks - BookWork


(c) The critical bandwidth for average human hearing is a constant 100 Hz for frequencies less than 500 Hz, and increases (approximately) linearly by 100 Hz for each additional 500 Hz.
(i) Given a frequency of 300 Hz, what is the next highest (integer) frequency signal that is distinguishable by the human ear, assuming the latter signal is of a substantially lower amplitude?
The trick is to realise (remember?) the definition of a critical band: the width of a masking area (curve) within which no signal may be heard, given a first carrier signal of higher amplitude, within a given frequency range as defined above.
The critical band is 100 Hz for a 300 Hz signal, so the band spans 250-350 Hz. So the next highest audible frequency is 351 Hz. 4 Marks Unseen
(ii) Given a frequency of 5000 Hz, what is the next highest (integer) frequency signal that is distinguishable by the human ear, assuming the latter signal is of a substantially lower amplitude?
The 5,000 Hz critical bandwidth is 10 * 100 Hz = 1000 Hz, so the band spans 4500-5500 Hz. So the next highest audible frequency is 5501 Hz. 7 Marks Unseen
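As an aside (not part of the mark scheme), the piecewise bandwidth rule and the "next audible frequency" computation can be written as two small Python helpers (names are mine):

    def critical_bandwidth(f_hz):
        """Critical bandwidth per the question's approximation: a constant
        100 Hz below 500 Hz, then growing by 100 Hz per additional 500 Hz."""
        if f_hz < 500:
            return 100.0
        return 100.0 * f_hz / 500.0

    def next_audible(f_hz):
        # The band is centred on f, so masking extends half a bandwidth
        # either side; the next distinguishable integer frequency lies
        # 1 Hz beyond the band edge.
        return int(f_hz + critical_bandwidth(f_hz) / 2) + 1

    print(next_audible(300))   # 351
    print(next_audible(5000))  # 5501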


3.

(a) What is the main difference between the H.261 and MPEG video compression algorithms?

H.261 has I and P frames only. MPEG introduces an additional B frame for bidirectionally (backward and forward) interpolated prediction of frames. 2 Marks - Bookwork
(b) MPEG has a variety of different standards, i.e. MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21. Why have such standards evolved? Give an example target application for each variant of the MPEG standard.
The different MPEG standards have been developed for target domains that need different compression approaches, and more recently for the integration and interchange of multimedia data.
MPEG-1 was targeted at the Source Input Format (SIF): video originally optimised to work at resolutions of 352x240 pixels at 30 frames/sec (NTSC based) or 352x288 pixels at 25 frames/sec (PAL based), though other resolutions are possible.
MPEG-2 addressed issues directly related to digital television broadcasting.
MPEG-4 was originally targeted at very low bit-rate communication (4.8 to 64 kb/sec).
MPEG-7 is targeted at a Multimedia Content Description Interface.
MPEG-21 is targeted at a Multimedia Framework: describing and using multimedia content in a unified framework.
8 Marks Bookwork


(c) Given the following two frames of an input video, show how MPEG would estimate the motion of the macroblock, highlighted in the first image, to the next frame. For ease of computation in your solution you may assume that all macroblock calculations may be performed over 4x4 windows. You may also restrict your search to +/- 2 pixels in the horizontal and vertical directions around the original macroblock.

Frame n:
1 1 1 1 1 1 1 1
1 1 2 3 3 2 1 1
1 1 2 2 2 2 1 1
1 1 2 4 5 2 1 1
1 1 2 5 3 2 1 1
1 1 2 3 3 2 1 1
1 1 1 3 3 2 1 1
1 1 1 3 3 1 1 1

Frame n+1:
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 2 2 2 2 2 2
1 1 1 1 1 1 1 1
1 1 2 4 4 4 4 2
1 1 2 3 3 4 5 4
1 1 2 3 4 5 4 4
1 1 2 2 3 4 5 4

The basic idea is to search for the macroblock (MB) within a +/- 2 pixel window and find where the Sum of Absolute Differences (SAD) is a minimum (the Mean Absolute Error (MAE) could be used for each window instead, but this is computationally more expensive). The SAD is given, for i = -2 to +2 and j = -2 to +2, by:

SAD(i, j) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} | C(x + k, y + l) - R(x + i + k, y + j + l) |

Here N = 2, (x, y) is the position of the original macroblock C, and R is the region against which the SAD is computed. It is sometimes applicable for an alpha mask to be applied to the SAD calculation, to mask out certain pixels:

SAD(i, j) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} | C(x + k, y + l) - R(x + i + k, y + j + l) | \cdot (!\alpha_C = 0)

In this case the alpha mask is not required.
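As an illustrative aside (not part of the original mark scheme), the SAD search translates directly into a brute-force Python sketch; the function names and the bounds handling are mine, and the macroblock origin must be supplied by the caller:

    def sad(C, R, x, y, i, j, N):
        """SAD between the NxN block of current frame C at (x, y) and the
        block of reference frame R displaced by (i, j)."""
        return sum(abs(C[y + l][x + k] - R[y + j + l][x + i + k])
                   for k in range(N) for l in range(N))

    def motion_search(C, R, x, y, N=4, radius=2):
        """Exhaustive +/- radius search; returns (min SAD, (i, j))."""
        best = None
        for i in range(-radius, radius + 1):
            for j in range(-radius, radius + 1):
                # Skip candidates that fall outside the reference frame.
                if not (0 <= x + i and x + i + N <= len(R[0]) and
                        0 <= y + j and y + j + N <= len(R)):
                    continue
                score = sad(C, R, x, y, i, j, N)
                if best is None or score < best[0]:
                    best = (score, (i, j))
        return best

Calling motion_search(frame_n, frame_n_plus_1, x, y) with the highlighted macroblock's top-left corner performs the search described next.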


The search area is the +/- 2 pixel window around the original macroblock position in frame n+1 (shown in the original figure by dashed lines, with an example candidate window marked by a bold dot-dash line near the top right corner).

[Figure: frame n+1 with the search area and an example 4x4 candidate window, followed by the 5x5 table of SAD scores for each offset (i, j), i, j in {-2, -1, 0, +1, +2}, taking the top-left pixel as the window origin; the minimum SAD (1) occurs at offset (+2, +2).]

So the motion vector is (+2, +2).


(d) Based upon the motion estimation, a decision is made on whether INTRA or INTER coding is used. What is the decision based on for the coding of the macroblock motion in (c)?
To determine INTRA/INTER mode we do the following calculation:

MB_{mean} = \frac{1}{N_x N_y} \sum_{i=0, j=0}^{N-1} | C(i, j) |

A = \sum_{i=0, j=0}^{N-1} | C(i, j) - MB_{mean} | \cdot (!\alpha_C(i, j) = 0)

If A < (SAD_{min} - 2N), INTRA mode is chosen.

So for the above motion, MB_{mean} = 17/2 = 8.5 and A = 18. Since 18 is not less than (1 - 4 =) -3, we choose INTER frame coding. 5 Marks Unseen


4.

(a) What is the distinction between lossy and lossless data compression?
Lossless compression preserves the data exactly through compression and decompression. Lossy compression aims to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate to achieve a given fidelity measure, but will not produce a complete facsimile of the original data.

2 Marks Bookwork
(b) Briefly describe the four basic types of data redundancy that data compression algorithms can apply to audio, image and video signals.
The 4 types of redundancy:
Temporal -- in 1D data, 1D signals, audio etc.
Spatial -- correlation between neighbouring pixels or data items.
Spectral -- correlation between colour or luminescence components. This uses the frequency domain to exploit relationships between the frequency of change in data.
Psycho-visual/psycho-acoustic -- exploit perceptual properties of the human visual or aural system to compress data.

8 Marks Bookwork
(c) Encode the following stream of characters using decimal arithmetic coding compression: MEDIA. You may assume that characters occur with probabilities of M = 0.1, E = 0.3, D = 0.3, I = 0.2 and A = 0.1.
Sort the data into largest probabilities first and form cumulative probabilities:
E: [0.0, 0.3)  D: [0.3, 0.6)  I: [0.6, 0.8)  M: [0.8, 0.9)  A: [0.9, 1.0)
There are only 5 characters, so there are 5 segments whose widths are determined by the probability of the related character. The first character to be encoded is M, which lies in the range [0.8, 0.9); therefore the final codeword lies in the range 0.8 to 0.89999... Each subsequent character subdivides the range [0.8, 0.9). So after coding M we get:


E: [0.8, 0.83)  D: [0.83, 0.86)  I: [0.86, 0.88)  M: [0.88, 0.89)  A: [0.89, 0.9)
So to code E we take the range [0.8, 0.83) and subdivide it:
E: [0.8, 0.809)  D: [0.809, 0.818)  I: [0.818, 0.824)  M: [0.824, 0.827)  A: [0.827, 0.83)
The next range, for D, is [0.809, 0.818), which splits as:
E: [0.809, 0.8117)  D: [0.8117, 0.8144)  I: [0.8144, 0.8162)  M: [0.8162, 0.8171)  A: [0.8171, 0.818)
The next character is I, so the range is [0.8144, 0.8162):
E: [0.8144, 0.81494)  D: [0.81494, 0.81548)  I: [0.81548, 0.81584)  M: [0.81584, 0.81602)  A: [0.81602, 0.8162)
The final character is A, which is in the range [0.81602, 0.8162). So the completed codeword is any number in the range 0.81602 <= codeword < 0.8162.
12 Marks Unseen
(d) Show how your solution to (c) would be decoded.
Assume the codeword is 0.8161. The decoder can readily determine that the first character is M, since the codeword lies in the range [0.8, 0.9). By expanding the interval we can see that the next character must be an E, as 0.8161 lies in the range [0.8, 0.83), and so on for all other intervals. 5 marks Unseen
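As an aside (not part of the original solutions), the interval subdivision above can be reproduced with a few lines of Python; MODEL and encode are my names, and a real coder would renormalise to avoid the floating-point drift this toy version accumulates:

    # Interval-subdivision arithmetic encoder matching the worked example.
    MODEL = [("E", 0.3), ("D", 0.3), ("I", 0.2), ("M", 0.1), ("A", 0.1)]

    def encode(message):
        low, high = 0.0, 1.0
        for ch in message:
            span = high - low
            cum = 0.0
            for sym, p in MODEL:
                if sym == ch:
                    low, high = low + span * cum, low + span * (cum + p)
                    break
                cum += p
        return low, high

    print(encode("MEDIA"))  # approximately (0.81602, 0.8162)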


CARDIFF UNIVERSITY EXAMINATION PAPER

SOLUTIONS
Academic Year: 2003-2004
Examination Period: Autumn 2003
Examination Paper Number: CM0340
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are four pages. There are four questions in total. There are no appendices. The maximum mark for the examination paper is 100% and the mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: One answer book. Instructions to Students: Answer THREE questions. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

1. (a) What does Nyquist's Sampling Theorem state?
In order to effectively sample a waveform, the sampling frequency must be at least twice that of the highest frequency present in the signal. 2 Marks --- Bookwork
(b) What are the implications of Nyquist's Sampling Theorem for multimedia data?
The sampling frequency affects the quality of the data: a higher frequency gives better sampling and hence a better representation of the underlying signal (given a fixed frequency range of the signal).
The sampling frequency affects the size of the digitised data: a higher frequency means more samples and therefore more data.
4 (2 marks each above) Marks --- Bookwork


(c)

For each of the following media types -- graphics, images, audio and video -- briefly discuss how Nyquist's Sampling Theorem affects the quality of the data, and the form in which sampling effects manifest themselves in the actual data.
Graphics
o Quality: not an issue with vector graphics.
o Sampling artifact: rendering may lead to aliasing effects in lines etc.
Images
o Quality: image size decreases, so less detail or sampling artifacts.
o Sampling artifact: aliasing effects in blocky images.
Audio
o Quality: lack of clarity in high frequencies; telephonic voices at low sampling frequencies.
o Sampling artifact: digital noise present in the signal; loss of high frequencies, or poor representation of high frequencies, gives audio aliasing (such frequencies should be filtered out before sampling).
Video
o Quality: video frame size decreases, so less detail or sampling artifacts; motion blur or loss of motion detail.
o Sampling artifact: aliasing effects in the frame images, jittery motion tracking etc.

12 (3 Marks per media type) Marks --- Unseen: Extended reasoning on a few parts of course


(d) Calculate the uncompressed digital output, i.e. data rate, if a video signal is sampled using the following values: 25 frames per second, 160 x 120 pixels, true (full) colour depth.
True colour = 24 bits (3 bytes) per pixel.
So the number of bytes per second is 3*160*120*25 = 1,440,000 bytes, or 1.37 MB.
3 Marks --- Unseen: Application of basic knowledge
(e) If a suitable CD stereo quality audio signal is included with the video signal in part (d), what compression ratio would be needed to be able to transmit the signal on a 128 kbps channel?

Stereo audio = 44,100 samples/sec * 2 bytes (16 bits) * 2 channels = 176,400 bytes per second.
So the uncompressed byte stream is 1,440,000 + 176,400 = 1,616,400 bytes per second.
128 kbps is kilobits per second, so the required compression ratio is (128*1024)/(1,616,400*8), which is approximately 0.01.
3 Marks --- Unseen: Application of basic knowledge


2.

(a)

What characteristics of the human visual system can be exploited for the compression of colour images and video?

The eye is basically sensitive to colour intensity:
o Each neuron is either a rod or a cone. Rods are not sensitive to colour.
o Cones come in 3 types: red, green and blue.
o Each type responds differently (non-linearly, and not equally for R, G and B) to the various frequencies of light.

5 Marks --- Bookwork


(b)

What is the YIQ color model, and why is this an appropriate color model to use in conjunction with compression methods such as JPEG and MPEG?
o YIQ has its origins in colour TV broadcasting.
o Y (luminance) is the CIE Y primary: Y = 0.299R + 0.587G + 0.114B
o The other two vectors complete the YIQ transform:
I = 0.596R - 0.275G - 0.321B
Q = 0.212R - 0.528G + 0.311B

How to exploit this for compression:
o The eye is most sensitive to Y, next to I, and then to Q.
o So quantise with more bits for Y than for I or Q.
4 (2 for Transform (Matrix or Eqn) and 2 for Compression scheme) Marks -- Bookwork
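As an illustrative aside (not part of the answer), the transform is a single matrix multiply per pixel; rgb_to_yiq is my name and the coefficients are those quoted above:

    # RGB -> YIQ using the coefficients above (component values in [0, 1]).
    def rgb_to_yiq(r, g, b):
        y = 0.299 * r + 0.587 * g + 0.114 * b
        i = 0.596 * r - 0.275 * g - 0.321 * b
        q = 0.212 * r - 0.528 * g + 0.311 * b
        return y, i, q

    # A compression scheme then quantises Y finely and I, Q coarsely,
    # e.g. keeping 8 bits for Y but fewer bits for I and Q.
    print(rgb_to_yiq(1.0, 0.5, 0.25))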


(c)

Given the following YIQ image values:

Y:
128 126 127 129
124 123 124 124
130 136 132 132
154 143 132 132

I:
55 66 54 54
56 57 56 56
45 56 58 49
34 36 39 37

Q:
44 44 55 55
44 44 55 55
34 34 36 35
35 35 34 34

What are the corresponding chroma subsampled values for (i) a 4:2:2 subsampling scheme, (ii) a 4:1:1 subsampling scheme, (iii) a 4:2:0 subsampling scheme?
Basic idea required (from notes) -- chroma subsampling:
o 4:2:2 -> The colour signals are horizontally subsampled by a factor of 2. Each pixel is two bytes, e.g., (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)(Cb4, Y4) ...
o 4:1:1 -> Horizontally subsampled by a factor of 4.
o 4:2:0 -> Subsampled in both the horizontal and vertical axes by a factor of 2 between pixels.


(i) 4:2:2 subsampling scheme: take every second horizontal pixel in the I and Q channels (Y is unchanged):

I (4:2:2):
55 54
56 56
45 58
34 39

Q (4:2:2):
44 55
44 55
34 36
35 34

(ii) 4:1:1 subsampling scheme: take every fourth pixel in the horizontal in I and Q (Y is unchanged):

I (4:1:1):
55
56
45
34

Q (4:1:1):
44
44
34
35

(iii) 4:2:0 subsampling scheme: average I and Q over every 2x2 block (Y is unchanged):

I (4:2:0):
59 55
43 46

Q (4:2:0):
44 55
35 35

15 Marks --- Unseen: Practical Application of Bookwork Knowledge
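As an illustrative aside (not part of the original solutions), the three schemes can be expressed compactly in Python over a channel stored as a list of rows; int(v + 0.5) mirrors the round-half-up used in the 4:2:0 averages above, and the function names are mine:

    def sub422(chan):   # keep every second horizontal sample
        return [row[::2] for row in chan]

    def sub411(chan):   # keep every fourth horizontal sample
        return [row[::4] for row in chan]

    def sub420(chan):   # average each 2x2 block, rounding half up
        return [[int((chan[r][c] + chan[r][c + 1] +
                      chan[r + 1][c] + chan[r + 1][c + 1]) / 4 + 0.5)
                 for c in range(0, len(chan[0]), 2)]
                for r in range(0, len(chan), 2)]

    I = [[55, 66, 54, 54], [56, 57, 56, 56], [45, 56, 58, 49], [34, 36, 39, 37]]
    print(sub420(I))  # [[59, 55], [43, 46]] -- matches the I (4:2:0) result above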


3.

(a)

What is the distinction between lossy and lossless data compression?

Lossless Compression Where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are referred to as bit-preserving or reversible compression systems also. Lossy Compression where the aim is to obtain the best possible fidelity for a given bit-rate or minimizing the bit-rate to achieve a given fidelity measure. Video and audio compression techniques are most suited to this form of compression. 2 Marks Bookwork


(b)

Briefly describe two repetitive suppression algorithms and give one practical use of each algorithm.

1. Simple Repetition Suppression
If a series of n successive identical tokens appears in a sequence, we can replace it with a single token and a count of the number of occurrences. We usually need a special flag to denote when the repeated token appears.
For example, 89400000000000000000000000000000000 can be replaced with 894f32, where f is the flag for zero.
Compression savings depend on the content of the data. Applications of this simple compression technique include:
o Silence in audio data, pauses in conversation etc.
o Bitmaps
o Blanks in text or program source files
o Backgrounds in images
o Other regular image or data tokens
2. Run-length Encoding
In this method, sequences of (image) elements x1, x2, ..., xn are mapped to pairs (c1, l1), (c2, l2), ..., (cn, ln), where ci represents the image intensity or colour and li the length of the i-th run of pixels.
For example, the original sequence 111122233333311112222 can be encoded as: (1,4), (2,3), (3,6), (1,4), (2,4).
The savings are dependent on the data. In the worst case (random noise) the encoding is heavier than the original file: 2 integers rather than 1 integer, if the data is represented as integers.


Applications:
o This encoding method is frequently applied to images (or pixels in a scan line).
o It is a small compression component used in JPEG compression.
10 Marks Bookwork
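As an aside (not part of the original solutions), run-length encoding is a few lines of Python; rle_encode is my name:

    def rle_encode(tokens):
        """Run-length encode a sequence into (token, count) pairs."""
        runs = []
        for t in tokens:
            if runs and runs[-1][0] == t:
                runs[-1][1] += 1      # extend the current run
            else:
                runs.append([t, 1])   # start a new run
        return [(t, n) for t, n in runs]

    print(rle_encode("111122233333311112222"))
    # [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]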


(c)

Briefly state the LZW compression algorithm and show how you would use it to encode the following stream of characters:

MYMEMYMO
You may assume that single character tokens are coded by their ASCII codes, as per the original LZW algorithm. However, for the purpose of the solution you may simply output the character rather than the ASCII value. The LZW Compression Algorithm can be summarised as follows:

w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
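As an aside (not part of the original solution), the pseudocode above translates directly into runnable Python; the helper name lzw_encode is mine. Applying it to the question's input reproduces the trace that follows:

    def lzw_encode(data):
        dictionary = {chr(i): i for i in range(256)}  # ASCII priming
        next_code = 256
        w, out = "", []
        for k in data:
            wk = w + k
            if wk in dictionary:
                w = wk
            else:
                out.append(dictionary[w])
                dictionary[wk] = next_code
                next_code += 1
                w = k
        if w:
            out.append(dictionary[w])  # flush the final phrase
        return out

    print(lzw_encode("MYMEMYMO"))  # [77, 89, 77, 69, 256, 77, 79]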
Original LZW used a dictionary with 4K entries; the first 256 (0-255) are the ASCII codes.
Encoding of MYMEMYMO:

w     k     Output      Index   Symbol
--------------------------------------
nil   M
M     Y     M (ASCII)   256     MY
Y     M     Y           257     YM
M     E     M           258     ME
E     M     E           259     EM
M     Y
MY    M     <256>       260     MYM
M     O     M           261     MO
O     EOF   O

So the token stream is MYME<256>MO. 12 Marks Unseen: Application of Algorithm
4. (a) What is the basic format of an MHEG application?


An MHEG-5 application is made up of Scenes and objects that are common to all Scenes. A Scene contains a group of objects, called Ingredients, that represent information (graphics, sound, video, etc.) along with localised behaviour based on events firing (e.g., the 'Left' button being pushed activating a sound). At most one Scene is active at any one time. Navigation in an application is done by transitioning between Scenes. 2 Marks Bookwork
(b) Briefly describe the MHEG Client-Server interaction and the role that the MHEG Engine plays in this process.

Client-Server Interaction (4 Marks):
The server streams out content requested by the MHEG application. The client/Run Time Engine (RTE), embedded in firmware, processes the MHEG, deals with the streaming of the sourced data, and formats it for presentation.

Run Time Engine (RTE) main functions (6 Marks):
The RTE is the kernel of the client's architecture. It:
o issues I/O, data access and other requests to the client;
o prepares the presentation and handles accessing, decoding and managing MHEG-5 objects in their internal format (interpretation of MHEG objects);
o performs the actual presentation, which is based on an event loop where events trigger actions. These actions then become requests to the presentation layer, along with other actions that internally affect the engine.

10 Marks Bookwork
(c) Using suitable fragments of MHEG code, illustrate how you would code the following presentation in MHEG:


Scene 1

Scene 2

The above presentation consists of two scenes. Scene 1 plays some video and is overlaid by some text information; a next button is provided so that the user may elect to jump to Scene 2. Scene 2 plays some video and is overlaid by a visual prompt which, when selected, displays some further text information. Note that the precise MHEG syntax, object attributes and attribute values are not absolutely required in your solutions. Rather, you should concentrate on giving indicative object attribute values. In essence, the structure of the code is more important than precise syntax.
Basic idea:
Need a startup for the application --- here done in startup.mheg, which fires up scene1.mheg. Only essential MHEG objects are listed.
scene1.mheg --- the button event triggers scene2.mheg. The important point is the button and link transition.
scene2.mheg --- fires up on the button event; fires up moreinfo.mheg on the visual prompt event trigger. Only essential MHEG objects listed. The important point is a graphics/bitmap overlay icon plus a hot spot for the link.
moreinfo.mheg --- simply full of text. Not that important; the transition to it is what the question requires.
startup.mheg (3 Marks):

{:Application ("startup" 0)
  //:OnStartUp (:TransitionTo(("scene1.mheg" 0)))
  :Items(
    {:Link 1
      :EventSource 1
      :EventType IsRunning
      :LinkEffect ( :TransitionTo(("scene1.mheg" 0)) )
    }
  )
}


scene1.mheg (object attributes and attribute values are just indicative) 5 Marks:

{:Scene ( "scene1.mheg" 0 ) :Items ( {:video 0 :InitiallyActive true :OrigContent :ContentRef ( "waterfall.mov" ) :OrigBoxSize 120 120 :OrigPosition 225 175 :ComponentTag 100 :Termination loop } {:Text 1 :OrigContent 'Some Text ' :OrigBoxSize 95 95 :OrigPosition 0 175 :FontAttributes Bold.14 :TextColour black :HJustification centre :VJustification centre :TextWrapping true } {:PushButton 3 :OrigBoxSize 100 60 :OrigPosition 540 280 :ButtonRefColour gray :OrigLabel "back to main" } {:Link 4 :EventSource 2 :EventType IsSelected :LinkEffect ( :TransitionTo( ( "scene2.mhg" 0) ) ) }
} )


scene2.mheg (object attributes and attribute values are just indicative) 5 Marks:

{:Scene ( "scene2.mheg" 0 ) :Items ( {:video 0 :InitiallyActive true :OrigContent :ContentRef ( "painter.mov" ) :OrigBoxSize 120 120 :OrigPosition 225 175 :ComponentTag 100 :Termination loop } {:Bitmap 1 :OrigContent :ContentRef ( "overlay.gif" ) :OrigBoxSize 51 39 // 0 0 :OrigPosition 10 15 :Tiling false } {:Hot Spot 2 :OrigBoxSize 100 60 :OrigPosition 540 280 :ButtonRefColour gray :OrigLabel "back to main" } {:Link 3 :EventSource 2 :EventType IsSelected :LinkEffect ( :TransitionTo( ( "moreinfo.mhg" 0) ) ) }
} )


moreinfo.mheg (2 Marks)

{:Scene ( "moreinfo.mheg" 0 ) :Items ( {:Text 1 :OrigContent 'Some Text ' :OrigBoxSize 95 95 :OrigPosition 0 175 :FontAttributes Bold.14 :TextColour black :HJustification centre :VJustification centre :TextWrapping true }
) } 15 Marks Unseen


CARDIFF UNIVERSITY EXAMINATION PAPER

SOLUTIONS
Academic Year: 2005-2006
Examination Period: Spring 2006
Examination Paper Number: CM0340
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are THREE pages. There are FOUR questions in total. There are no appendices. The maximum mark for the examination paper is 81 marks, and the mark obtainable for each part of a question is shown in brackets alongside the question. Full marks can be obtained by correctly answering 3 questions. Students to be provided with: The following items of stationery are to be provided: One answer book. Instructions to Students: Answer THREE questions. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

1. (a) Give a definition of a Multimedia System.
A Multimedia System is a system capable of processing multimedia data and applications. 2 Marks Bookwork
(b) Briefly describe five Multimedia Authoring Paradigms.
Any 5 from:
Scripting Language -- the scripting paradigm is the authoring method closest in form to traditional programming. A powerful, object-oriented scripting language is usually the centerpiece of such a system, e.g. Lingo.
Iconic/Flow Control -- this tends to be the speediest (in development time) authoring style; it is best suited for rapid prototyping and short-development-time projects. The style is much like a schematic, where icons represent key media objects or collections of objects, and also control over the media. Linkage between media is indicated by connections that control the flow of the data.
Frame -- the Frame paradigm is similar to the Iconic/Flow Control paradigm, but distinct groupings of media are made via a frame layout.
Card/Scripting -- the Card/Scripting paradigm provides a great deal of power (via the incorporated scripting language) and an index-card style flow structure, similar to a frame.
Cast/Score/Scripting -- the Cast/Score/Scripting paradigm uses a music score as its primary authoring metaphor; the synchronous elements are shown in various horizontal tracks, with simultaneity shown via the vertical columns.
Hierarchical Object -- the Hierarchical Object paradigm uses an object metaphor (like OOP) which is visually represented by embedded objects and iconic properties. Although the learning curve is non-trivial, the visual representation of objects can make very complicated constructions possible.
Hypermedia Linkage -- the Hypermedia Linkage paradigm is similar to the Frame paradigm in that it shows conceptual links between elements; however, it lacks the Frame paradigm's visual linkage metaphor.
Tagging -- the Tagging paradigm uses tags in text files (for instance SGML/HTML, SMIL (Synchronised Media Integration Language), VRML, 3DML and WinHelp) to link pages, provide interactivity and integrate multimedia elements.

5 Marks Bookwork (1 per paradigm)


(c) Briefly describe five ways in which content can be formatted and delivered in a Multimedia Authoring System.
1. Scripting (writing): standard text -- say what you want with words.
2. Graphics (illustrating): a picture is worth a thousand words -- say what you want with graphic illustrations.
3. Animation (wiggling): now we approach multimedia -- say what you want with a graphic animation or video.
4. Audio (hearing): sounds can convey alerts, ambience and content -- say what you want with a narration.
5. Interactivity (interacting): true multimedia -- immerse yourself in an interactive presentation, which is possibly more instructive. Interactive actions can start animations and audio, move to new parts of the presentation, control simulations etc.

10 Marks --- Bookwork (2 Marks per point)


(d)

What extra information is multimedia good at conveying with respect to conventional media? Specifically:
(i) What can spoken text convey that written text cannot?
Spoken text can convey:
o Emotion or feelings more readily.
o Unwritten sounds to express feelings or emotions, for example tut-tutting or sharp intakes of breath.
o Accents/dialects are readily apparent -- important when more than one speaker is present, and perhaps in getting certain messages across. E.g. assimilating a radio play is easier when we can distinguish the speakers more easily.
o If in stereo, position in the 3D sound space is discernible -- possibly useful in creating a feeling of space or locating people in immersive 3D environments.
o Easier to author/synchronise with video: since audio is already a time-dependent medium, aligning it on a timeline in a multimedia authoring package is usually simple. Written text would need to be animated (e.g. rollover credits, subtitles) -- not difficult, but some additional multimedia editing of the raw text is required.
(ii) When might written text be better than spoken text?
o It is hard to assimilate/remember a lot of spoken word; written text is easier to reread (as opposed to replay).
o Additional information, such as what a person looks like, what he is wearing, or general appearance, can possibly be more easily conveyed.
o Directions as to what else is happening, e.g. a person moving around or waving arms, are more easily conveyed -- by comparison, a (radio) screenplay needs plenty of stage directions.
o Written text has negligible bandwidth; high quality audio has significant bandwidth requirements.
7 Marks Unseen Total (4 Marks for (i), 3 Marks for (ii))

2. (a) Briefly explain how the human visual system senses colour. How is colour exploited in the compression of multimedia graphics, images and video?

The eye is basically just a biological camera: the eye, through the lens etc., focuses light onto the retina (the back of the eye). The retina consists of neurons that fire nerve signals in proportion to the incident light. Each neuron is either a rod or a cone. Rods are not sensitive to colour. Cones are organised in banks that sense red, green and blue.

Multimedia context: since rods do not sense colour, only luminous intensity, the eye is more sensitive to luminance than to colour. The eye is also more sensitive to red and green than to blue (due to evolution: the need to see fellow humans, where blue is not prevalent in skin hues). So any multimedia compression technique should use a colour representation that models the human visual system. We can then encode luminance at a higher bandwidth (more bits) than colour, as it is much more perceptually relevant. 5 Marks -- Bookwork
(b) List three distinct models of colour used in Multimedia. Explain why there are a number of different colour models exploited in multimedia data formats.

Possible models:
o RGB
o CIE Chromaticity
o YIQ colour space
o YUV (YCrCb)
o CMY/CMYK

The different models reflect the need to represent colour in a perceptually relevant model for effective compression. The different models are also due to the evolution of colour from video (YIQ, YUV), display (RGB) and print (CMYK) media requirements. 9 Marks Bookwork ---- 3 marks (1 per model), 6 marks for the explanation of the different models

(c) Compression of colour has been exploited since analog video. How was colour compression achieved in analog video? Compare this colour compression technique to those used in digital video.
Analog video colour compression: NTSC video uses the YIQ colour model, PAL the YUV colour model. In NTSC, the human eye is most sensitive to Y, next to I, and finally to Q (similarly for PAL with YUV). So different frequency ranges (analog bandwidth) in the signal are used to encode the channels: in NTSC, 4 MHz is allocated to Y, 1.5 MHz to I and 0.6 MHz to Q. In PAL video, the U and V signals are lowpass filtered to about half the bandwidth of Y.
Digital colour compression: a similar idea, but using different subsampling resolutions of each frame for each colour band -- called chroma subsampling. E.g.:

4:4:4 --- No subsampling in the colour bands (usually only if the RGB model is used).
4:2:2 --- Colour signals horizontally subsampled by a factor of 2. Each pixel is two bytes, e.g., (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)(Cb4, Y4) ...
4:1:1 --- Horizontally subsampled by a factor of 4.
4:2:0 --- Subsampled in both the horizontal and vertical axes by a factor of 2 between pixels.

13 Marks --- Unseen. 7 Marks for Analog, 6 for Digital. Extended reasoning of notes. Analog compression briefly mentioned, digital more extensively dealt with.

3. (a) What is the distinction between lossy and lossless data compression?
Lossless Compression -- where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are also referred to as bit-preserving or reversible compression systems.
Lossy Compression -- where the aim is to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate to achieve a given fidelity measure. Video and audio compression techniques are most suited to this form of compression.
2 Marks Bookwork
(b) Briefly outline the JPEG compression pipeline and the constituent compression algorithms employed at each stage in the pipeline.

Stages of JPEG:
RGB to YIQ --- colour subsampling into Y and subsampled I, Q channels.
Discrete Cosine Transform --- conversion to the frequency domain (no compression yet).
Quantisation --- low pass filtering of the DCT data using tables or uniform quantisation.

Zig-Zag Scan --- writes out a vector from each 8x8 block, low frequency coefficients first (maps 8 x 8 to a 1 x 64 vector).
Differential Pulse Code Modulation (DPCM) on the DC channel --- the DC component is large and varies but is often close to the previous value, so encode the difference from the previous 8x8 block.
Run Length Encoding (RLE) --- applied to the AC components. Each 1x64 vector has lots of zeros in it, so encode these as RLE pairs.
Entropy Coding (Huffman or Arithmetic) --- the DC and AC components finally need to be represented by a smaller number of bits, so use Huffman or Arithmetic Coding compression.

12 Marks Bookwork (5 for a basic outline diagram of the pipeline, 7 marks for the stages)
(c) (i) Apply differential pulse code modulation to compress the following stream of integer numbers: 8 7 4 6 3 4 5 6
Simple coding scheme: transmit the difference between successive tokens in the stream (fewer bits can be used to encode the difference). So the coded stream is 0 +8 -1 -3 +2 -3 +1 +1 +1. 3 Marks
If only 3 bits are used in the compressed stream encoding, what problems, if any, will occur with the above coding?
3 bits can't encode a difference of 8, so there will be an error (overflow) in the coding. 3 Marks
TOTAL 6 Marks Unseen
(ii) Apply run length encoding to compress the following stream of alphabetical tokens: ABBAARNOOGOODEEEHHHHH
Comment on the efficiency of RLE encoding on the above token stream.
RLE: simply code repeating tokens as a pair giving the token plus the number of repeats.


Coded RLE stream: (A,1), (B,2), (A,2), (R,1), (N,1), (O,2), (G,1), (O,2), (D,1), (E,3), (H,5) 4 Marks
Efficiency: the original stream is 21 tokens (bytes in this case); the compressed stream is 22 tokens/numbers (bytes, if 8-bit integer coding is assumed). SO THE COMPRESSED STREAM is actually larger by 1 byte! RLE only gives good compression with a high number of repeated elements. 3 marks
Total 7 Marks Unseen

4. (a) What is MIDI?

Definition of MIDI: a protocol that enables computers, synthesizers, keyboards and other musical devices to communicate with each other. 2 Marks Basic Bookwork
(b) How is a basic MIDI message structured?
Structure of MIDI messages: a MIDI message includes a status byte and up to two data bytes.
o The most significant bit of the status byte is set to 1.
o The 4 low-order bits identify which channel the message belongs to (four bits produce 16 possible channels).
o The 3 remaining bits identify the message.
o The most significant bit of a data byte is set to 0.
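As an aside (not part of the mark scheme), the byte layout can be made concrete with a small sketch; note_on is my name, and 0x90 is the Note On status nibble from the published MIDI specification:

    def note_on(channel, note, velocity):
        """Pack a 3-byte MIDI Note On message.
        Status byte: MSB set, top nibble 0x9 (Note On), low nibble = channel.
        Data bytes: MSB clear, so values are limited to 0..127."""
        assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
        return bytes([0x90 | channel, note, velocity])

    print(note_on(0, 60, 100).hex())  # '903c64' -- middle C on channel 1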

4 Marks Basic Bookwork
(c) What features of MIDI make it suitable for multimedia applications? Briefly justify your answer. What are the drawbacks of MIDI?
o MIDI is very low bandwidth when compared to audio: sounds are synthesised at the client, and only control data is transmitted.
o MIDI can control many performance aspects: what notes are played; how they are played (expression, dynamics, volume etc.); what sound (instrument) makes the noise.
o MIDI can support polyphony (many sounds and many instruments).
o General MIDI defines a standard sound set for consistent instrumentation across all platforms.
Drawbacks: control of the actual sounds made (audio synthesis) is delegated to the client application, so the fidelity of the sounds is not guaranteed. General MIDI defines a similar instrument set but NOT the quality of sound. Cost (computer resources, price etc.) defines how high quality the sound generation is.
8 Marks Extended Reasoning
(d) How is MIDI used within the MPEG-4 audio compression standard?
MPEG-4 covers the whole range of digital audio:


o from very low bit rate speech to full bandwidth high quality audio
o built-in anti-piracy measures
o Structured Audio (MIDI is part of this)

Structured Audio Tools
MPEG-4 comprises 6 Structured Audio tools:
o SAOL, the Structured Audio Orchestra Language
o SASL, the Structured Audio Score Language
o SASBF, the Structured Audio Sample Bank Format
o a set of MIDI semantics, which describes how to control SAOL with MIDI
o a scheduler, which describes how to take the above parts and create sound
o AudioBIFS, part of BIFS, which lets you make audio soundtracks in MPEG-4 using a variety of tools and effects-processing techniques
MIDI is the control for the first 5 of the above.
SAOL is the central part of the Structured Audio toolset. It is a software-synthesis language, specifically designed for use in MPEG-4. SAOL is not based on any particular method of synthesis: it is general and flexible enough that any known method of synthesis can be described in it. There are examples of FM synthesis, physical-modelling synthesis, sampling synthesis, granular synthesis, subtractive synthesis, FOF synthesis, and hybrids of all of these in SAOL.
MIDI controls how SAOL makes the sounds (like it controls synthesizers) through the SASL variant and more ordinary MIDI semantics.
SASL is a very simple language that was created for MPEG-4 to control the synthesizers specified by SAOL instruments. SASL is like MIDI in some ways, but doesn't suffer from MIDI's restrictions on temporal resolution or bandwidth. It also has a more sophisticated controller structure than MIDI: you can write controllers to do anything.
SASBF is a format for efficiently transmitting banks of sound samples to be used in wavetable (sampling) synthesis. The format is at least partly compatible with the MIDI Downloaded Sounds (DLS) format.
MIDI Semantics: as well as controlling synthesis with SASL scripts, SAOL can be controlled with MIDI files and scores in MPEG-4. MIDI is today's most commonly used representation for music score data, and many sophisticated authoring tools (such as sequencers) work with MIDI. The MIDI syntax is external to the MPEG-4 Structured Audio standard; only references to the MIDI Manufacturers Association's definition appear in the standard.

But in order to make the MIDI controls work right in the MPEG context, some semantics (what the instructions "mean") have been redefined in MPEG-4. The new semantics are carefully defined as part of the MPEG-4 specification.
Scheduler: the scheduler is the "guts" of the Structured Audio definition. It is a set of carefully defined and somewhat complicated instructions that specify how SAOL is used to create sound when it is driven by MIDI or SASL.
13 Marks (2 marks for each part of MIDI in the Structured Audio framework, 1 mark for the overall framework) Extended Reasoning from bookwork


CARDIFF UNIVERSITY EXAMINATION PAPER

SOLUTIONS

Academic Year: 2006/2007
Examination Period: Autumn
Examination Paper Number: CM0340
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are ?? pages. There are ?? questions in total. There are no appendices. The maximum mark for the examination paper is 80 and the mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: ONE answer book. Instructions to Students: Answer 3 questions. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

PLEASE TURN OVER


Q1.

(a) Why is data compression desirable for multimedia activities?
Data is high resolution in both the temporal and/or spatial domains. [1] Storage, bandwidth and processing limitations need compressed data. [1]
2 MARKS BOOKWORK
(b) Briefly outline the four broad classes of approach that one may exploit to compress multimedia data. Do not detail any specific compression algorithms.

Compression basically employs redundancy in the data:
Temporal -- in 1D data, 1D signals, audio etc.: correlation between sequential data points. [2]
Spatial -- correlation between neighbouring pixels or data items in 2D. [2]
Spectral -- this uses the frequency domain to exploit relationships between the frequency of change in data; e.g. in video/imagery, correlation between colour or luminescence components. [2]
Psycho-visual -- exploit perceptual properties of the human auditory/visual system. [2]
8 MARKS BOOKWORK
Give one example of a compression algorithm for each class.
EXAMPLES:
Temporal -- any audio/video compression method: zero length suppression, pattern substitution, Pulse Code Modulation (a few variants), MPEG Audio, MPEG Video, H.264. [1]
Spatial -- any image/video compression algorithm: GIF, JPEG, MPEG, H.264. [1]
Spectral -- JPEG, MPEG, H.264. [1]
Psycho-visual -- MPEG Audio, MPEG Video, JPEG (colour conversion). [1]
4 MARKS BOOKWORK


(c) Consider the following sequence of 8-bit integers:

4 6 9 11 13 12 13 14 12 11

Show how you would code this sequence using:
i. Differential Pulse Code Modulation (DPCM)
Sequence: +4 +2 +3 +2 +2 -1 +1 +1 -2 -1 [3]
ii. Differential Pulse Code Modulation with delta modulation
Sequence: 1 1 1 1 1 0 1 1 0 0 (1 = increase/equality in the data, 0 = decrease in the data) [3]
iii. Adaptive Differential Pulse Code Modulation (ADPCM) with a window size of 3
Averaged sequence: 4 5 6 9 11 12 13 13 13 13 12 11
ADPCM sequence: 4 +1 +1 +3 +2 +1 +1 0 0 0 -1 -1 [4]
10 Marks UNSEEN application of knowledge/bookwork
Comment on the relative efficiency of these methods on the above sequence.
Compression relies on coding the differences in fewer bits. In the above, DPCM needs at least 3 bits, as does ADPCM; Delta needs clearly only 1 bit. [1] Delta modulation uses 1 bit, so it can't code any real differences. [1] ADPCM averages out errors/noise in the data. [1]
3 Marks UNSEEN application of knowledge/bookwork
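As an aside (not part of the mark scheme), the DPCM and delta-modulation codings above can be checked with a small Python sketch (function names are mine; the ADPCM smoothing-window variant is omitted):

    def dpcm(seq):
        """Difference from the previous value (initial predictor 0)."""
        out, prev = [], 0
        for s in seq:
            out.append(s - prev)
            prev = s
        return out

    def delta_mod(seq):
        """1-bit version: 1 = increase/equality, 0 = decrease."""
        return [1 if d >= 0 else 0 for d in dpcm(seq)]

    data = [4, 6, 9, 11, 13, 12, 13, 14, 12, 11]
    print(dpcm(data))       # [4, 2, 3, 2, 2, -1, 1, 1, -2, -1]
    print(delta_mod(data))  # [1, 1, 1, 1, 1, 0, 1, 1, 0, 0]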

PLEASE TURN OVER


Q2.

(a) What is MIDI?
Definition: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other. [2]
2 Marks bookwork
(b) What features of MIDI make it suitable for controlling software or hardware devices?
The basic syntax of MIDI is all about control: playing notes, setting up sounds etc. [1]
A wide number of specialised control messages: some fixed, like sustain, modulation and pitch bend, others freely assignable. [1]
A wide range of controllers is available, e.g. built in to keyboards, specialist hardware, software-reassignable. [1]
MIDI System Real-time Messages for control of timing/syncing, e.g. SMPTE, MIDI Time Code. [1]
The System Exclusive command set makes MIDI completely extensible to control any device. [1]
External hardware exists to convert MIDI to/from other formats, e.g. pitch-to-MIDI converters, MIDI to control voltage (analogue synths), motion capture to MIDI! [1]
6 Marks unseen. Assimilation of various aspects of bookwork
(c) Briefly outline how MIDI might be used to control the following:
i. Musical disco lights that pulse different lights in time with some MIDI music.
Assemble a hardware converter that pulses different light banks in response to ranges of MIDI notes. [2] Possibly extend this so that the light banks respond to different MIDI channels. [1]
3 Marks Total


ii. Video DJing, where the playing and synchronisation of video and music clips is controlled via MIDI.
A variety of solutions are possible. The simplest solution is based on some existing VJ software (www.arkaos.net): treat the problem as live sequencing of video and music samples/MIDI segments. [1] Map clips out on the keyboard like drum maps/samplers: one key per video/music clip. [1] Effects/transitions/fades/wipes are controlled by other MIDI events. [1] Assemble sequences of actions/transitions in a software app, a bit like a sequencer, to learn new effects. [1] MIDI controllers can be assigned to control the parameters of many effects, e.g. volume, colour saturation, speed of transitions/effects. [1]
5 Marks Total
iii. A motion capture MIDI interface where musical information is input via arm and upper body motion or gestures.
Based on GypsyMIDI: a MOCAP-to-MIDI converter (http://www.animazoo.com/products/gypsyRmidillo.htm). A motion capture device captures points from the body (e.g. wrist, elbow, shoulder). [1] Specialist software converts this point data to MIDI. [1] Angles between joints, speed of points, and gestures may be learnt. [1] Motion events are mapped to MIDI events, configurably (perhaps as in GypsyMIDI). [1] MIDI events are mapped to a synthesiser/computer sequencer etc. [1]
5 Marks Total

PLEASE TURN OVER


iv. A singing musical sampler/synthesiser where the synthesiser can take in English phrases and sing these with a MIDI-supplied melody.
Based on the Wordbuilder application (demoed in lecture but not explained) in the EAST WEST SYMPHONIC CHOIR software sampler/synthesiser (http://www.soundsonline.com/EastWest-Quantum-Leap-Symphonic-Choirspr-EW-165.html).
The basic idea is to map samples of words or parts of words (sung phonemes) to keys on the keyboard, just like drum samples or any triggered samples. (Choir sung phonemes are actually a smaller set than spoken or single-voice singing.) [1] Organisation is via groups of phonemes, NOT by pitch across the keyboard, so a direct mapping of keyboard pitch to sample is not possible. [2] The solution is to use an intermediary piece of software, Wordbuilder, that converts the sung phrase and note data into mappings that trigger the actual samples. [1] Wordbuilder is much like a simple sequencer, where the timing of triggering phrases can be controlled via the user interface. [1] Phrases may be English (or Latin), but ideally in phonetic form; Wordbuilder does English-to-phoneme conversion, but direct phonetic input is possible. [1]
6 Marks Total
19 Marks Unseen. Most completely unseen. The MOCAP/VJ and choir sampler systems were briefly mentioned as examples of MIDI apps in a lecture, but not how they work. MOCAP involves some lateral thinking about how the devices may work. Most others build on the mapping of sounds/video to MIDI events (key presses on a keyboard), similar to percussion mapping, which has been dealt with in lectures.


Q3.

(a) What are critical bands in relation to the human ear's perception of sound?
Frequency masking occurs in the human ear, where certain frequencies are masked by neighbouring frequencies. The range of closeness for frequency masking depends on the frequencies and relative amplitudes. Each band where frequencies are masked is called a Critical Band. [2]
2 Marks Bookwork
(b) How does MPEG audio compression achieve critical band approximation?
Create critical band filter banks. [1] Fourier Transform the audio data. [1] Bandpass filter the data according to the known critical bandwidths. [1]
3 Marks Bookwork

PLEASE TURN OVER


(c) List three coding methods in MPEG audio coding that exploit different perceptual characteristics of the human ear when hearing sound. Briefly explain how these arise in the human ear and how these methods are implemented in MPEG audio compression.
Three methods:
Frequency Masking -- Stereocilia in the inner ear get excited as fluid pressure waves flow over them. Stereocilia of different length and tightness on the basilar membrane resonate in sympathy with different frequencies of the fluid waves (banks of stereocilia at each frequency band). Stereocilia already excited by a frequency cannot be further excited by a lower-amplitude nearby frequency wave. [2] MPEG audio exploits this by quantising each filter bank with adaptive values derived from the neighbouring bands' energy. [2]
Temporal Masking -- Stereocilia need time to come to rest (as do the standing waves set up in the closed inner ear). Until at rest, stereocilia can't be excited again. [2] This is not as easy to model as frequency masking. MP3 achieves it with a 50% overlap between successive transform windows (giving window sizes of 36 or 12) and applies basic frequency masking as above. [2]
Stereo Redundancy -- at low frequencies, the human auditory system can't detect where the sound is coming from, so we don't need stereo. [2] Encode to mono and save bits: code the low frequency critical bands in mono. [2]
12 Marks Applied Bookwork: some lateral thinking, as the topics cover different aspects of MPEG audio compression, and Stereo Redundancy is not related to Frequency/Temporal Masking.


(d) Given two stereo channels of audio:

Left Channel:  12 12 13 14 15 27 36 44
Right Channel: 12 14 16  4 44 20  2  3

i. Apply Middle/Side (MS) stereo coding to the sequence.
Basic idea: Middle = sum of the left and right channels; Side = difference of the left and right channels.

Middle: 24 26 29 18  59 47 38 47
Side:    0 -2 -3 10 -29  7 34 41

[6] 6 Marks Applied bookwork (small part in notes)
ii. How may this result be employed to achieve compression?
Encode the Side channel in fewer bits, as it is essentially Differential Pulse Code Modulation. [1]
Use specially tuned threshold values to compress the side channel signal further. [1]
Code Middle in the normal (for audio) 16 bits (8 bits would be OK for this answer). [1]
Code Side in a reduced number of bits; it needs to be signed, so the values above need 7 bits. [1]
4 Marks Bookwork applied to given data
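As an aside (not part of the mark scheme), the MS coding as defined above (sum/difference, without the usual halving) is trivially scripted; ms_encode is my name:

    def ms_encode(left, right):
        """Sum/difference (Middle/Side) coding as defined above."""
        middle = [l + r for l, r in zip(left, right)]
        side = [l - r for l, r in zip(left, right)]
        return middle, side

    L = [12, 12, 13, 14, 15, 27, 36, 44]
    R = [12, 14, 16, 4, 44, 20, 2, 3]
    print(ms_encode(L, R))
    # ([24, 26, 29, 18, 59, 47, 38, 47], [0, -2, -3, 10, -29, 7, 34, 41])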

PLEASE TURN OVER


Q4.

(a) What are the target media for JPEG compression? Colour and greyscale images [1] Photographic Quality Images (Up 24 bit colour images) Many applications e.g., satellite, medical, general photography... [1] 2 Marks Bookwork (b) What are the main differences between the target media for JPEG and GIF compression? GIF can only deal with 8 colour bit images, JPEG colour 24bit GIF Target Graphical Images JPEG Photographic Quality Images 2 Marks Bookwork [1] [1]



(c) Compare the basic compression processes for JPEG and MPEG intraframe coding. Your solution should outline the common basic processes for both, and particularly emphasise the differences. Which steps in the process cause both JPEG and MPEG to be lossy?
The major steps in JPEG/MPEG coding involve:

[Figure 1: JPEG/MPEG Encoding pipeline]
o Colour space transform and subsampling
o DCT (Discrete Cosine Transformation)
o Quantisation
o Zig-zag scan
o Differential Pulse Code Modulation (DPCM) on the DC component (in JPEG); Run Length Encoding (RLE) on the AC components (JPEG) or on all of the zig-zag scan (MPEG)
o Entropy coding: Huffman or Arithmetic [7]
Four main differences:
o JPEG uses YIQ whilst MPEG uses the YUV (YCrCb) colour space. [1]
o MPEG uses larger DCT block sizes/windows, 16 or even 32, as opposed to JPEG's 8. [1]
o Different quantisation: MPEG usually uses a constant quantisation value. [1]
o DPCM is applied only to the DC component of the zig-zag scan in JPEG; the AC components (JPEG) or the complete zig-zag scan (MPEG) get RLE. [1]
Lossy steps:
o Colour space subsampling of the I, Q or U, V components. [1]
o Quantisation reduces the bits needed for the DCT components. [1]

13 Marks Applied Bookwork: Some lateral thinking to compare JPEG and MPEG not in notes at least
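As an illustrative aside (not in the original solutions), the quantisation and zig-zag stages common to both codecs can be sketched in a few lines of Python; this version scans down-first along the anti-diagonals, matching the worked example in part (d) below (JPEG implementations commonly scan right-first), and the function names are mine:

    def quantise(block, q):
        """Divide every DCT coefficient by a constant and round down."""
        return [[v // q for v in row] for row in block]

    def zigzag(block):
        """Read an NxN block into a 1-D vector along anti-diagonals,
        alternating direction (down-first on odd diagonals)."""
        n, out = len(block), []
        for d in range(2 * n - 1):
            rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
            if d % 2 == 1:
                rows = reversed(rows)
            out.extend(block[r][d - r] for r in rows)
        return out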


PLEASE TURN OVER


(d) Given the following portion of an 8x8 block from an image after the Discrete Cosine Transform has been applied:

128 128 32  4
 64  32 16 31
 46  64 12 40
128 160 32 32

i. What is the result of the quantisation step of the JPEG/MPEG compression method, assuming that a constant quantisation value of 32 was used?
The trick to remember from the notes is that we divide the matrix by the quantisation table, or in this case a constant. So here we divide all values by 32 and round down:

4 4 1 0
2 1 0 0
1 2 0 1
4 5 1 1

[3]
ii. What is the result of the zig-zag step being applied to the quantised block?
The trick to remember from the notes is that the zig-zag scan reads the values off the DCT block in increasing frequency order (better than row by row), creating a vector rather than a matrix.

So we get a vector from the matrix above: 4 2 4 1 1 1 4 2 0 0 0 0 5 1 1 1 [3]
iii. What is the result of the run length encoding (RLE) step being applied to the zig-zag step's output?
RLE: replace repeating values with their value plus the number of repetitions in the sequence. So we get the vector: (4,1), (2,1), (4,1), (1,3), (4,1), (2,1), (0,4), (5,1), (1,3) [4]
10 Marks UNSEEN application of bookwork/knowledge
END OF EXAMINATION

CARDIFF UNIVERSITY EXAMINATION PAPER

Academic Year: 2007/2008
Examination Period: Autumn

Examination Paper Number: CM0340SOLNS
Examination Paper Title: Multimedia

SOLUTIONS
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are 13 pages. There are 4 questions in total. There are no appendices. The maximum mark for the examination paper is 80 and the mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: ONE answer book. Instructions to Students: Answer 3 questions. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

PLEASE TURN OVER


Q1.

(a) Why is data compression desirable for multimedia activities?
Data is high resolution in both the temporal and/or spatial domains. [1] Storage, bandwidth and processing limitations need compressed data. [1]
2 Marks total BOOKWORK
(b) What is the distinction between lossy and lossless data compression?
Lossless Compression: where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are also referred to as bit-preserving or reversible compression systems. [1]
Lossy Compression: where the aim is to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate to achieve a given fidelity measure. Video and audio compression techniques are most suited to this form of compression. [1]
2 Marks total BOOKWORK
(c) What are the main differences between the target media for JPEG and GIF compression?
JPEG: 24-bit full/true colour imagery, typically photographic in nature. [1]
GIF: 8-bit colour imagery; suits graphics-type images better. [1]
2 Marks total BOOKWORK
(d) What improvement did the LZW algorithm make on previous LZ versions?
LZW introduced the idea that only the initial dictionary needs to be transmitted to enable decoding. [1] The decoder is able to build the rest of the table from the encoded sequence. [1]
2 Marks total BOOKWORK


(e) Describe the LZW algorithm for encoding an input sequence, giving suitable pseudocode.
The LZW Compression Algorithm can be summarised as follows:

w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}

[6]
The encoder may have a prior dictionary: original LZW used a dictionary with 4K entries, the first 256 (0-255) being the ASCII codes. GIF builds a dictionary of image blocks with no priors. [1]
6 Marks for algorithm, 1 Mark for dictionary build
7 Marks total BOOKWORK

PLEASE TURN OVER


(f) Given an initial dictionary:

Index: 1 2 3 4 5 6
Entry: a b h i s t

and the output of an LZW encoder: 6 3 4 5 1 3 1 6 2 9 11 16, decode the above sequence (which is not intended to represent meaningful English).
The LZW Decompression Algorithm is as follows:

read a character k;
output k;
w = k;
while ( read a character k )   /* k could be a character or a code. */
{
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}

Process as follows:
Read 6 from input, look up 6, so output t. Set w = t.
Read 3 from input, look up 3, so output h. Add entry 7 = th to the dictionary. Set w = h.
Read 4 from input, look up 4, so output i. Add entry 8 = hi. Set w = i.
Read 5 from input, look up 5, so output s. Add entry 9 = is. Set w = s.
Read 1 from input, look up 1, so output a. Add entry 10 = sa. Set w = a.
Read 3 from input, look up 3, so output h. Add entry 11 = ah. Set w = h.
Read 1 from input, look up 1, so output a. Add entry 12 = ha. Set w = a.
Read 6 from input, look up 6, so output t. Add entry 13 = at. Set w = t.
Read 2 from input, look up 2, so output b. Add entry 14 = tb. Set w = b.
Read 9 from input, look up 9, so output is. Add entry 15 = bi. Set w = is.


Read 11 from input, look up 11, so output ah. Add entry 16 = isa. Set w = ah.
Read 16 from input. Entry 16 (isa) was only just added by the previous step; had it not yet been present, this is where LZW's built-in exception handler would apply (output w + w[0]). Look up 16, so output isa. Add entry 17 = ahi. Set w = isa.
Final decode table:

Index  Entry
1      a
2      b
3      h
4      i
5      s
6      t
7      th
8      hi
9      is
10     sa
11     ah
12     ha
13     at
14     tb
15     bi
16     isa
17     ahi

Output decode sequence: thisahatbisahisa

6 Marks for the correct algorithm and applying it. 4 marks for showing the table construction and the correct decoded sequence. 2 Marks for handling the final look-ahead step, where the code read has only just entered the dictionary (the case LZW's exception handler exists for).
12 Marks total UNSEEN
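As an aside (not part of the mark scheme), the decoder pseudocode, including the KwKwK exception case, can be rendered as runnable Python; the helper name lzw_decode is mine:

    def lzw_decode(codes, base):
        """LZW decoder; `base` maps the initial codes to strings.
        Handles the KwKwK case where a code is read before it has
        been added to the table."""
        d = dict(base)
        next_code = max(d) + 1
        it = iter(codes)
        w = d[next(it)]
        out = [w]
        for k in it:
            entry = d[k] if k in d else w + w[0]  # exception handler
            out.append(entry)
            d[next_code] = w + entry[0]
            next_code += 1
            w = entry
        return "".join(out)

    base = {1: "a", 2: "b", 3: "h", 4: "i", 5: "s", 6: "t"}
    print(lzw_decode([6, 3, 4, 5, 1, 3, 1, 6, 2, 9, 11, 16], base))
    # thisahatbisahisa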

27 Marks Total Question 1


Q2.

(a) In a digital signal processing system, what is meant by block and sample-by-sample processing? Give one example of an application of each type.
Block processing: data is transferred into a memory buffer and then processed each time the buffer is filled with new data. [1] E.g. fast Fourier transforms (FFT), Discrete Cosine Transform (DCT), convolution, convolution reverb. [1]
Sample-by-sample processing: the input is processed on individual sample data. [1] E.g. volume control, envelope shaping, modulation (ring/amplitude), IIR/FIR filtering. [1]
4 Marks Total Bookwork
(b) In a digital signal processing system, what is meant by a linear and a non-linear time invariant system? Give one example of an application of each type.
Linear time invariant system (LTI): systems that do not change behaviour over time and satisfy the superposition theorem. The output signal is the input signal changed in amplitude and phase. [1] E.g. convolution, filters. [1]
Non-linear time invariant system: systems whose output is strongly shaped by non-linear processing that introduces harmonic distortion, i.e. harmonics that are not present in the original signal will be contained in the output. [1] E.g. limiters, compressors, exciters, distortion, enhancers. [1]
4 Marks Total Bookwork


(c) Give the denition of an impulse response. Give two practical uses of an impulse response in digital signal processing. E.g. Unit Impulse: Dened as: (n) = 1 if n = 0 0 otherwise (n = 0)

[2] Two uses: a test signal for digital systems, to characterise systems by their response functions; [1] convolution reverb: sample a room's impulse response and convolve it with the input to get a reverberated output. [1]
4 Marks Total Bookwork
(d) List the three basic components used in constructing a signal flow graph.
3 Components: Delay [1] Multiplication [1] Summation [1]
3 Marks Total Bookwork
Why is it desirable to describe systems using these components?
Componentisation of nearly all basic DSP filters and delays into standard processes. [1]
As the basic building blocks of nearly all basic DSP filters and delays, they make the description of algorithms simple. [1]
These components make for easy, cost-effective construction of hardware implementations from standard components. [1]


3 Marks Total Unseen/Assimilation of bookwork/discussed in tutorial
6 Marks Total For Question (d) Subpart
(e) What is the main distinction between an infinite impulse response (IIR) and a finite impulse response (FIR) filter?
IIR filters have a feedback loop in their construction.
1 Mark Bookwork
(f) Given the following difference equation construct its signal flow diagram:

$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) - a_1 y(n-1) - a_2 y(n-2)$
[Signal flow diagram: x(n) enters a summation whose output passes through two shared delay (T) elements; the delayed values are weighted by -a1 and -a2 and fed back into the summation, and the summation output and the two delayed values are weighted by b0, b1, b2 and summed to give y(n).]

[8] 8 Marks Unseen. Other, less efficient solutions are possible with more than two delay units (not shared for the y(n-1)/x(n-1)... tapping); award fewer marks if so.
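A direct Python sketch of the difference equation in part (f); for clarity it keeps separate x and y delay states (a direct-form I layout) rather than the shared delays of the diagram, and the coefficient values in the call are arbitrary illustrations:

def iir_filter(x, b0, b1, b2, a1, a2):
    y = []
    x1 = x2 = y1 = y2 = 0.0          # the T (unit delay) elements
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn              # shift the delay lines
        y2, y1 = y1, yn
        y.append(yn)
    return y

print(iir_filter([1, 0, 0, 0, 0], b0=1.0, b1=0.5, b2=0.25, a1=0.1, a2=0.05))
# first five samples of the impulse response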

27 Marks Total Question 2


Q3.

(a) List six broad classes of digital audio effect. Give an example effect of each type.
6 broad classes:
Basic Filtering: lowpass, highpass filter etc., equaliser [1]
Time-Varying Filters: wah-wah, phaser [1]
Delays: vibrato, flanger, chorus, echo [1]
Modulators: ring modulation, tremolo, vibrato [1]
Non-linear Processing: compression, limiters, distortion, exciters/enhancers [1]
Spatial Effects: panning, reverb, surround sound [1]
6 Marks Bookwork
(b) Give a description, including a signal flow diagram and algorithm, of the state variable filter.
[Signal flow diagram: x(n) enters a summation producing the highpass output yh(n); yh(n) is scaled by F1 and integrated (summation plus delay T) to give the bandpass output yb(n); yb(n) is scaled by F1 and integrated again to give the lowpass output yl(n); yl(n-1) and Q1 yb(n-1) are fed back negatively into the input summation.]

where:

x(n) = input signal
yl(n) = lowpass signal
yb(n) = bandpass signal
yh(n) = highpass signal

[4]

The state variable filter algorithm/difference equations are given by:

$y_l(n) = F_1 y_b(n) + y_l(n-1)$
$y_b(n) = F_1 y_h(n) + y_b(n-1)$
$y_h(n) = x(n) - y_l(n-1) - Q_1 y_b(n-1)$



[2] with tuning coefficients F1 and Q1 related to the cut-off frequency, fc, and damping, d:

$F_1 = 2\sin(\pi f_c / f_s)$, and $Q_1 = 2d$ [2]

8 Marks Bookwork
(c) Give two advantages of the state variable filter.
The 2 advantages are:
i. Independent control over the cut-off frequency and damping factor of a filter. [1]
ii. Simultaneous lowpass, bandpass and highpass filter output. [1]
2 Marks Bookwork
(d) A band-reject filter is a filter which passes all frequencies except those in a stop band centred on a centre frequency. How can such a filter be implemented using two state variable filters?
Create a lowpass filter with cut-off frequency flow. [1]
Create a highpass filter with cut-off frequency fhigh. [1]
Add the outputs of the lowpass and highpass filters. [1]
The band-reject range is flow to fhigh, where flow < fhigh. [1]

4 Marks UNSEEN. Band-reject/bandstop filter defined in lecture but not implemented.
(e) How may a phaser effect be implemented using two state variable filters?
A phaser is implemented with a (set of) time-varying frequency notch filters. [2]
A notch filter is a very narrow band-reject/bandstop filter. [1]
To get a narrow band: set a high Q factor (a sharp slope on the filter cut-off) [1] and flow < fhigh, but with flow close in value to fhigh (around some centre frequency). [1]
Perform band-reject filtering with the above parameters, as in part (d) above, BUT modulate the frequency range with a sine or triangular wave over a short range. [1]


A cascade of such filters implements a multiple-notch phaser where each notch filter has a different centre frequency. [1]
7 Marks UNSEEN. Phaser defined in lecture but no implementation given; similar to the wah-wah (notch filter instead of bandpass) implementation which was given in lecture. Notch filter defined in lecture as a very narrow band-reject/bandstop filter.
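A Python sketch of the state variable filter difference equations from part (b), assuming the tuning relations F1 = 2 sin(pi fc/fs) and Q1 = 2d given above; the simultaneous outputs give the band-reject of part (d) as yl + yh, and modulating fc per part (e) would turn such a notch into one phaser stage. The small damping value in the call is an illustrative choice for a narrow notch:

import math

def state_variable_filter(x, fc, fs, d):
    F1, Q1 = 2 * math.sin(math.pi * fc / fs), 2 * d
    yl = yb = 0.0                      # yl(n-1) and yb(n-1) states
    low, band, high = [], [], []
    for xn in x:
        yh = xn - yl - Q1 * yb         # yh(n) = x(n) - yl(n-1) - Q1 yb(n-1)
        yb = F1 * yh + yb              # yb(n) = F1 yh(n) + yb(n-1)
        yl = F1 * yb + yl              # yl(n) = F1 yb(n) + yl(n-1)
        low.append(yl); band.append(yb); high.append(yh)
    return low, band, high

low, band, high = state_variable_filter([1.0] + [0.0] * 7, fc=1000, fs=44100, d=0.05)
notch = [l + h for l, h in zip(low, high)]   # band-reject output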

27 Marks Total Question 3


Q4.

(a) Give a definition of a one-dimensional Fourier transform.
The Fourier transform of a function f(x) is denoted F(u), where u represents spatial frequency, and is defined by

$F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x u}\, dx.$ [2]

2 Marks - BOOKWORK
(b) Explain in detail how data is represented after the Fourier transform has been applied to a signal.
Essentially the Fourier transform decomposes an input signal into a series of sine waveforms with varying amplitude, frequency and phase. [2]
2 Marks - BOOKWORK
(c) Outline the basic approach to performing data filtering with the Fourier transform.
Compute the Fourier transform of the signal. [1]
Compute the Fourier transform of the filter (if convolving) or create the filter in frequency space (e.g. an ideal lowpass/Butterworth filter). [1]
Multiply (or divide, if the forward filter form is used above) the two Fourier transforms. [1]
Inverse Fourier transform to get the filtered data. [1]
4 Marks - BOOKWORK
(d) Describe one application of Fourier transform filtering methods in multimedia data compression.
Most obvious is MPEG AUDIO (however, an alternative answer might be JPEG/MPEG video if the Fourier transform is replaced by a Discrete Cosine Transform, which is a related method).
MPEG audio compression basically works by:
Dividing the audio signal up into a set of 32 frequency subbands: apply FOURIER FILTERING. [2]
Subbands approximate the critical bands of human hearing. [1]
Each band is quantised according to the audibility of quantisation noise. [1]


Exploit frequency masking: near frequencies not heard in the same time frame. [2]
Exploit temporal masking: near frequencies not heard close to some short time frame between frequencies. [2]
8 Marks - (Distillation of) BOOKWORK
(e) An exciter is a digital audio signal process that emphasises or de-emphasises certain frequencies in a signal in order to change its timbre. Describe how you could use the Fourier transform to implement such a process, giving a practical example and explaining how it works.
Approach similar to basic filtering except no frequency content is removed. [2]
In some cases similar to an equaliser (discussed in lectures), but with non-linear exaggeration of frequencies. [1]
Basic approach:
Compute the Fourier transform of the signal. [1]
Apply some function that will enhance/diminish certain frequencies in frequency/Fourier space. [6]
Example to enhance high frequencies: this could be similar to distortion-type amplification (discussed in lectures), which operates on amplitude rather than frequency. E.g. some soft clipping:

$f(x) = \begin{cases} 2x & 0 \le x < \frac{1}{3}\,\text{freq range} \\ \frac{3 - (2 - 3x)^2}{3} & \frac{1}{3} \le x < \frac{2}{3}\,\text{freq range} \\ 1 & \frac{2}{3} \le x \le 1\,\text{freq range} \end{cases}$ [1]

or some non-linear function:

$f(x) = \frac{x}{|x|}\left(1 - e^{-x^2/|x|}\right)$

Inverse Fourier transform to get the excited data.

11 Marks - UNSEEN. Exciter mentioned briefly in lecture but no implementation details discussed.
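A Python sketch of the exciter recipe above: forward transform, boost a frequency region, inverse transform. The 4 kHz cut-off and 1.5x gain are arbitrary illustrative choices, not values from the notes:

import numpy as np

def fft_exciter(signal, fs, cutoff_hz=4000.0, boost=1.5):
    spectrum = np.fft.rfft(signal)                  # forward transform
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs >= cutoff_hz] *= boost           # emphasise the highs
    return np.fft.irfft(spectrum, n=len(signal))    # inverse transform

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t) + 0.1 * np.sin(2 * np.pi * 6000 * t)
excited = fft_exciter(x, fs)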

27 Marks Total Question 4


END OF EXAMINATION

CM0340 SOLNS CARDIFF UNIVERSITY EXAMINATION PAPER

Academic Year: 2008/2009
Examination Period: Autumn

Examination Paper Number: CM0340 SOLNS Examination Paper Title: Multimedia

SOLUTIONS
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are 14 pages. There are 4 questions in total. There are no appendices. The mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: ONE answer book. Instructions to Students: Answer 3 questions. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

PLEASE TURN OVER


Q1.

(a) Briefly outline the four broad classes of approach that one may exploit to compress multimedia data. Do not detail any specific compression algorithms.
Compression basically employs redundancy in the data:
Temporal: in 1D data, 1D signals, audio etc.; correlation between sequential data points. [2]
Spatial: correlation between neighbouring pixels or data items in 2D. [2]
Spectral: uses the frequency domain to exploit relationships between frequency of change in data, e.g. in video/imagery, correlation between colour or luminance components. [2]
Psycho-visual: exploit perceptual properties of the human auditory/visual system. [2]
8 MARKS BOOKWORK
Give one example of a compression algorithm for each class.
EXAMPLES:
Temporal: any audio/video compression method, zero-length suppression, pattern substitution, Pulse Code Modulation (a few variants), MPEG Audio, MPEG Video, H.264. [1]
Spatial: any image/video compression algorithm, GIF, JPEG, MPEG, H.264. [1]
Spectral: JPEG, MPEG, H.264. [1]
Psycho-visual: MPEG Audio, MPEG Video, JPEG (colour conversion). [1]
4 MARKS BOOKWORK
(b) What advantage does arithmetic coding offer over Huffman coding for data compression?
Huffman coding assumes an integer number (k) of bits for each symbol, hence k is never less than 1. [1]
Arithmetic coding can represent a fractional number of bits per symbol and can achieve better compression ratios. [1]
2 MARKS BOOKWORK
(c) Briefly state an algorithm for arithmetic decoding.
Coding (SEEN IN LECTURE): the idea behind arithmetic coding is to have a probability line, 0 to 1, and assign to every symbol a range in this line based on its probability,

ordered in terms of probability, highest first. Note: the higher the probability, the larger the range assigned to it. For each symbol in the sequence, assign a code based on the symbol's probability and then subdivide for all the symbols:
range = high - low; high = low + range * high_range of the symbol being coded; low = low + range * low_range of the symbol being coded;

Decoding is the opposite, so we need to work out (unseen in lectures):
For the current code value: look it up in the table and assign the symbol. [1]
Eliminate the symbol's effect by subtracting the low value of its range and dividing by the range width. [2]
Repeat the above two steps until zero is reached; see the last part of the problem. [2]

Total 5 marks Unseen. (Coding algorithm discussed in lectures, decoding simply mentioned as the reverse process)

(d) Given the following table of frequency counts, probabilities and probability ranges for the following characters:

Char Freq Prob. Range
A    2    0.5   [0.0, 0.5)
B    1    0.25  [0.5, 0.75)
C    1    0.25  [0.75, 1.0)

What is the 4-character sequence for the arithmetic coding 0.59375?

Char  Code - Low               Code / Range
B     0.59375 - 0.5 = 0.09375  0.09375 / 0.25 = 0.375
A     0.375 - 0.0 = 0.375      0.375 / 0.5 = 0.75
C     0.75 - 0.75 = 0.0        0.0 / 0.25 = 0.0
A     0.0 - 0.0 = 0.0          0.0 / 0.5 = 0.0

4 marks for each step in the computation of the code.
It is possible for the decoder to return a zero value which corresponds to a symbol in a probability range rather than the end of the decoding process. How can this problem be avoided in the arithmetic decoder?
As can be seen in the above decoding, the third step (decoding C) returns 0, but we still need A as the last step. Solution: we need some additional end-of-input (end-of-file) symbol.
4 Marks
TOTAL 8 MARKS UNSEEN. Coding algorithm discussed in lectures, decoding simply mentioned as the reverse process.
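A Python sketch of the decoding loop just traced; the symbol table is the one from the question, and a fixed symbol count stands in for the end-of-input symbol discussed above:

ranges = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def arithmetic_decode(code, n_symbols):
    out = []
    for _ in range(n_symbols):
        for symbol, (low, high) in ranges.items():
            if low <= code < high:                   # look up the symbol
                out.append(symbol)
                code = (code - low) / (high - low)   # remove its effect
                break
    return "".join(out)

print(arithmetic_decode(0.59375, 4))   # -> BACA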


Q2.

(a) In a digital signal processing system, what are meant by block and sample-by-sample processing? Give one example of an application of each type.
Block processing: data is transferred into a memory buffer and then processed each time the buffer is filled with new data. [1] E.g. fast Fourier transforms (FFT), Discrete Cosine Transform (DCT), convolution, convolution reverb. [1]
Sample-by-sample processing: the input is processed on individual data samples. [1] E.g. volume control, envelope shaping, modulation (ring/amplitude), IIR/FIR filtering. [1]
4 Marks Total Bookwork
(b) Give definitions of the transfer function and frequency response of a digital system, in terms of its impulse response.
Given an impulse response h(n), simply apply the Z-transform:

$H(z) = \sum_{n=-\infty}^{\infty} h(n)\, z^{-n}$ [1]

to get the transfer function H(z). Similarly apply the Fourier transform:

$H(f) = \sum_{n=-\infty}^{\infty} h(n)\, e^{-i 2\pi f n / f_s}$ [1]

to get the frequency response H(f).
2 Marks Total - Bookwork


(c) Briefly discuss three algorithmic approaches to implementing filtering in a digital system.
Infinite Impulse Response (IIR) Filter: a simple IIR system can be described as follows:

$y(n) = x(n) - a_1 y(n-1) - a_2 y(n-2)$

The output signal y(n) is fed back through a series of delays; each delay is weighted; the weighted, fed-back delays are summed and passed to the new output. Such a feedback system is called a recursive system.

[Signal flow diagram: x(n) enters a summation giving y(n); y(n-1) and y(n-2) are tapped off two delay (T) elements, weighted by -a1 and -a2, and fed back into the summation.]


Finite Impulse Response (FIR) Filter: a simple FIR system can be described as follows:

$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2)$

The input is fed through delay elements; a weighted sum of the delays gives y(n).

[Signal flow diagram: x(n) passes through two delay (T) elements giving x(n-1) and x(n-2); the three taps are weighted by b0, b1, b2 and summed to give y(n).]

Fourier Space Filtering: F(u, v) is the Fourier transform of the original image; H(u, v) is a filter function (in Fourier space), which could be the inverse of the Fourier transform of a real-space filter function; G(u, v) = H(u, v) F(u, v) is the Fourier transform of the improved image. Inverse Fourier transform G(u, v) to get g(x, y), our improved image.
TOTAL 9 Marks: 3 marks per method


(d) Given the following difference equation construct its signal flow diagram:

$y(n) = 6x(n) + 3x(n-1) + 1x(n-2) - 5y(n-1) - 4y(n-2)$

Solution:

[Signal flow diagram: a two-delay structure with feed-forward weights 6, 3, 1 on x(n), x(n-1), x(n-2) and feedback weights -5, -4 on y(n-1), y(n-2), all summed to give y(n).]

[6] 6 Marks Unseen. Other, less efficient solutions are possible with more than two delay units (not shared for the y(n-1)/x(n-1)... tapping); award fewer marks if so.


(e) Given the following signal flow diagram construct its difference equation, y(n):

[Signal flow diagram: x(n) enters a summation giving xh(n); xh(n) passes through a three-sample delay T^3 (= z^-3); the delayed value xh(n-3) is weighted by -0.3 and fed back into the summation; y(n) is formed as 0.5 times xh(n) plus 0.8 times xh(n-3).]

$x_h(n) = x(n) - 0.3\, x_h(n-3)$
$y(n) = 0.5\, x_h(n) + 0.8\, x_h(n-3)$

The trick is to break up the feedback loop into the sub-equation xh(n).

[6] 6 Marks Unseen.


Q3.

(a) What is the difference between reverb and echo?
Echo implies a distinct, delayed version of a sound. [1]
Reverb: each delayed sound wave arrives in such a short period of time that we do not perceive each reflection as a copy of the original sound. [1]
TOTAL 2 Marks Bookwork
(b) Give the names of two filter-based approaches to simulating the reverb effect in digital audio. [2]
Schroeder's Reverberator [1]
Moorer's Reverberator [1]

Comment on how one approach builds on the other and how filters are used to achieve the desired effect.
Schroeder's Reverberator: early digital reverberation algorithms tried to mimic a room's reverberation by primarily using two types of infinite impulse response (IIR) filters:
Comb filters, usually in parallel banks.
Allpass filters, usually placed sequentially after the comb filter banks.
A delay (set via the feedback loops of the allpass filters) aims to make the output gradually decay.
Moorer's Reverberator: Moorer's reverberator builds on Schroeder:
Parallel comb filters with different delay lengths are used to simulate the modes of a room and sound reflecting between parallel walls. [1]
Allpass filters increase the reflection density (diffusion). [1]
Lowpass filters inserted in the feedback loops alter the reverberation time as a function of frequency. [1]
A shorter reverberation time at higher frequencies is caused by air absorption and the reflectivity characteristics of walls. [1]
They implement a DC attenuation and a frequency-dependent attenuation. [1]
These are different in each comb filter because the coefficients depend on the delay line length. [1]
6 Marks Bookwork


(c) State one alternative approach to reverb simulation that does not employ filters.
Convolution reverb. [1]
1 Mark Bookwork
Briefly, giving no mathematical detail, describe how this approach is implemented.
Record the impulse response of the room, g(x); the input audio is f(x). [1]
Compute the Fourier transforms of the impulse response, G(u), and of the audio signal, F(u). [1]
Compute the convolution of the two signals by multiplying the Fourier transforms: H(u) = F(u)G(u). [1]
Compute the inverse Fourier transform of H(u) to get h(x), the reverberated signal. [1]
Total 4 marks Bookwork
(d) For each of the three reverb methods you have described above discuss how, in the following two scenarios, the sounds recorded by the microphone could be modelled:
i. A long hallway where the long walls are lined with high-frequency-absorbing acoustic panels. The sound source is placed at one end of the hallway and a microphone is placed at the other end.
Schroeder's Reverberator: estimate the time of sound bouncing down the corridor to set the delay; estimate some filtering of high frequencies.

[2]

Moorer's Reverberator: estimate the time of sound bouncing down the corridor to set the comb filter delay; estimate some filtering of high frequencies for the allpass and lowpass filters. [2]
Convolution Reverb: record the impulse response of the hallway and perform the convolution reverb computation. [2]
Total 6 marks unseen


ii. A cardioid microphone is a microphone that accepts sound from the front and sides but not the back of the microphone. In a square recording studio, with uniform surfaces, a cardioid microphone is placed directly facing a sound source a few feet away.
Schroeder's Reverberator: not much one can model except, as before, setting a short delay to account for no reflections being recorded and estimating some filtering of high frequencies; the cardioid response cannot easily be modelled. [2]
Moorer's Reverberator: there will be little recording of back reflections, so allow little feedback to the comb filters; the tapped delay lines which simulate early reflections could have their delay and frequency filters set accordingly. [2]
Convolution Reverb: record the impulse response of the studio with a cardioid microphone and perform the convolution reverb computation. [2]
Total 6 marks unseen
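A Python sketch of the convolution reverb recipe from part (c): multiply the Fourier transforms of the recorded impulse response g and the dry input f, then invert. The arrays here are placeholder data, not a real room measurement:

import numpy as np

def convolution_reverb(f, g):
    n = len(f) + len(g) - 1             # full linear-convolution length
    F = np.fft.rfft(f, n)               # F(u)
    G = np.fft.rfft(g, n)               # G(u)
    return np.fft.irfft(F * G, n)       # h(x), the reverberated signal

dry = np.random.randn(44100)            # 1 s of placeholder input audio
impulse = np.exp(-np.linspace(0, 8, 22050)) * np.random.randn(22050)
wet = convolution_reverb(dry, impulse)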


Q4.

(a) Briefly outline the basic principles of Intra-Frame Coding in Video Compression, as used in MPEG-2.

A basic Intra-Frame Coding Scheme is as follows:
Convert to a more effective colour space: YUV (YCbCr). [1]
Use 4:2:0 chroma subsampling, since the eye is most sensitive to luminance and less sensitive to chrominance; a macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block. [1]
Break the frame up into macroblocks, which are typically 16x16 pixels. [1]
Perform the DCT on each macroblock. [1]
Quantisation is by a constant value for all DCT coefficients, i.e. there is no quantisation table as in JPEG. [1]
Zig-zag vectorisation of the quantised DCT coefficients. [1]
Run-length encoding (RLE) on the zig-zag vector. [1]
Huffman coding on the RLE values. [1]
Total 8 Marks bookwork
(b) What is the key difference between I-Frames, P-Frames and B-Frames?
I-Frame: basic reference frame for each group of pictures; essentially a JPEG-compressed image. [1]
P-Frame: coded forward difference frame w.r.t. the last I or P frame. [1]
B-Frame: coded backward difference frame w.r.t. the last I or P frame. [1]
Total 3 Marks Bookwork


(c) Why are I-frames inserted into the compressed output stream relatively frequently?
Differences between frames get too large: large errors, hard-to-track fast blocks etc. So we need to restart the coding with a new I-frame. [2]
Total 2 Marks Bookwork
(d) Given the following coding order of a group of frames in MPEG-2:

I1 P2 B3 B4 B5 P6 B7 B8 B9 I10 B11 B12 B13 I14 P15 B16 P17

What is the display order of the frames? The display order is:

I1 B3 B4 B5 P2 B7 B8 B9 P6 I10 B11 B12 B13 I14 B16 P15 P17

[7]
2 Marks for decoding the first IBBBP.
1 Mark for decoding the next BBBP (essentially a repeat of the first block).
2 Marks for the next IBBB: I frames can't change order, so no change in order.
2 Marks for IBPP: only one (the first in this sequence) P frame changes order.
7 Marks Unseen


(e) The following macroblock window has a best sum of absolute differences (SAD) match of 1 in a given MPEG inter-frame search:

4 3
2 5

Should inter- or intra-frame coding be employed to code this macroblock, and why?
Method from the lecture notes: based upon the motion estimation, a decision is made on whether INTRA or INTER coding is used. To determine INTRA/INTER mode we do the following calculation:

$MB_{mean} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} C(i,j)$

$A = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C(i,j) - MB_{mean} \right|$

If $A < (SAD - 2N)$, INTRA mode is chosen.

So for this problem: $MB_{mean} = (4 + 3 + 2 + 5)/4 = 3.5$ [2]

$A = |4 - 3.5| + |3 - 3.5| + |2 - 3.5| + |5 - 3.5| = 0.5 + 0.5 + 1.5 + 1.5 = 4$ [2]

$SAD - 2N = 1 - 2 \times 2 = -3$ [2]

Since A is not less than $(SAD - 2N)$, we choose INTER-frame coding. [1]
7 marks; unseen problem, application of bookwork formula.


END OF EXAMINATION

CM0340 Solutions CARDIFF UNIVERSITY EXAMINATION PAPER

Academic Year: 2009/2010
Examination Period: Autumn

Examination Paper Number: CM0340 Solutions. Examination Paper Title: Multimedia. Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are 13 pages. There are 4 questions in total. There are no appendices. The maximum mark for the examination paper is 80 and the mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: ONE answer book. Instructions to Students: Answer 3 questions. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.


Q1.

(a) What is the distinction between lossy and lossless data compression?
Lossless compression: decompression gives an exact copy of the original data. [1]
Lossy compression: decompression ideally gives a close approximation of the original data, in many cases perceptually lossless, but a byte-by-byte comparison of files shows differences. [1]
2 Marks Bookwork
Give one example of a lossy and a lossless compression algorithm.
Lossless compression examples: one from entropy encoding schemes (Shannon-Fano, Huffman coding), arithmetic coding, the LZW algorithm [1] as used in the GIF image file format. [1]
Lossy compression examples: one from transform coding (FFT/DCT-based quantisation), differential encoding, vector quantisation. [1]
2 Marks Bookwork
(b) List three pattern substitution based compression algorithms.
Repetitive Sequence Suppression [1]
Run-Length Encoding [1]
Pattern Substitution [1]
3 Marks Bookwork
For each algorithm, give one application where the method is used with respect to multimedia data.
Repetitive Sequence Suppression: one from silence suppression in audio, white space in text, simple uniform backgrounds in images. [1]
Run-Length Encoding: one from computer-graphics-generated images, faxes, part of the (latter stage of the) JPEG pipeline. [1]
Pattern Substitution: one from pattern recognition/token substitution, entropy coding (Huffman), LZW/GIF, vector quantisation. [1]
3 Marks Bookwork
(c) What is the basic concept used in defining an Information Theoretic approach to data compression?
The entropy of an information source S, defined as:

$H(S) = \sum_i p_i \log_2 \frac{1}{p_i}$ [2]

is the basis of Information Theoretic compression algorithms.
2 Marks Bookwork


(d) Why is the Huffman coding algorithm better at data compression than the Shannon-Fano algorithm?
(A bottom-up approach.) It captures the ideal entropy more closely than Shannon-Fano. [2]
2 Marks Bookwork

(e) What advantages does the arithmetic coding algorithm offer over the Huffman coding algorithm with respect to data compression?
Good compression ratio (better than Huffman coding); entropy around the Shannon ideal value. [1]
Huffman coding uses an integer number of bits for each symbol, hence k is never less than 1. [1]
Arithmetic coding can use a fractional number of bits per symbol. [1]
3 Marks Bookwork
Are there any disadvantages with the arithmetic coding algorithm?
Memory: potentially large symbol tables are needed. [1]
Speed: possibly complex computations due to large symbol tables. [1]
2 Marks Bookwork

(f) Given the following Differential Pulse Code Modulated (DPCM) sequence, reconstruct the original signal.

+4 +2 +3 -2 +3 -1 +2 +1

DPCM decoding: simply start with the accumulator at zero; for each number, add the value to the current accumulator and output the accumulator value. So the solution is:

4 6 9 7 10 9 11 12 [4]

4 Marks Unseen Problem: DPCM encoding covered in lectures


(g) Given the following Run-Length Encoded (RLE) sequence, reconstruct the original 2D 8x8 (binary) data array.

(0, 8), (0, 1), (1, 1), (0, 4), (1, 1), (0, 1), (0, 1), (1, 2), (0, 2), (1, 2), (0, 1), (0, 1), (1, 6), (0, 1), (0, 2), (1, 4), (0, 2), (0, 3), (1, 2), (0, 3), (0, 2), (1, 1), (0, 2), (1, 1), (0, 2), (0, 1), (1, 1), (0, 4), (1, 1), (0, 1)

The format of the RLE here is pairs (colour, length), so just parse each row, expanding each colour to a run of length values, to get the solution:

0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 0
0 1 1 0 0 1 1 0
0 1 1 1 1 1 1 0
0 0 1 1 1 1 0 0
0 0 0 1 1 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 1 0

[4]
4 Marks Unseen Problem: RLE encoding covered in lectures
Question 1 Total Marks 27
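A Python sketch of the decode just performed: expand each (colour, run) pair and reshape into the 8x8 array:

pairs = [(0,8),(0,1),(1,1),(0,4),(1,1),(0,1),(0,1),(1,2),(0,2),(1,2),
         (0,1),(0,1),(1,6),(0,1),(0,2),(1,4),(0,2),(0,3),(1,2),(0,3),
         (0,2),(1,1),(0,2),(1,1),(0,2),(0,1),(1,1),(0,4),(1,1),(0,1)]

flat = [colour for colour, run in pairs for _ in range(run)]
rows = [flat[i:i + 8] for i in range(0, 64, 8)]
for row in rows:
    print(row)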


Q2.

(a) What is MIDI?
Definition: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other. [1]
1 Mark Bookwork

(b) What features of MIDI make it suitable for use in the MPEG-4 audio compression standard?
MIDI has very low bandwidth when compared to audio: sounds are synthesised at the client, and only control data is transmitted. [1]
MIDI can control many performance aspects. [1]
2 Marks Bookwork

(c) Briefly outline the MPEG-4 structured audio standard.
MPEG-4 comprises 6 Structured Audio tools: [6]
SAOL, the Structured Audio Orchestra Language: the central part of the Structured Audio toolset; a software-synthesis language that defines how to make the sounds. [1]
SASL, the Structured Audio Score Language: controls (like MIDI) how SAOL makes the sounds. [1]
SASBF, the Structured Audio Sample Bank Format: a format for efficiently transmitting banks of sound samples to be used in wavetable, or sampling, synthesis. [1]
A set of MIDI semantics, which describes how to control SAOL and SASL scripts (MIDI based). [1]
A scheduler, which describes how to take the above parts and create sound: a set of carefully defined and somewhat complicated instructions that specify how SAOL is used to create sound when it is driven by MIDI or SASL. [1]
AudioBIFS, part of BIFS, which lets you make audio soundtracks in MPEG-4 using a variety of tools and effects-processing techniques. [1]
6 Marks Bookwork


(d) What features of MIDI make it suitable for controlling software or hardware devices?
The basic syntax of MIDI is all about control: playing notes, setting up sounds etc. [1]
A wide number of specialised control messages: some fixed, like sustain, modulation and pitch bend, others freely assignable. [1]
A wide range of controllers available, e.g. built in to keyboards, specialist hardware; software reassignable. [1]
MIDI System Real-time Messages for control of timing/syncing, e.g. SMPTE, MIDI Time Code. [1]
The system exclusive command set makes MIDI completely extensible to control any device. [1]
External hardware to convert MIDI to/from other formats, e.g. pitch-to-MIDI converters, MIDI to control voltage (analogue synths), motion capture to MIDI. [1]
6 Marks unseen. Assimilation of various aspects of bookwork
(e) In terms of controlling devices, what limitations does MIDI have in terms of the level of control, the number of devices and the number of independent control items within a device?
Basic issues surrounding MIDI control (students may suggest others; marks will be awarded for any sensible suggestions). Example solution:
Level of control: MIDI controllers only have a 7-bit resolution (values 0 to 127), so fine-tuned control, or control over a large range, may not be possible. [2]
Number of devices: MIDI allows for 16 different channels on one MIDI connection, so only up to 16 devices can be controlled. [2]
Number of independent control items within a device: MIDI is a serial protocol based on the ancient RS-232 interface (although now upgraded to USB/FireWire/Ethernet) and may have limited bandwidth. Even though polyphony in music has to be supported, all MIDI messages are sent in a sequence (the fundamental serial resolution is about one message per millisecond), so MIDI clog might occur if a lot of data needs to be sent rapidly. [2]
(Alternative) Number of independent control items within a device: MIDI has a limited number of assignable controllers; 128 controller numbers are allowed. [2]
Possibly other suggestions.
6 Marks, 2 per solution for each limitation (example solution above). Unseen question, extended reasoning from basic MIDI understanding


Suggest a solution that can be employed to remedy each of these problems using standard MIDI devices.
Indicative solutions (related to the above):
Level of control: MIDI controllers only have a 7-bit resolution. Solution: two controllers can be used in tandem, one for the LSB and one for the MSB, to increase the resolution to 14 bits, for example. [2]
Number of devices: MIDI allows for 16 different channels on one MIDI connection. Solution: MIDI allows for multiple connections (although this has bandwidth implications); USB/FireWire/Ethernet/LAN has adequate bandwidth and may support multiple devices as standard. [2]
Number of independent control items within a device: a serial protocol based on the ancient RS-232 interface (although now upgraded to USB/FireWire/Ethernet). Solution: as above, USB/FireWire/Ethernet/LAN has adequate bandwidth; one could also use independent devices on independent connections if practical. [2]
(Alternative) Number of independent control items within a device: MIDI has a limited number of assignable controllers (128). Solution: use multiple independent devices, possibly on independent connections, if practical. [2]
Possibly other suggestions.
6 Marks, 2 per solution for each limitation (example solution above). Unseen question, extended reasoning from basic MIDI understanding
Question 2 Total Marks 27


Q3.

(a) Briefly outline, with the aid of suitable diagrams, the JPEG/MPEG I-Frame compression pipeline and list the constituent compression algorithms employed at each stage in the pipeline.
The major steps in JPEG/MPEG I-Frame coding involve: [2]
Colour space transform and subsampling
DCT (Discrete Cosine Transformation)
Quantisation
Zig-zag scan
Differential Pulse Code Modulation (DPCM) on the DC component (in JPEG); run-length encoding (RLE) on the AC components (JPEG) or on all of the zig-zag vector (MPEG)
Entropy coding: Huffman or arithmetic [7]
9 Marks Bookwork


What are the key differences between the JPEG and MPEG I-Frame compression pipelines?
Four main differences:
JPEG uses the YIQ colour space whilst MPEG uses YUV (YCrCb). [1]
MPEG uses larger block-size DCT windows, 16 or even 32, as opposed to JPEG's 8. [1]
Different quantisation: MPEG usually uses a constant quantisation value. [1]
Differential Pulse Code Modulation (DPCM) is applied only to the DC component of the zig-zag scan in JPEG; the AC components (JPEG) or the complete zig-zag scan (MPEG) get RLE. [1]
4 Marks Applied Bookwork: some lateral thinking to compare JPEG and MPEG, not directly compared in course notes at least

(b) Motion JPEG (or M-JPEG) is a video format that uses JPEG picture compression for each frame of the video. Why is M-JPEG not widely used as a video compression standard?
Compressing within each frame alone does not yield the high compression ratio required for general video needs. The temporal aspect of video can be exploited to get better compression. [2]
2 Marks Bookwork

Briefly state what additional approaches are used by MPEG video compression algorithms to improve on M-JPEG.
Adopt some form of temporal compression: use P-frames and B-frames to do differencing between frames and also motion estimation. [2]
2 Marks Bookwork

(c) What processes above give rise to the lossy nature of JPEG/MPEG video compression?
Lossy steps:
Colour-space subsampling of the IQ or UV components. [2]
Quantisation reduces the bits needed for the DCT components. [2]
4 Marks Bookwork


(d) Given the following portion from a block (assumed to be 4x4 pixels to simplify the problem) from an image after the Discrete Cosine Transform stage of the compression pipeline has been applied:

128  32 128   4
 32  16  64  31
 64 160  12  32
 46 128  40  32

i. What is the result of the quantisation step of the MPEG video compression method assuming that a constant quantisation value of 32 is used?
The trick to remember from the notes is that we divide the matrix by the quantisation table, or in this case a constant. So here we divide all values by 32 and round down (integer division):

4 1 4 0
1 0 2 0
2 5 0 1
1 4 1 1

[3]
ii. What is the result of the following zig-zag step being applied to the quantised block?
The trick to remember from the notes is that the zig-zag scan reads the values from the DCT in order of increasing frequency (better than row by row), creating a vector rather than a matrix. So we get the vector:

4 1 1 2 0 4 0 2 5 1 4 0 0 1 1 1

[3]
6 Marks: Unseen Problem
Question 3 Total Marks 27
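A Python sketch of the two steps above: integer division by the constant quantiser, then a zig-zag read-out. The traversal order is hard-coded for a 4x4 block, following the standard JPEG-style zig-zag assumed in the worked answer:

block = [[128, 32, 128, 4], [32, 16, 64, 31],
         [64, 160, 12, 32], [46, 128, 40, 32]]

quantised = [[v // 32 for v in row] for row in block]   # constant = 32

ZIGZAG_4x4 = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
              (2,1),(3,0),(3,1),(2,2),(1,3),(2,3),(3,2),(3,3)]
vector = [quantised[i][j] for i, j in ZIGZAG_4x4]
print(vector)   # [4, 1, 1, 2, 0, 4, 0, 2, 5, 1, 4, 0, 0, 1, 1, 1]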


Q4.

(a) In MPEG audio compression, what is frequency masking?
When an audio signal consists of multiple frequencies, the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than that of the other, the quieter frequency may not be heard. [2]
2 Marks: Bookwork

(b) Briefly describe the cause of frequency masking in the human auditory system.
Stereocilia in the inner ear get excited as fluid pressure waves flow over them. [1]
Stereocilia of different lengths and tightnesses on the basilar membrane resonate in sympathy with different frequencies of the fluid waves (banks of stereocilia for each frequency band). [1]
Stereocilia already excited by a frequency cannot be further excited by a lower-amplitude near frequency. [1]
3 Marks: Bookwork

(c) In MPEG audio compression, what is temporal masking?
After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency. [2]
2 Marks: Bookwork
(d) Briefly describe the cause of temporal masking in the human auditory system.
(As with frequency masking) stereocilia in the inner ear get excited as fluid pressure waves flow over them and respond to different frequencies. [1]
Stereocilia already excited by a certain frequency will take a while to return to their rest state, as the inner ear is a closed fluid chamber and pressure waves eventually dampen down. [1]
As with frequency masking, stereocilia in a dampening state may not respond to a lower-amplitude near frequency. [1]
3 Marks: Bookwork


(e) Briefly describe, using a suitable diagram if necessary, the MPEG-1 audio compression algorithm, outlining how frequency masking and temporal masking are encoded.
MPEG audio compression basically works by:
Dividing the audio signal up into a set of frequency subbands (filtering), [1] using filter banks to achieve this. [2]
Subbands approximate critical bands. [1]
Each band is quantised according to the audibility of quantisation noise. [1]
Frequency masking and temporal masking are encoded by:
Frequency masking: MPEG audio encodes this by quantising each filter bank with adaptive values derived from the neighbouring bands' energy, defined by a look-up table. [2]
Temporal masking: not as easy to model as frequency masking. MP3 achieves this with a 50% overlap between successive transform windows, giving window sizes of 36 or 12, and applies basic frequency masking as above. [2]
10 Marks: Bookwork


(f) Given two stereo channels of audio:

Left Channel:  14 11 10 15 17 20
Right Channel: 11 14 16  6 44 20

i. Apply Middle/Side (MS) stereo redundancy coding to the sequence. [3]
(Recap): Stereo redundancy: at low frequencies, the human auditory system can't detect where the sound is coming from, so we don't need stereo there.
Middle/Side (MS) stereo redundancy coding. Basic idea: Middle = sum of the left and right channels; Side = difference of the left and right channels. So the solution is:

Middle: 25 25 26 21  61 40
Side:    3 -3 -6  9 -27  0

3 Marks: Unseen Problem

ii. How may this result be employed to achieve compression? Illustrate your answer with respect to the above data.
Encode the side channel in fewer bits, as it is essentially Differential Pulse Code Modulation. [1]
Use specially tuned threshold values to compress the side channel signal further. [1]
Code the middle channel in normal (for audio) 16 bits (8 bits would be acceptable for this answer). [1]
Code the side channel in a reduced number of bits; it needs to be signed, so in the above 7 bits are needed. [1]
4 Marks: Unseen Problem
Question 4 Total Marks 27
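A Python sketch of the sum/difference coding above; the decode step shows the original channels are recovered exactly, which is why only the (small) side signal needs care in quantisation:

left  = [14, 11, 10, 15, 17, 20]
right = [11, 14, 16,  6, 44, 20]

middle = [l + r for l, r in zip(left, right)]    # 25 25 26 21 61 40
side   = [l - r for l, r in zip(left, right)]    #  3 -3 -6  9 -27  0

decoded_left  = [(m + s) // 2 for m, s in zip(middle, side)]
decoded_right = [(m - s) // 2 for m, s in zip(middle, side)]
assert decoded_left == left and decoded_right == right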


END OF EXAMINATION

CM0340 Solutions CARDIFF UNIVERSITY EXAMINATION PAPER

Academic Year: 2010/2011
Examination Period: Autumn

Examination Paper Number: CM0340 Solutions. Examination Paper Title: Multimedia. Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are 11 pages. There are 4 questions in total. There are no appendices. The maximum mark for the examination paper is 81 and the mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: ONE answer book. Instructions to Students: Answer 3 questions. The use of calculators is permitted in this examination. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.


Q1.

(a) What are the differences between analog signals and digital signals?
Analog signals: continuous signals of some time-varying quantity; can't be processed directly by computers. [1]
Digital signals: digital samples of the signals at regular intervals; can be readily processed by computers. [1]
2 Marks Bookwork

A computer is to be used to add effects to analog audio signals. What two types of devices in general are needed? Describe their functionalities in the processing pipeline.
Analog-to-Digital Converter (ADC): takes analog signals from an analog sensor and digitally samples the data. [1]
Digital-to-Analog Converter (DAC): takes digital signals from the computer and outputs an analog signal that may be played by an output device. [1]
2 Marks Bookwork

(b) Audio signals are often sampled at different rates. CD-quality audio is sampled at a 44.1kHz rate while telephone-quality audio is sampled at 8kHz. What are the maximum frequencies in the input signal that can be fully recovered for these two sampling rates? Briefly describe the theory you use to obtain the results.
CD-quality audio: the maximum frequency is 44,100Hz / 2 = 22,050Hz. [1]
Telephone-quality audio: the maximum frequency is 8kHz / 2 = 4kHz. [1]
This is based on the Nyquist theorem: the sampling frequency for a signal must be at least twice the highest frequency component in the signal. [1]
3 Marks Unseen problem based on theories covered in lectures

If an arbitrary input signal is directly sampled, what artefact may result and how can this be solved?
This may result in aliasing artefacts. [1]
To solve this, add an analog lowpass filter before sampling to eliminate high-frequency components. [1]
2 Marks Bookwork


(c) Using a wavetable for digital audio synthesis, for two audio samples as follows (simplified for ease of calculation):
S1: 0, 1, 0, 1, 0, 1, 0
S2: 1, 0.6, 0.4, 0, 0.4, 0.6, 1
each with 7 samples (1 to 7). What is the output of linear crossfading from S1 to S2, assuming samples 3 to 5 are transitional samples with mixed information from both sources?
The envelope for S1 is E1 = (1, 1, 0.75, 0.5, 0.25, 0, 0) and the envelope for S2 is E2 = (0, 0, 0.25, 0.5, 0.75, 1, 1). Calculate S = S1.E1 + S2.E2 (element-wise multiplication); we have

0, 1, 0.1, 0.5, 0.3, 0.6, 1 [4]

For a wavetable entry S3 representing a short audio sample with a fundamental frequency of 600Hz, a reshaped wave S4 is derived satisfying S4(t) = S3(0.75t). What is the fundamental frequency of S4, assuming the same sampling rate is used for playback?
Assume the period of the original audio is T. Since S4((4/3)T) = S3(T), the period of the new audio is (4/3)T. The frequency of the new audio is (3/4)f = (3/4) x 600 = 450Hz. [2]
6 Marks Unseen problem
(d) Answer the following questions based on the given figure:

[Figure: a delay line of length D whose output is lowpass filtered and fed back to its input.]

i. What is the audio synthesis algorithm described in the diagram?
The diagram illustrates the Karplus-Strong audio synthesis algorithm. [1]
ii. Which general approach of digital audio synthesis does this algorithm most accurately belong to?
The algorithm belongs to the physical modelling approach. [1]
iii. If the audio is sampled at 44.1kHz, and D = 100 (see the figure), what is the fundamental frequency of the synthesised audio?
According to the relationship

$D = \frac{F_s}{F_1}$

where D is the delay, F1 is the fundamental frequency and Fs is the sampling frequency, we have F1 = Fs / D = 44100 / 100 = 441Hz. [2]
iv. To halve the fundamental frequency of the synthesised audio, what two possible changes can you make?
According to the relationship, to halve the fundamental frequency F1, either halve the sampling frequency Fs or double the delay D. [2]
6 Marks Unseen problem with the theories covered in lectures
(e) Describe what an ADSR envelope means (with an illustration) and how this is applied in digital audio synthesis.
ADSR means:


Attack: how quickly the sound reaches full volume after the sound is activated. [1]
Decay: how quickly the sound drops to the sustain level after the initial peak. [1]
Sustain: the constant volume that the sound takes after decay, until the note is released. [1]
Release: how quickly the sound fades when a note ends. [1]

(illustration) [1]
ADSR is used to modulate some aspect of the instrument's sound, often its volume, over time, simulating the behaviour of mechanical instruments. [1]
6 Marks Bookwork
Question 1 Total Marks 27
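Returning to part (d), a minimal Python sketch of the Karplus-Strong loop: a noise burst of length D recirculates through the delay line via a two-point average (the lowpass in the figure). The decay constant is an arbitrary illustrative choice; with fs = 44100 and D = 100 this sounds at 441 Hz:

import random

def karplus_strong(D, n_samples, decay=0.996):
    buf = [random.uniform(-1, 1) for _ in range(D)]   # initial excitation
    out = []
    for n in range(n_samples):
        out.append(buf[n % D])
        # feed back the lowpass-filtered (averaged) delayed samples
        buf[n % D] = decay * 0.5 * (buf[n % D] + buf[(n + 1) % D])
    return out

samples = karplus_strong(D=100, n_samples=44100)      # 1 s at 44.1 kHz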


Q2.

(a) Dithering is often used when converting greyscale images to monochrome.
i. What is the basic idea of dithering?
Dithering uses a larger pattern to approximate the (greyscale) levels of the input image. [1]
1 Mark Bookwork
ii. For the given 2x2 dither matrix, briefly describe the ordered dithering algorithm.

0 2
3 1

The algorithm involves the following steps:
Re-map each intensity to the range 0 to 4 by dividing the intensity (for 256 levels) by (256/5).
Tile the dither matrix to the same dimension as the input image.
If the remapped intensity is larger than the corresponding dither matrix entry, output a 1, otherwise a 0. [2]
2 Marks Bookwork
iii. Using the same dither matrix, what is the result for the following input? Assume that the input is greyscale intensities normalised to 0 to 1.

0.66 0.54 0.70 0.67
0.18 0.13 0.99 0.17
0.03 0.56 0.88 0.67
0.19 0.37 0.46 0.98

The remapped intensities A are calculated as floor(I x 5) for input intensity I:

3 2 3 3
0 0 4 0
0 2 4 3
0 1 2 4

The tiled dither matrix B:

0 2 0 2
3 1 3 1
0 2 0 2
3 1 3 1

Comparing each element of A with B, the final result is:

1 0 1 1
0 0 1 0
0 0 1 1
0 0 0 1

[4]
4 Marks Unseen problem
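A Python sketch of the ordered-dither steps just applied, using the 2x2 matrix and the 4x4 input from the question:

dither = [[0, 2], [3, 1]]
image = [[0.66, 0.54, 0.70, 0.67],
         [0.18, 0.13, 0.99, 0.17],
         [0.03, 0.56, 0.88, 0.67],
         [0.19, 0.37, 0.46, 0.98]]

output = []
for i, row in enumerate(image):
    out_row = []
    for j, intensity in enumerate(row):
        level = int(intensity * 5)                   # remap to 0..4
        out_row.append(1 if level > dither[i % 2][j % 2] else 0)
    output.append(out_row)

for row in output:
    print(row)   # [1,0,1,1] [0,0,1,0] [0,0,1,1] [0,0,0,1]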


(b) Different colour models are often used in different applications. What is the CMYK colour model? Give an application in which this colour model is mostly used and explain the reason.
The CMYK colour model uses Cyan, Magenta, Yellow and Black as primaries (components). [1]
The CMYK colour model is mostly used in printing, because the colour pigments on the paper absorb certain colours, so a subtractive model is suitable; black is used to produce a darker black than simply mixing CMY. [2]
3 Marks Bookwork
Given a colour represented in RGB colour space as R = 0.2, G = 0.6, B = 0.3, what is its representation in the CMYK colour model?
First convert to CMY as

$\begin{pmatrix} C \\ M \\ Y \end{pmatrix} = \begin{pmatrix} 1 - R \\ 1 - G \\ 1 - B \end{pmatrix} = \begin{pmatrix} 0.8 \\ 0.4 \\ 0.7 \end{pmatrix}$

Then

$K = \min(C, M, Y) = 0.4,\quad C' = C - K = 0.4,\quad M' = M - K = 0,\quad Y' = Y - K = 0.3.$ [2]

2 Marks Unseen problem
(c) Give three colour models other than RGB/CMYK and explain the benefits of using each model by showing a practical application for it.
A possible answer:
CIE L*a*b*: relates more closely to human perception; useful for image processing such as Photoshop. [2]
YUV (YCbCr): separates colour from the luminance component; useful for PAL video/MPEG. [2]
YIQ: separates colour from the luminance component; useful for NTSC video/JPEG. [2]
Other sensible answers are acceptable as well.
6 Marks Bookwork, 1 mark for each colour model and 1 mark for each application
(d) What is chroma subsampling? Why is chroma subsampling meaningful? What is the benefit of doing chroma subsampling?
Chroma subsampling is a method that stores colour information at a lower resolution than intensity information. [1]
Chroma subsampling is meaningful because the human visual system is less sensitive to variations in colour than in brightness. [1]
Chroma subsampling can reduce the bandwidth for colour detail with almost no perceivable visual difference. [1]


3 Marks Bookwork
For the following array of colour values, give the chroma subsampling results with 4:2:2, 4:1:1 and 4:2:0 schemes.

80 88 96 12
60  8 72 68
24 52 52  8
28 20 48 12

Chroma subsampling result for the 4:2:2 scheme:

80 96
60 72
24 52
28 48

[2]
Chroma subsampling result for the 4:1:1 scheme:

80
60
24
28

[2]
Chroma subsampling result for the 4:2:0 scheme:

(80+88+60+8)/4 = 59    (96+12+72+68)/4 = 62
(24+52+28+20)/4 = 31   (52+8+48+12)/4 = 30

[2]
6 Marks Unseen problem
Question 2 Total Marks 27


Q3.

(a) GIF and JPEG are two commonly used image representations. What images are suitable to be represented as GIF and JPEG? Do they usually use lossless or lossy compression? Explain the reason by showing the major compression algorithm (for lossless) or the lossy steps of the algorithm (for lossy).
Target images:
GIF: 256-colour (or 8-bit), potentially with transparency; so simple colour graphics or drawings. [1]
JPEG: continuous 24-bit true-colour images. [1]
Lossless or lossy:
GIF: lossless. [1]
JPEG: lossy. [1]
Key algorithms:
GIF: the key algorithm is LZW (lossless). [1]
JPEG: the lossy steps involve quantisation and chroma subsampling. [1]
6 Marks Bookwork
(b) In the following situations, which (mostly lossless) compression algorithm is most suitable? Briefly describe the basic idea of each algorithm in one sentence.
i. Compression of a sequence of tokens with a known, uneven probability distribution.
Arithmetic coding: a widely used entropy coder based on range division of floating-point numbers. [2]
ii. Compression of a sequence of tokens with unknown probability but with reoccurrence of patterns.
(Lempel-Ziv-Welch) LZW coding: a compression approach that adaptively builds the dictionary; only the initial dictionary needs to be transmitted. [2]
iii. Compression of a sequence of gradually changing numbers.
Differential Pulse Code Modulation (DPCM): encode the difference between adjacent samples to reduce the dynamic range. [2]
iv. Compression of a sequence of tokens with the same tokens often appearing consecutively.
Run-Length Encoding (RLE): map the sequence to pairs of the element and the number of consecutive runs. [2]
8 Marks Unseen problem or applied bookwork
(c) What is the improvement of the LZW algorithm over the LZ algorithm?
LZW introduced the idea that only the initial dictionary needs to be transmitted to enable decoding; the decoder is able to build the rest of the table from the encoded sequence. [2]
Given the following string as input (excluding the quotes), "/THIS/IS/HIS/IS/", with the initial dictionary below, encode the sequence with the LZW algorithm, showing the intermediate steps. [11]

Index Entry
1     /
2     H
3     I
4     S
5     T

The steps are given as follows. For /THIS/IS/HIS/IS/:

w    k    output  index  symbol
------------------------------------------------------
NIL  /
/    T    1       6      /T
T    H    5       7      TH
H    I    2       8      HI
I    S    3       9      IS
S    /    4       10     S/
/    I    1       11     /I
I    S
IS   /    9       12     IS/
/    H    1       13     /H
H    I
HI   S    8       14     HIS
S    /
S/   I    10      15     S/I
I    S
IS   /
IS/  EOF  12

So the output will be: 1 5 2 3 4 1 9 1 8 10 12
11 Marks Unseen problem applying algorithms covered in lectures. 3 marks for keeping w, 2 marks for appropriate allocation of index, 3 marks for the symbol table and 3 marks for the output
Question 3 Total Marks 27
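A Python sketch of the encoder loop traced above, using the question's initial dictionary (the function name is illustrative):

def lzw_encode(text, dictionary):
    dictionary = dict(dictionary)            # string -> index
    next_index = max(dictionary.values()) + 1
    w, out = "", []
    for k in text:
        if w + k in dictionary:
            w = w + k                        # grow the current match
        else:
            out.append(dictionary[w])        # emit the code for w
            dictionary[w + k] = next_index   # add the new entry
            next_index += 1
            w = k
    if w:
        out.append(dictionary[w])            # flush the final match
    return out

initial = {"/": 1, "H": 2, "I": 3, "S": 4, "T": 5}
print(lzw_encode("/THIS/IS/HIS/IS/", initial))
# -> [1, 5, 2, 3, 4, 1, 9, 1, 8, 10, 12]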


Q4.

(a) List two psychological phenomena that have been exploited in MPEG audio compression.
These are frequency masking and temporal masking. [2]
Briefly explain their meanings.
Frequency masking: when an audio signal consists of multiple frequencies, the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than that of the other, the quieter frequency may not be heard. [1]
Temporal masking: after the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency. [1]
4 Marks: Bookwork
(b) What is the key difference between I-frames, P-frames and B-frames in MPEG-2 video compression?
I-Frame: basic reference frame for each group of pictures; essentially a JPEG-compressed image. [1]
P-Frame: coded forward difference frame w.r.t. the last I or P frame. [1]
B-Frame: coded backward difference frame w.r.t. the last I or P frame. [1]
3 Marks: Bookwork
Give the advantages and disadvantages of using B-frames.
Advantages: improved coding efficiency, as most B-frames use fewer bits; quality can be improved in the case of moving objects that reveal hidden areas; no propagation of errors, as B-frames are not used to predict future frames. [1]
Disadvantages: frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the 2 anchor frames; potentially more delay for online applications. [1]
2 Marks: Bookwork
(c) Assume a 2x2 macroblock is used. For the following macroblock

# # # #
# 6 4 #
# 2 2 #
# # # #

the corresponding intensities in the reference frame are given as follows:

5 3 6 2
1 4 7 3
4 5 3 3
3 2 3 3

Calculate the motion vector, with a complete search within a 1-pixel search window. List the steps to obtain the result.
For all 9 possibilities, compute the sum of absolute differences:
(-1, -1): |5 - 6| + |3 - 4| + |1 - 2| + |4 - 2| = 5


(-1, 0): |1 - 6| + |4 - 4| + |4 - 2| + |5 - 2| = 10
(-1, 1): |4 - 6| + |5 - 4| + |3 - 2| + |2 - 2| = 4
(0, -1): |3 - 6| + |6 - 4| + |4 - 2| + |7 - 2| = 12
(0, 0): |4 - 6| + |7 - 4| + |5 - 2| + |3 - 2| = 9
(0, 1): |5 - 6| + |4 - 3| + |2 - 2| + |3 - 2| = 3
(1, -1): |6 - 6| + |2 - 4| + |7 - 2| + |3 - 2| = 8
(1, 0): |7 - 6| + |4 - 3| + |3 - 2| + |3 - 2| = 4
(1, 1): |6 - 3| + |4 - 3| + |3 - 2| + |3 - 2| = 6
[3]
So the motion vector is (0, 1), with SAD = 3. [2]
Should the intra-frame or inter-frame coding scheme be used for this macroblock? Why?
The macroblock size N = 2.

$MB_{mean} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} C_{i,j} = (6 + 4 + 2 + 2)/4 = 3.5$ [1]
$A = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} |C_{i,j} - MB_{mean}| = |6 - 3.5| + |4 - 3.5| + |2 - 3.5| + |2 - 3.5| = 6$ [2]

$SAD - 2N = 3 - 2 \times 2 = -1$ [1]

A > SAD - 2N, so inter-frame coding. [1]
What is the macroblock being coded after motion compensation?
After motion compensation, the difference between the target macroblock and the best match in the reference will be used, i.e.

$\begin{pmatrix} 6 & 4 \\ 2 & 2 \end{pmatrix} - \begin{pmatrix} 5 & 3 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}$ [2]

12 Marks: Unseen problem
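A Python sketch of the full search and decision above. Offsets are written (row, column) here, so the best offset (1, 0) is the exam's motion vector (0, 1) in (x, y) order:

target = [[6, 4], [2, 2]]                        # 2x2 macroblock
ref = [[5, 3, 6, 2], [1, 4, 7, 3],
       [4, 5, 3, 3], [3, 2, 3, 3]]               # reference frame

def sad(dy, dx):                                 # block at centre + offset
    return sum(abs(target[i][j] - ref[1 + dy + i][1 + dx + j])
               for i in range(2) for j in range(2))

scores = {(dy, dx): sad(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
best = min(scores, key=scores.get)
print(best, scores[best])                        # (1, 0) -> SAD of 3

mean = sum(sum(row) for row in target) / 4       # MBmean = 3.5
A = sum(abs(v - mean) for row in target for v in row)
print("intra" if A < scores[best] - 2 * 2 else "inter")   # -> inter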

(d) Given the following frame types for a group of sequential frames in MPEG-2 (in display order):

I1 B2 B3 P4 B5 B6 B7 P8 I9 B10 B11 B12 P13 B14 P15 P16

What is the coding order of the frames? The frame coding order is:

I1 P4 B2 B3 P8 B5 B6 B7 I9 P13 B10 B11 B12 P15 B14 P16

[6]
6 Marks: Unseen problem: first IPBBPBBB 2 marks, IPBBB 2 marks and finally PBP 2 marks
Question 4 Total Marks 27


END OF EXAMINATION

CM0340 Solutions CARDIFF UNIVERSITY EXAMINATION PAPER

Academic Year: 2011/2012
Examination Period: Autumn

Examination Paper Number: CM0340 Solutions. Examination Paper Title: Multimedia. Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator. Structure of Examination Paper: There are 12 pages. There are 4 questions in total. There are no appendices. The maximum mark for the examination paper is 81 and the mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: The following items of stationery are to be provided: ONE answer book. Instructions to Students: Answer 3 questions. The use of calculators is permitted in this examination. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.


Q1.

(a) What is MIDI?
Definition: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other. [1]
1 Mark Bookwork

(b) What features of MIDI make it suitable for use in the MPEG-4 audio compression standard?
MIDI has very low bandwidth when compared to audio: sounds are synthesised at the client, and only control data is transmitted. [1]
MIDI can control many performance aspects. [1]
2 Marks Bookwork

(c) Briefly outline the MPEG-4 structured audio standard.
MPEG-4 comprises 6 Structured Audio tools:
SAOL, the Structured Audio Orchestra Language: the central part of the Structured Audio toolset; a software-synthesis language that defines how to make the sounds. [1]
SASL, the Structured Audio Score Language: controls (like MIDI) how SAOL makes the sounds. [1]
SASBF, the Structured Audio Sample Bank Format: a format for efficiently transmitting banks of sound samples to be used in wavetable, or sampling, synthesis. [1]
A set of MIDI semantics, which describes how to control SAOL and SASL scripts (MIDI based). [1]
A scheduler, which describes how to take the above parts and create sound: a set of carefully defined and somewhat complicated instructions that specify how SAOL is used to create sound when it is driven by MIDI or SASL. [1]
AudioBIFS, part of BIFS, which lets you make audio soundtracks in MPEG-4 using a variety of tools and effects-processing techniques. [1]
6 Marks Bookwork


(d) What features of MIDI make it suitable for controlling software or hardware devices?
The basic syntax of MIDI is all about control: playing notes, setting up sounds etc. [1]
A wide number of specialised control messages: some fixed, like sustain, modulation and pitch bend, others freely assignable. [1]
A wide range of controllers available, e.g. built in to keyboards, specialist hardware; software reassignable. [1]
MIDI System Real-time Messages for control of timing/syncing, e.g. SMPTE, MIDI Time Code. [1]
The system exclusive command set makes MIDI completely extensible to control any device. [1]
External hardware to convert MIDI to/from other formats, e.g. pitch-to-MIDI converters, MIDI to control voltage (analogue synths), motion capture to MIDI. [1]
6 Marks unseen. Assimilation of various aspects of bookwork


A MIDI-controlled toy car where MIDI needs to control the starting and stopping of the car and also its speed.
Use MIDI System Real-time (Sync) messages to start/stop the car; these are usually used to start/stop/continue recording sequences/devices. [2]
Use the MIDI tempo message to control the car's speed; this is usually used to set the tempo of a piece of MIDI music (in beats-per-minute format). [1]
3 Marks unseen.
A robot is to be controlled via MIDI. The robot has custom configurable circuitry that controls its movement and sensors. The circuitry has to be initialised via MIDI.
Set up a MIDI system exclusive (Sysex) message: system-dependent creation of messages. [2]
Assume/configure the circuitry to map parameter values from Sysex to the drivers for the circuits. [1]
3 Marks unseen.
12 Marks total for part (e).
27 Marks Question Total


Q2.

(a) How does the human eye sense colour? What characteristics of the human visual system can be exploited for the compression of colour images and video?

The eye is basically sensitive to colour and intensity. The retina of the eye has neurons on which light is focused; each neuron is either a rod or a cone. [1]

- Rods are not sensitive to colour: they sense intensity (monochrome). [1]
- Cones come in 3 types: the first responds most to light of long wavelengths (red/yellowish colours); the second responds most to medium-wavelength light, peaking at green; the third responds most to short-wavelength light, of a bluish colour. [1]
- Each cone type responds differently: non-linearly, and not equally, to the various frequencies of light (R, G and B). [1]
- Compression of images and video exploits the fact that intensity (monochrome) can be represented at high resolution while colour is represented at lower resolution, and non-linearly w.r.t. colour sensitivity. [1]

5 Marks - Bookwork

(b) What is the YIQ colour model? How is compression achieved with YIQ in analog NTSC video and digital MPEG video?

YIQ is used primarily in colour TV broadcasting (although it is downward compatible with B/W TV):

- Y (luminance) is the CIE Y primary. [1]
- I is the red-orange axis; Q is roughly orthogonal to I. [1]
- The eye is most sensitive to Y, next to I, then to Q. [1]
- In analog NTSC, 4 MHz of bandwidth is allocated to Y, 1.5 MHz to I and 0.6 MHz to Q. [1]
- In digital video, chroma subsampling is used, e.g. a 4:1:1 ratio. [1]

5 Marks - Bookwork
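For illustration, a sketch of the commonly published NTSC RGB-to-YIQ matrix. It shows Y as the luminance-weighted sum the eye is most sensitive to, which is why Y can be kept at full resolution while I and Q are reduced.

import numpy as np

RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],    # Y: luminance weights
                       [0.596, -0.274, -0.322],    # I: red-orange axis
                       [0.211, -0.523,  0.312]])   # Q: roughly orthogonal to I

def rgb_to_yiq(rgb):
    # rgb: array-like of shape (..., 3), components in [0, 1]
    return np.asarray(rgb) @ RGB_TO_YIQ.T

print(rgb_to_yiq([1.0, 1.0, 1.0]))   # white: Y = 1, I and Q ~ 0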


(c) What is a colour look-up table and how is it used to represent colour?

Colour Look-Up Tables (LUTs): store only the index into the colour LUT for each pixel [1]; look up the table to find the colour (RGB) for that index. [1] (Accompanying diagram. [3])

5 Marks - Bookwork

Give an advantage and a disadvantage of this representation with respect to true colour (24-bit) colour.

Advantage: uses significantly less memory than full 24-bit colour. [1]
Disadvantage: restricted number of colours available. [1]

2 Marks - Bookwork

How do you convert from 24-bit colour to an 8-bit colour look-up table representation?

A LUT needs to be built when converting 24-bit colour images to 8-bit: group similar colours, with each group assigned a colour entry in the LUT. [1]

1 Mark - Bookwork
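A minimal sketch of one such grouping. A uniform 3-3-2 bit split (8 levels of red and green, 4 of blue, giving 256 groups) is used purely for simplicity; practical converters group colours adaptively (e.g. median cut) for better quality.

import numpy as np

def to_indexed(img):
    # img: (H, W, 3) uint8 true-colour image -> (index image, 256-entry LUT)
    r, g, b = img[..., 0] >> 5, img[..., 1] >> 5, img[..., 2] >> 6
    indices = ((r << 5) | (g << 2) | b).astype(np.uint8)  # one byte per pixel
    i = np.arange(256)
    lut = np.stack([((i >> 5) & 7) * 32 + 16,             # group-centre red
                    ((i >> 2) & 7) * 32 + 16,             # group-centre green
                    (i & 3) * 64 + 32], axis=1).astype(np.uint8)
    return indices, lut

img = (np.random.rand(4, 4, 3) * 255).astype(np.uint8)
indices, lut = to_indexed(img)
approx = lut[indices]          # reconstructed 24-bit approximation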


(d) Describe how colour look-up tables can be used to implement simple computer animations. Illustrate your answer with the following example: in a 7x7 image you have to animate a 3x3 red square moving from left to right at a rate of 2 pixels per frame. The square is centred vertically within the image and the image background is black.

Solution sketch: construct an image of indices into a LUT and manipulate the LUT values rather than the image values. [1]

Set up the 7x7 index image as follows:

- Rows 1, 2, 6 and 7 are always black, so they use LUT entry 0, set to (0,0,0) in the LUT. [1]
- Rows 3-5 need indices that can take on black (0,0,0) or red (255,0,0) as the animation proceeds. [1]
- There is a 2-pixel step at each frame, so the block overlaps its previous position by 1 column at each iteration; further LUT entries are needed to accommodate this in the animation. [1]

The index image (entries 1-6 cover the columns the square passes through):

0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 2 3 4 5 6
1 1 2 3 4 5 6
1 1 2 3 4 5 6
0 0 0 0 0 0 0
0 0 0 0 0 0 0

The animation proceeds as follows (a code sketch follows below):

- Step 1: set entries 1 and 2 to red, 3-6 to black. [1]
- Step 2: set entry 1 to black and entries 2, 3 and 4 to red (the rest remain black). [1]
- Step 3: set entries 2 and 3 to black and entries 5 and 6 to red (4 stays red, the rest remain black). [1]
- Similarly for the rest of the animation. [1]

8 Marks - Unseen

Give a limitation of colour look-up table animation

Limitation: an even greater restriction on the number of colour values than with normal LUTs, as several entries need to be reserved to display the animated object/background colours [1]; or: fast LUT memory is needed. [1]

1 Mark - Unseen

27 Marks Question Total
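A minimal Python sketch of the steps above (array indices are 0-based): the index image is written once, and each frame only rewrites LUT entries.

import numpy as np

BLACK, RED = (0, 0, 0), (255, 0, 0)

indices = np.zeros((7, 7), dtype=np.uint8)        # rows 1,2,6,7 stay entry 0
indices[2:5, :] = [1, 1, 2, 3, 4, 5, 6]           # rows 3-5 of the image

lut = np.zeros((7, 3), dtype=np.uint8)            # entries 0-6, all black

red_entries_per_frame = [(1, 2), (2, 3, 4), (4, 5, 6)]
for red_entries in red_entries_per_frame:
    lut[:] = BLACK                                # entry 0 remains background
    for entry in red_entries:
        lut[entry] = RED                          # recolour the LUT, not pixels
    frame = lut[indices]                          # (7, 7, 3) image to display
    print(frame[3, :, 0])                         # red channel of middle row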


Q3.

(a) GIF and JPEG are two commonly used image representations. What images are suitable to be represented as GIF and JPEG? Do they usually use lossless or lossy compression? State the major compression algorithm (for lossless) or the lossy steps of the algorithm (for lossy) for each.

Suitable images: GIF suits graphics, cartoons and other images with relatively few distinct colours (it is limited to a 256-entry LUT); JPEG suits natural, photographic images with smooth tonal variation.

Lossless or lossy:
- GIF: lossless. [1]
- JPEG: lossy. [1]

Key algorithms:
- GIF: the key algorithm is LZW (lossless). [1]
- JPEG: the lossy steps are quantisation (of DCT coefficients) and chroma subsampling. [1]

4 Marks - Bookwork

(b) Briefly describe the four basic types of data redundancy that data compression algorithms can apply to audio, image and video signals.

Four types of redundancy:
- Temporal: redundancy over time in 1D signals (audio) and between successive frames in video. [2]
- Spatial: correlation between neighbouring pixels or data items. [2]
- Spectral: correlation between colour or luminance components; this uses the frequency domain to exploit relationships between the frequency of change in data. [2]
- Psycho-visual/psycho-acoustic: exploit perceptual properties of the human visual or auditory system to compress data. [2]

8 Marks - Bookwork


(c) Encode the following stream of characters using decimal arithmetic coding compression: MULTI. You may assume that characters occur with probabilities of M = 0.1, U = 0.3, L = 0.3, T = 0.2 and I = 0.1.

Sort the data into largest probabilities first and form cumulative probabilities:

U: 0.0-0.3, L: 0.3-0.6, T: 0.6-0.8, M: 0.8-0.9, I: 0.9-1.0

There are only 5 characters, so there are 5 segments whose widths are determined by the probability of the related character. [3]

The first character to be encoded is M, which lies in the range 0.8-0.9; therefore the final codeword lies in the range 0.8 to 0.89999... [1] Each subsequent character subdivides the current range. [1]

So after coding M, the range 0.8-0.9 subdivides as:
U: 0.8-0.83, L: 0.83-0.86, T: 0.86-0.88, M: 0.88-0.89, I: 0.89-0.9 [1]

To code U we take the range 0.8-0.83 and subdivide:
U: 0.8-0.809, L: 0.809-0.818, T: 0.818-0.824, M: 0.824-0.827, I: 0.827-0.83 [1]

The next character is L, so we subdivide the range 0.809-0.818:
U: 0.809-0.8117, L: 0.8117-0.8144, T: 0.8144-0.8162, M: 0.8162-0.8171, I: 0.8171-0.818 [1]

The next character is T, so we subdivide the range 0.8144-0.8162:
U: 0.8144-0.81494, L: 0.81494-0.81548, T: 0.81548-0.81584, M: 0.81584-0.81602, I: 0.81602-0.8162 [1]

The final character is I, which lies in the range 0.81602-0.8162. [1]

So the completed codeword is any number in the range 0.81602 <= codeword < 0.8162. [2]

12 Marks - Unseen

(d) Show how your solution to (c) would be decoded.

Assume the codeword is 0.8161.

- Sort the data into largest probabilities first and form cumulative probabilities, as above. [1]
- The decoder can determine that the first character is M, since 0.8161 lies in the range 0.8-0.9. [1]
- By expanding the interval we can see that the next character must be a U, as it then lies in the range 0.8-0.83, and so on for all other intervals (a worked sketch of both directions follows below). [1]

3 Marks - Unseen

27 Marks Question Total
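For illustration, a minimal Python sketch of the decimal coder worked above, using the same probability ranges; floating-point arithmetic makes the printed bounds approximate rather than exact decimals.

RANGES = {"U": (0.0, 0.3), "L": (0.3, 0.6), "T": (0.6, 0.8),
          "M": (0.8, 0.9), "I": (0.9, 1.0)}          # cumulative probabilities

def encode(message):
    low, high = 0.0, 1.0
    for ch in message:
        span = high - low
        lo, hi = RANGES[ch]
        low, high = low + span * lo, low + span * hi  # narrow the interval
    return low, high          # any value in [low, high) is a valid codeword

def decode(code, length):
    out = []
    for _ in range(length):
        for ch, (lo, hi) in RANGES.items():
            if lo <= code < hi:
                out.append(ch)
                code = (code - lo) / (hi - lo)        # expand the interval
                break
    return "".join(out)

print(encode("MULTI"))        # ~(0.81602, 0.8162)
print(decode(0.8161, 5))      # MULTI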


Q4.

(a) List two psychological phenomena that are exploited in MPEG audio compression. Briefly explain their meanings.

These are frequency masking and temporal masking.

- Frequency masking: when an audio signal consists of multiple frequencies, the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than that of the other, the quieter frequency may not be heard. [1]
- Temporal masking: after the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency. [1]

4 Marks: Bookwork

(b) How does MPEG audio compression implement methods which use the above psychological phenomena?

Both use the following basic process (MPEG employs frequency masking and may employ temporal masking (MP3, Layer 3)):

- Use a Fourier transform (FFT) to perform the analysis. [1]
- Split the signal into frequency sub-bands. [1]
- Frequency masking: determine the amount of masking for each band caused by nearby bands, using a lookup table of scaling ratios. Inputs: set hearing thresholds, sub-band masking properties (model dependent) and the scaling factors above. [3]
- Temporal masking: the same sub-band coding technique as for frequency masking is used. Two windows of sample length 36 and 12 with 50% overlap are used to encode temporal aspects; three short 12-sample blocks are concatenated to make a 36-sample-length block. [3]

A much-simplified illustration of the sub-band masking idea follows below.

8 Marks: Rationalisation of Bookwork
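The following toy Python sketch illustrates the sub-band masking idea only. The band split, the 12 dB masking drop, the wrap-around at the band edges and the ~6 dB-per-bit rule are illustrative stand-ins, not the calibrated MPEG psychoacoustic model.

import numpy as np

def band_levels(frame, n_bands=32):
    spectrum = np.abs(np.fft.rfft(frame))              # FFT analysis
    bands = np.array_split(spectrum, n_bands)          # split into sub-bands
    return np.array([20 * np.log10(b.max() + 1e-12) for b in bands])  # dB

def bit_allocation(levels, mask_drop_db=12.0):
    # Illustrative assumption: each band masks its immediate neighbours at
    # mask_drop_db below its own level; masked bands get no bits, audible
    # bands get roughly one bit per 6 dB of margin above the mask.
    mask = np.maximum(np.roll(levels, 1), np.roll(levels, -1)) - mask_drop_db
    audible = levels > mask
    bits = np.ceil((levels - mask) / 6.0)
    return np.where(audible, bits.clip(1, 16), 0)

frame = np.sin(2 * np.pi * 440 * np.arange(512) / 44100)  # stand-in frame
print(bit_allocation(band_levels(frame)))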


(c) What are the fundamental differences between MPEG and Dolby audio compression algorithms?

MPEG/Dolby algorithm difference:
- MPEG perceptual coders control the quantisation accuracy of each sub-band by computing bit numbers for each sample. MPEG needs to store each quantisation accuracy with each sample, and the MPEG decoder uses this information to dequantise: forward adaptive bit allocation.
- Dolby uses a fixed bit-rate allocation for each sub-band, based on the characteristics of the ear. [4]

Give an advantage and a disadvantage of Dolby with respect to MPEG audio compression.

Dolby advantage: uses fixed bit-rate allocation, so there is no need to send allocation information with each frame. [1]
Dolby disadvantage: Dolby encoders and decoders both need the psychoacoustic model information, i.e. the decoder is not independent of the encoder method. [1]

6 Marks: Bookwork

(d) Given two stereo channels of audio:

Left Channel:  112 102 113 114 115 127 136 144
Right Channel: 112 114 116 104 124 120 122 133

i. Apply Middle/Side (MS) stereo coding to the sequence.

Basic idea: Middle = sum of the left and right channels; Side = difference of the left and right channels.

Middle: 224  216  229  218  239  247  258  277
Side:     0  -12   -3   10   -9    7   14   11

[3]

3 Marks Applied Unseen Problem
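For illustration, a minimal sketch of the transform and its exact inverse on the data above.

left  = [112, 102, 113, 114, 115, 127, 136, 144]
right = [112, 114, 116, 104, 124, 120, 122, 133]

middle = [l + r for l, r in zip(left, right)]    # sum channel
side   = [l - r for l, r in zip(left, right)]    # difference channel

print(middle)   # [224, 216, 229, 218, 239, 247, 258, 277]
print(side)     # [0, -12, -3, 10, -9, 7, 14, 11]

# The decoder inverts the transform exactly:
assert left  == [(m + s) // 2 for m, s in zip(middle, side)]
assert right == [(m - s) // 2 for m, s in zip(middle, side)]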


ii. How may this result be employed to achieve compression?

- Encode Side in fewer bits, as it is essentially Differential Pulse Code Modulation. [1]
- Use specially tuned threshold values to compress the Side channel signal further. [1]
- Code Middle in the normal (for audio) 16 bits. [1]
- Code Side in a reduced number of bits; it needs to be signed, so in the above example 5 bits minimum (usually an 8-bit range is used). [1]

4 Marks - Unseen (briefly discussed in notes; needs to be applied)

iii. Give two potential problems of this coding method.

- The two audio signals are 16-bit, so overflow is possible in Middle (e.g. above, if the samples were 8-bit, the last two Middle values, 258 and 277, would overflow). A solution could be to average the signals rather than sum them (see the sketch below). [1]
- A signed 8-bit value for Side gives only a 7-bit range for the difference, so either quantisation or overflow error is possible without care (specially tuned thresholds in MPEG). [1]

2 Marks - Unseen

27 Marks Question Total
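A sketch of the averaging fix suggested above. Recovering the bit lost to integer averaging from the parity of Side is one possible way to keep the scheme lossless; it is shown here as an assumption, not as the MPEG method.

def ms_avg(l, r):
    return (l + r) // 2, l - r         # Middle stays within the sample range

def unms_avg(m, s):
    l = m + (s + (s & 1)) // 2         # Side's parity restores the lost bit
    return l, l - s

assert all(unms_avg(*ms_avg(l, r)) == (l, r)
           for l, r in [(112, 102), (144, 133), (255, 0), (0, 255)])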


END OF EXAMINATION
