
Reusing Windows Vista Audio System Effects

September 15, 2006

Abstract

Microsoft Windows Vista includes several audio rendering digital signal processing effects that are installed along with the in-box class drivers. These effects are packaged as user-mode System Effect Audio Processing Objects (sAPOs). The effects include per-stream pre-mix render sAPOs (local effects [LFX]) and one post-mix render sAPO (global effects [GFX]).

This white paper describes the render LFX and GFX sAPOs that are included with Windows Vista. It also describes two strategies that hardware manufacturers can implement to reuse the in-box sAPOs.

This information applies to the Windows Vista operating system.

Future versions of this preview information will be provided in the Microsoft Windows Driver Kit (WDK).

The current version of this paper is maintained on the Web at: http://www.microsoft.com/whdc/device/audio/vista_sysfx.mspx

References and resources discussed here are listed at the end of this paper.

Contents
Introduction
New Audio Features for Windows Vista
  Loudness Equalization DSP (LFX)
  Bass Management (LFX)
  Low Frequency Protection (LFX)
  Speaker Fill (LFX)
  Room Correction (GFX)
  Virtual Surround (LFX)
  Speaker Phantoming (LFX)
  Enhanced Sound for Laptop Computers
  Audio System Effects User Interface
Reuse of Microsoft sAPOs by Third Parties
  How to Install HD Audio and USB Audio Drivers
  How to Combine Custom and Windows Vista sAPOs
Detailed Guidelines for Strategy A
  General Programming Issues
  Features and Their Modes
  Supported IPropertyStore Properties
  Mutual Exclusion and Feature Interactions
  Sample Code for Strategy A
Detailed Guidelines for Strategy B
  Programming Information
  Initialization
  Query Windows Vista sAPO's Feature State
  Format Negotiation
  LockForProcess/UnlockForProcess
  GetLatency
  APOProcess
  Handling Windows Vista sAPO errors
  Compilation and Linking
General Guidelines for Custom Audio System Effects
Resources
Appendix. Run-Time Considerations When Reusing Windows Vista sAPOs
  Handling the Limitations of Different Input-Output Format Combination
  Interaction between Speaker Fill and Bass Management
  Interaction between Folddown and Bass Management

Disclaimer
This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.

© 2006 Microsoft Corporation. All rights reserved.

Microsoft, Windows, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.



Introduction
Today's consumers use their PCs to access and enjoy a wide variety of entertainment. It is not uncommon to have music, television, and movie playback integrated into one computer. With Microsoft Windows Media Center Edition, computers are increasingly found in the living room and are used as the entertainment hub for the entire household. New audio features enable users to store their media in one place (the computer) instead of spreading it across several sources such as a CD or DVD player, an audio/video receiver (AVR), a TV, and so on. Centralizing media sources provides a much more engrossing playback experience for both the casual and the avid listener.

Microsoft Windows Vista introduces new advanced audio and communication functionality that enhances the high-fidelity music and movie audio experience and provides great hands-free voice support. It supports the kind of entertainment audio functionality and performance that is usually found only in expensive feature-laden AVRs. This includes previously exclusive features such as room correction and bass management. For communications audio, Windows Vista supports echo cancellation and microphone array voice acquisition. This new in-box audio functionality for both entertainment and communications services is well ahead of the competition.

The new audio features can be broadly classified into three categories:
  Enhanced audio playback for music, television, and movies
  Surround headphones and bass boost for laptop computers
  Advanced communication support

This white paper provides an overview of the audio system effects in Windows Vista. It includes descriptions of the underlying digital signal processing (DSP) algorithms and the user interface (UI) for accessing and choosing the available audio system effects.

New Audio Features for Windows Vista


The following sections describe the new audio features in Windows Vista. Some are implemented as local effects (LFX) System Effect Audio Processing Objects (sAPOs) and others as global effects (GFX) sAPOs. The type of sAPO is indicated in the section heading.

Loudness Equalization DSP (LFX)


One of the frustrating aspects of current integrated media experiences is that volume levels between different sources are often not consistent. For example, even though a TV program might be at just the right volume level, commercial breaks can vary widely in volume. This requires users to adjust the volume setting accordingly. Some of today's expensive HD-capable TVs can equalize volume so that the sound stays at a somewhat constant level. That works well if you rely on your TV for sound playback, but most home theater and home audio enthusiasts connect the TV audio directly to their sound systems. In addition, today's loudness equalization solutions are often not effective for different audio content and sources.


Windows Vista can maintain a more uniform perceived loudness across different digital audio files or sources than earlier technologies. This means that loudness always stays within a specified range, even for different digital signals, when users:
  Switch between a broadcast NTSC or ATSC TV program and a locally stored Windows Media Audio (WMA) or MP3 file.
  Switch between different formats in a playlist (such as WMA, MP3, or WAV files) that were authored at different volume levels.

Loudness equalization is ideal, for example, for watching a movie at night. It makes it easier to hear the quieter parts of the movie while limiting the maximum loudness to a level that is considerate of others. Loudness equalization also improves the listening experience in noisy playback settings, by making the quiet parts of the content loud enough to be audible without creating disturbingly loud peak volumes. Loudness and intensity are different ways of quantifying audio levels. Loudness, in its technical sense, refers to the listener's perception of an audio signal's volume. Intensity (volume and level) is the externally measured power of an audio signal.

Two signals of the same intensity with different time structure or frequency content can have substantially different loudness levels. This leads to the common experience where some content sounds much louder than other content with the same intensity simply because of differences in the source material and the way in which the content was recorded. Furthermore, different content standards (such as digital versus analog TV) could have different intensity levels for the same content. As a result, the perceived level of audio content can vary widely, from nearly inaudible in a moderately quiet listening environment to loud enough to be uncomfortable.

Loudness equalization simulates human hearing to accurately measure the loudness (as opposed to the intensity) of an audio source. It then uses dynamic gain adjustment to keep the loudness of different sources more nearly constant. Loudness equalization can thus affect both dynamic range and peak loudness.

Windows Vista uses single-pass loudness equalization, which calculates loudness on a block-by-block basis. A block corresponds to the critical band resolution of the human ear. Single-pass loudness equalization adjusts the gain with a fast attack and slow decay, just as many wideband compressors do, to tightly control the peak loudness of a signal while maintaining the local dynamics. Fast attack means that relatively loud signals have their gain rapidly reduced to control the loudest signal that is presented to the listener. Slow decay means that, when an audio signal reaches a peak but does not sustain that level, the gain following the peak is slowly increased.

Single-pass loudness equalization equalizes long-term level changes somewhat, but preserves the signal's short-term dynamics. The loudness equalization is not full, and the technique deliberately preserves some sense of louder versus softer across different material.
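The fast-attack, slow-decay behavior described above can be illustrated with a small gain smoother. This is a minimal sketch under stated assumptions, not the Windows Vista algorithm: the names GainSmoother and blockRms are hypothetical, the block RMS is only a stand-in for the perceptual loudness measurement the paper describes, and the attack and decay coefficients are arbitrary values chosen by the caller.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration only. Smooths a per-block gain so that gain
// reductions (attack) take effect quickly and gain increases (decay) slowly.
struct GainSmoother {
    double attackCoeff;  // fraction of the gap closed per block when gain must drop
    double decayCoeff;   // fraction of the gap closed per block when gain may rise
    double gain = 1.0;   // current linear gain

    GainSmoother(double attack, double decay)
        : attackCoeff(attack), decayCoeff(decay) {
        assert(attack > 0.0 && attack <= 1.0);
        assert(decay > 0.0 && decay <= 1.0);
    }

    // blockLevel: measured level of the current block (assumed stand-in for
    // loudness). targetLevel: the level the output should settle around.
    double update(double blockLevel, double targetLevel) {
        // A silent block gives no information, so hold the current gain.
        double desired = (blockLevel > 0.0) ? targetLevel / blockLevel : gain;
        double coeff = (desired < gain) ? attackCoeff : decayCoeff;
        gain += coeff * (desired - gain);
        return gain;
    }
};

// RMS level of one block of samples.
double blockRms(const std::vector<double>& block) {
    if (block.empty()) return 0.0;
    double sum = 0.0;
    for (double s : block) sum += s * s;
    return std::sqrt(sum / static_cast<double>(block.size()));
}
```

With an attack coefficient close to 1 and a decay coefficient close to 0, a loud block pulls the gain down almost immediately, while the gain recovers over many blocks after the peak has passed. That asymmetry is the property the paragraph above attributes to wideband compressors.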


A loudness equalization service is very rare in AVRs and is not generally included in competing products. Figure 1 illustrates the Enhancements tab of the Windows Vista Control Panel Speakers application, showing the UI for enabling and configuring loudness equalization.

Figure 1. The loudness equalization option

Bass Management (LFX)


No home theater would be complete without bass that can be felt. With Windows Vista, users can adjust the movie or music playback experience to maximize the bass effects on the loudspeakers in a home theater system. In many audio systems, some or all of the loudspeakers have a limited frequency range. In such systems, a single subwoofer is often used for frequencies below the capabilities of the main loudspeakers. Although a system with one subwoofer might not maintain all of the auditory cues in the original source material, such systems are very common. Often they must prefilter the signals to all channels to maximize the bass response.


Forward Bass Management


One way to handle this prefiltering is called forward bass management. With this approach, some or all of the loudspeakers are small, but the system can have large left and right loudspeakers, a subwoofer, or both. The low-frequency portion of the audio signal for the small speakers is redirected to the large speakers or subwoofer, which can better handle that frequency range. Figure 2 shows an example of forward bass management for a system with a subwoofer.
[Diagram: forward bass management. All speakers small; bass from all channels is directed to the subwoofer. Labels in the diagram: LPF, HPF, BASS, SL, SR, LFE, Subwoofer.]

Figure 2. Forward bass management
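The routing in forward bass management can be sketched as a per-channel crossover whose low band is summed into the subwoofer feed. This is a simplified illustration, not the Windows Vista implementation: the paper does not specify the filter design, so a one-pole low-pass filter with a complementary high-pass is assumed here, and the names OnePoleCrossover and forwardBassManageFrame are hypothetical. Level trims and headroom handling are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-pole low-pass filter; the complementary high-pass is (input - low).
// Assumed filter shape, chosen only because it is short and easy to verify.
class OnePoleCrossover {
public:
    OnePoleCrossover(double cutoffHz, double sampleRateHz) {
        assert(cutoffHz > 0.0 && cutoffHz < sampleRateHz / 2.0);
        const double pi = 3.14159265358979323846;
        alpha_ = 1.0 - std::exp(-2.0 * pi * cutoffHz / sampleRateHz);
    }

    // Splits one sample into a low band and a high band; low + high == x.
    void split(double x, double& low, double& high) {
        state_ += alpha_ * (x - state_);
        low = state_;
        high = x - low;
    }

private:
    double alpha_ = 0.0;
    double state_ = 0.0;
};

// Processes one frame. channels[i] holds the sample for small speaker i and
// is overwritten with its high-passed feed. The return value is the
// subwoofer feed: the LFE sample plus the low band of every small speaker.
double forwardBassManageFrame(std::vector<double>& channels, double lfe,
                              std::vector<OnePoleCrossover>& filters) {
    assert(channels.size() == filters.size());
    double sub = lfe;
    for (std::size_t i = 0; i < channels.size(); ++i) {
        double low = 0.0;
        double high = 0.0;
        filters[i].split(channels[i], low, high);
        channels[i] = high;
        sub += low;
    }
    return sub;
}
```

Because the two bands are complementary, no signal is discarded: whatever is removed from a small speaker's feed is added to the subwoofer feed.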

Reverse Bass Management


People who prefer full-range loudspeakers in some or all positions of their home theater system might not have (or want) a subwoofer. In such cases, the signal that is intended for the subwoofer might have to be mapped back into the main channels so that it is not lost. This bass management scheme is referred to as reverse bass management. With this approach, there are at least two large loudspeakers and no subwoofer. If there are more than two large loudspeakers, reverse bass management can be performed with only two of them or with all of them. The signal that is intended for the subwoofer is distributed appropriately in either case.


Figure 3 shows a typical example of reverse bass management. The subwoofer's signal and the low-frequency portion of the signal that is intended for the small loudspeakers are redirected to the two large loudspeakers.
[Diagram: reverse bass management. Front left and right speakers large (full range); satellite speakers small; no subwoofer. LFE and bass from the satellite speakers are channeled to front left and right. Labels in the diagram: L, R, LPF, HPF, BASS, SL, SR, LFE.]

Figure 3. Reverse bass management


With either type of system, a user can use the Bass Management Settings dialog box during setup to specify the loudspeaker configuration and the cutoff frequency of the limited-bandwidth loudspeakers. The system can then take the appropriate actions. Figure 4 shows the Bass Management Settings dialog box.

Figure 4. The Bass Management Settings dialog box

Low Frequency Protection (LFX)


Low-frequency protection is a form of forward bass management. It is used when there is no subwoofer and all the speakers are small and lack bass capability. The system simply removes the part of the audio signal that falls below a specified cutoff frequency.

Speaker Fill (LFX)


Most music is produced with only two channels and is thus not optimized for the typical audio or video enthusiast's multichannel audio equipment. For those who have invested in a multichannel system, having music emanate from only the front-left and front-right loudspeakers is a less-than-ideal audio experience. Speaker fill simulates a multichannel loudspeaker setup. It allows music that would otherwise be heard on only two speakers to be played on all of the loudspeakers in the room, enhancing the spatial sensation.

Speaker fill is used when there are more playback channels or loudspeakers than there are source channels. The effect is generated by a combination of channel manipulation and inserted delays. Speaker fill accepts stereo or multichannel input.

Speaker fill is sometimes used when there are equal numbers of source and playback channels. This situation occurs when content is authored for a channel mask with a smaller number of channels than the physical configuration's channel mask. One example would be content with a quadraphonic channel mask that is played on a surround sound system.


Speaker fill can be turned on and off with the Control Panel Audio application's Speakers (High Definition Audio Device) dialog box, as shown in Figure 5.

Figure 5. Enabling speaker fill

Room Correction (GFX)


Dialog and sound effects should sound as impressive as possible in a home theater. However, getting the audio configuration in a home theater just right can take a substantial amount of time. To aid this process, Windows Vista features room correction processing. This processing optimizes the listening experience for a particular location in the room (for example, the center cushion of your couch) by automatically calculating the optimal combination of delay, frequency response, and gain adjustments. Room correction better matches sound to the on-screen image and is also useful for users who place their desktop speakers in nonstandard locations. Windows Vista room correction processing is an improvement over similar features in high-end receivers because it better accounts for the way in which the human ear processes sound.

Room correction is calibrated with a microphone. The procedure can be used with both stereo and multichannel systems. The user places the microphone where the user intends to sit and then activates a wizard that measures the room response. The wizard plays a set of specially designed tones from each loudspeaker in turn to measure the distance, frequency response, and overall gain of each loudspeaker from the microphone's location. If the user has a good microphone, the calibration procedure automatically attempts to flatten the frequency response of each channel to compensate for relative differences in the channels, as well as any deficiencies in each channel's frequency response.

After these measurements have been made, they are stored as a profile that is used by the room correction DSP to correct the delay, overall gain, and frequency balance between loudspeaker locations. Room correction ensures that the listening area will be a good stereo and multichannel soundstage with improved timbre, envelopment, and front and back sensation when compared to the uncorrected system. Figure 6 shows the completion page of the Room Calibration Wizard.

Figure 6. The Room Calibration Wizard

Virtual Surround (LFX)


Virtual surround uses simple digital methods to combine a multichannel signal into two channels in a way that allows it to be restored to a multichannel signal by the Pro Logic decoders that are available in most modern audio receivers. Virtual surround is ideal for a system with a two-channel sound card and a receiver with a surround sound enhancement mechanism.

Speaker Phantoming (LFX)


Usually, all the loudspeakers in a multichannel system, including the center and satellite loudspeakers, are present. However, users might not have all the expected loudspeakers or they might choose to selectively turn one or more of them off. A common example is a multichannel system that lacks a center loudspeaker. Speaker phantoming reproduces the sound from the missing loudspeaker, typically by splitting it between the adjacent loudspeakers. However, phantoming can also use other combinations such as the rear-left and rear-right or side-left and side-right speakers.
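For the missing-center case, the splitting step might look like the following sketch. It is an illustration under assumptions rather than the Windows Vista algorithm: the FrontFrame type and phantomCenter function are hypothetical names, and the default gain of 1/sqrt(2) (about -3 dB) is a common constant-power choice that the paper itself does not specify.

```cpp
#include <cassert>
#include <cmath>

// One frame of the three front channels (hypothetical type for illustration).
struct FrontFrame {
    double left;
    double center;
    double right;
};

// Redistributes the center sample equally to the left and right channels and
// silences the center. The default gain is an assumption, not a documented value.
FrontFrame phantomCenter(const FrontFrame& in, double gain = 0.7071067811865476) {
    assert(gain >= 0.0 && gain <= 1.0);
    FrontFrame out;
    out.left = in.left + gain * in.center;
    out.right = in.right + gain * in.center;
    out.center = 0.0;
    return out;
}
```

The same pattern applies to other missing loudspeakers: the absent channel is scaled and added to the two neighbors that remain.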

Enhanced Sound for Laptop Computers


Increasing numbers of people watch movies and television shows on their laptop computers. With Windows Vista, users can take their home theater experience with them while they are away from home.

Virtualized Surround Sound over Headphones (LFX)


Virtualized surround sound over headphones takes the movie playback experience on a laptop to the next level. Virtualized surround sound allows users who are wearing headphones to distinguish sound from front to back as well as from side to side by transmitting spatial cues that help the brain localize the sounds and integrate them into a sound field. The effect feels like it transcends the headphones, creating an "outside-the-head" listening experience.

Conventional headphone playback does not provide the spatial cues that a listener would normally experience with playback over loudspeakers. The result is an unnaturally wide stereo image that forms a straight line between the user's ears. Left and right sounds appear to occur directly beside the listener, whereas center sounds appear to be within the listener's head.

Headphone virtualization creates a virtualized surround sound experience through stereo headphones by using an advanced technology called Head Related Transfer Functions (HRTF). HRTF generates acoustic cues that are based on the shape of the human head. These cues help listeners identify not only the direction and source of a sound but also the type of acoustic environment that surrounds them. HRTF are measurable characteristics that account for the near-ear response, far-ear response, and interaural delay (the delay between the two ears in perceiving a sound). These characteristics are synthesized with digital signal processing and delivered to headphones. The brain then interprets the three-dimensional spatial cues to re-create an exceptional listening experience.


Users enable virtualized surround sound by indicating that they are listening with headphones. Figure 7 shows the UI for enabling virtualized surround sound.

Figure 7. Enabling virtualized surround sound

Bass Boost (LFX)


In systems such as laptops that have speakers with limited bass capability, it is sometimes possible to increase the perceived quality of the audio by boosting the bass response in the frequency range that is supported by the speaker. Bass boost improves sound on mobile devices with very small speakers by increasing gain in the mid-bass range.


Audio System Effects User Interface


Audio system effects are exposed to users through the Control Panel Audio application. The Playback tab allows users to set up the physical loudspeaker configuration and specify whether the audio end points are loudspeakers, headphones, or a Sony/Philips Digital Interface (S/PDIF) receiver. Figure 8 shows the Playback tab on the Control Panel Audio application.

Figure 8. Screenshot of the Playback tab


After users choose an audio end point, they must run a Speaker Configuration Wizard. The wizard asks users to specify their loudspeaker configuration (stereo or multichannel) and verifies that the configuration is accurate by playing test tones through the loudspeakers. In addition, users can specify the physical characteristics of their loudspeakers, such as which loudspeakers are full range, whether any loudspeakers are absent, and so on. Figure 9 shows the Speaker Configuration Wizard.

Figure 9. The Speaker Configuration Wizard


The loudspeaker configuration settings determine which audio system effects are enabled. The configuration settings also determine some of the parameters that are required for audio system effects such as bass management and speaker phantoming. After users complete the Speaker Configuration Wizard, they can use the Enhancements tab on the Control Panel Audio application's Speakers Properties dialog box to select audio system effects that are pertinent to their loudspeaker configuration. Figure 10 shows the Enhancements tab of the Speakers Properties dialog box.

Figure 10. The Speakers Properties dialog box, Enhancements tab

The available audio system effects depend on which audio end point, loudspeaker configuration, and loudspeaker characteristics the user has chosen. Table 1 lists the audio system effects that are available for the various loudspeaker configurations.


Table 1. Audio System Effects and Loudspeaker Configuration


Effect                    Stereo speakers  5.1 speakers  7.1 speakers  Headphones  S/PDIF (PCM)  LineOut
Bass management           Yes              Yes           Yes           No          Yes           Yes
Bass boost                Yes              No            No            Yes         No            No
Speaker phantoming        No               Yes           Yes           No          No            No
Speaker fill              No               Yes           Yes           No          No            No
Virtual surround          Yes              No            No            No          Yes           Yes
Headphone virtualization  No               No            No            Yes         No            No
Loudness equalization     Yes              Yes           Yes           Yes         Yes           Yes
Room correction           Yes              Yes           Yes           No          Yes           Yes

Reuse of Microsoft sAPOs by Third Parties


As discussed earlier, audio system effects are implemented as sAPOs. Independent hardware vendors (IHVs) who are developing their own drivers for HD Audio and USB audio must implement all the audio system effects functionality that is provided by the in-box sAPOs. There are three ways to comply with this requirement:
  Option 1: Reuse the features of Windows Vista by delegating all required audio system effects to the Microsoft sAPOs. The IHV does not provide any custom audio system effects.
  Option 2: Reuse some or all of the features of Windows Vista. Custom sAPOs that support some of the required functionality delegate any missing functionality to the Microsoft sAPOs. Custom sAPOs that support functionality that falls outside of the requirements implement the extra functionality and delegate all of the required functionality to the Microsoft sAPOs.
  Option 3: Implement a complete set of custom sAPOs that support all the required features.

How to Install HD Audio and USB Audio Drivers


The installation procedure for an HD Audio or USB audio driver in Windows Vista depends on whether custom audio system effects sAPOs are also being installed. The procedure depends on which of the three options is chosen:
  Option 1: Custom Audio Drivers with No Custom Audio System Effects Support. IHVs or original equipment manufacturers (OEMs) create their own audio drivers and associated INF files but do not implement any custom audio system effects sAPOs of their own. They must install the Windows Vista sAPOs by modifying their INF file to call the standard INF file.
  Option 2: Custom Audio Drivers with Partial Custom Audio System Effects Support. IHVs or OEMs create their own audio drivers and associated INF files and also implement one or more custom audio system effects sAPOs. However, the sAPOs do not completely support the required audio system effects functionality. IHVs or OEMs must delegate any missing functionality to the appropriate Windows Vista sAPO by wrapping it in their own sAPO.
  Option 3: Custom Audio Drivers with Complete Custom Audio System Effects Support. IHVs or OEMs create their own audio drivers and associated INF files including a complete set of custom audio system effects sAPOs. Because the custom audio system effects sAPOs support or exceed the required audio effects functionality, there is no need to install any Windows Vista sAPOs.

The details in the following sections pertain specifically to Option 2, where the IHVs and OEMs must delegate missing functionality to Windows Vista sAPOs.

How to Combine Custom and Windows Vista sAPOs


IHVs who have chosen Option 2 can implement custom audio system effects sAPOs to replace either or both of the Windows Vista LFX and GFX audio system effects sAPOs. Broadly speaking, IHVs or OEMs have two basic strategies for combining custom audio system effects sAPOs with the sAPOs that Windows Vista provides. These strategies give the IHVs flexibility on how they integrate their custom effects with those of Windows Vista. The details of how to implement both strategies are discussed in the following sections of this paper.

Strategy A
Develop a detailed understanding of the Windows Vista sAPO that you want to replace and its features. Use that understanding to implement a custom sAPO that calls the Windows Vista sAPO in a way that makes the most sense to the IHV from the perspective of their target user experience. This strategy is best suited to IHVs or OEMs who want to:
  Seamlessly integrate their custom effects with the Windows Vista effects.
  Implement their own UI to control their effects and the effects implemented by the Windows Vista sAPOs.

Strategy B
Write the custom sAPO as a thin wrapper around the Windows Vista sAPO. This strategy is best suited to IHVs or OEMs who want to:
  Add their custom effects in the simplest way possible.
  Have the Windows Vista UI continue to control the effects.

IHVs or OEMs who choose Strategy B should still read the Strategy A section to obtain a thorough understanding of Windows Vista custom audio system effects.

Note: With Strategy B, IHVs cannot add UI to control their added custom audio system effects to the Windows Vista Enhancements tab. There is only one Enhancements tab, and it must remain associated with the property page for the Windows Vista sAPOs. The IHV's UI must be implemented in some other way, such as a separate Control Panel application.

Detailed Guidelines for Strategy A


This section provides detailed guidelines for using Strategy A to implement custom audio system effects sAPOs. The first section provides general programming information, and the second provides detailed discussions of how to implement the various audio effects.

General Programming Issues


This section covers the general programming issues that must be addressed when using Strategy A to implement custom sAPOs. Both LFX and GFX custom audio system effects sAPOs have the following general characteristics:
• They must be registered as COM in-process server objects that can be instantiated by using CoCreateInstance.

2006 Microsoft Corporation. All rights reserved.


• The CLSIDs are CLSID_CWMAudioLFXAPO and CLSID_CWMAudioGFXAPO for the LFX and GFX sAPOs, respectively. The CLSIDs are declared in wmcodecdsp.h and defined in wmcodecdspuuid.lib.
• They must support COM aggregation. However, aggregation is not expected to be used in custom audio system effects scenarios, so it should pose no significant problems.

Initialization
A custom sAPO must initialize the Windows Vista sAPO by calling its IAudioSystemEffects::Initialize method. This is typically done from the custom sAPO's Initialize method. Any arguments that are passed to the custom sAPO's Initialize method should be passed directly to the Windows Vista sAPO's Initialize method. This allows the Windows Vista sAPO to fetch its settings from the endpoint and Fx property stores in the APOInitSystemEffects structure. It is possible to have the custom sAPO fetch the settings and selectively pass them to the Windows Vista sAPO, but that is essentially Strategy A.

If the custom sAPO replaces a Windows Vista feature, it is generally advisable to turn off the corresponding feature on the Windows Vista sAPO. However, turning off the Windows Vista feature might not be strictly necessary, depending on how the feature works. To turn off a feature, query the Windows Vista sAPO for its IPropertyStore interface and call IPropertyStore::SetValue. The properties that are supported by the Windows Vista sAPO's property store are described in "Supported IPropertyStore Properties" later in this paper.

Processing
Generally, just call IAudioProcessingObjectRT::APOProcess with an arbitrary number of input samples and the sAPO will produce the same number of output samples. However, make sure that the output buffer is large enough to hold that many output samples. Both sAPOs support in-place processing, so the input and output buffers can be the same. However, the buffer must be large enough to contain the same number of output samples that were sent as input, allowing for possible expansion of the input sample by features like speaker fill.

When headphone virtualization is enabled, the Windows Vista sAPO currently works correctly only if the number of input samples that are passed to it each time is a multiple of 2048. This restriction should eventually be relaxed by using one of the following methods. The appropriate method will be chosen based on the audio engine's requirements.
• Remove the restriction. The sAPO accepts any number of input samples and produces the same number of output samples.
• Remove the restriction, but remember that the number of output samples might be different from the number of input samples. In that case, the caller must use IAudioProcessingObjectRT::CalcInputFrames or IAudioProcessingObjectRT::CalcOutputFrames to predict the number of samples that the sAPO will produce and calculate the required output buffer size.
• Maintain the restriction, but advertise it through an sAPO interface.
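The buffer-size and block-size constraints described above can be sketched in portable C++. This is only an illustration, not part of any sAPO interface: the helper names framesToSubmit and inPlaceBufferSamples are hypothetical, and the sketch assumes that the "samples" in the text are counted per channel (frames).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Block size required while headphone virtualization is enabled (from the text above).
constexpr std::uint32_t kVrhpBlockFrames = 2048;

// Hypothetical helper: how many of the queued frames can be passed to APOProcess
// in one call. With headphone virtualization on, round down to a multiple of 2048;
// the caller keeps the remainder queued for a later call.
constexpr std::uint32_t framesToSubmit(std::uint32_t queuedFrames, bool headphoneVirtualization) {
    return headphoneVirtualization ? queuedFrames - (queuedFrames % kVrhpBlockFrames)
                                   : queuedFrames;
}

// Hypothetical helper: number of 32-bit float samples that a buffer must hold when
// it is used for in-place processing. It must fit the larger of the input and
// output layouts, because features such as speaker fill add channels.
constexpr std::size_t inPlaceBufferSamples(std::uint32_t frames,
                                           std::uint32_t inputChannels,
                                           std::uint32_t outputChannels) {
    return static_cast<std::size_t>(frames) * std::max(inputChannels, outputChannels);
}

// Usage: 5000 queued frames of stereo content that speaker fill expands to 5.1.
inline void exampleBufferPlanning() {
    const std::uint32_t frames = framesToSubmit(5000, true);
    assert(frames == 4096);                               // two blocks of 2048
    assert(inPlaceBufferSamples(frames, 2, 6) == 24576);  // 4096 frames * 6 channels
}
```

Rounding down rather than padding keeps the sketch free of assumptions about how the sAPO treats silence; a real implementation might choose differently.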

Note: The Windows Vista sAPO IAudioProcessingObject::GetLatency implementation is not currently functional.


API Documentation
The custom audio system effects sAPOs currently have no custom APIs. Complete COM, sAPO, and IPropertyStore documentation is beyond the scope of this paper. For complete COM documentation, see the MSDN library. The sAPO interfaces are documented in audioenginebaseapo.idl, which is included in the Platform SDK.

Features and Their Modes


The following sections discuss the implementation of a number of Windows Vista audio system effects sAPOs. The text in parentheses indicates whether the effect is implemented by the LFX or the GFX sAPO.

Bass Management (LFX sAPO)


As discussed in "Bass Management (LFX)", there are two bass management modes:
• Forward bass management filters the low frequency part of the signal out of the affected main channels. It redirects the filtered output to the subwoofer or the front left and right loudspeaker channels, depending on which channels can handle deep bass frequencies. This decision is based on the LRBig flag. This flag is set during speaker configuration if the user indicates that the front-right and front-left speakers are full range.
• Reverse bass management distributes the signal from the subwoofer channel to the other output channels. The signal is directed either to all channels or to the left and right front channels, depending on the LRBig flag. The process uses a substantial gain when mixing the subwoofer signal into the other channels. As long as reverse bass management is possible for the current output format, the sAPO that handles it should always scale gain on the main channels down by a factor of (1.0 + the subwoofer gain) to avoid overloading the channel. This should be done regardless of whether reverse bass management is currently enabled and whether the content that is currently playing has a subwoofer channel.
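The headroom rule for reverse bass management can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that the subwoofer contribution is scaled together with the main signal; the text above only states that the main channels are scaled down by (1.0 + the subwoofer gain).

```cpp
#include <cassert>

// Headroom scale for the main channels: 1 / (1 + subwoofer gain), per the text above.
inline float mainChannelScale(float subwooferGain) {
    return 1.0f / (1.0f + subwooferGain);
}

// Hypothetical mix of one sample of a bass-capable main channel.
// Assumption: the subwoofer contribution is scaled along with the main signal,
// so two full-scale inputs cannot exceed full scale at the output.
inline float mixReverseBass(float mainSample, float subSample, float subwooferGain) {
    return (mainSample + subwooferGain * subSample) * mainChannelScale(subwooferGain);
}

inline void exampleReverseBassHeadroom() {
    assert(mainChannelScale(1.0f) == 0.5f);
    assert(mixReverseBass(1.0f, 1.0f, 3.0f) == 1.0f);   // full-scale main + sub stays at 1.0
    assert(mixReverseBass(1.0f, 0.0f, 3.0f) == 0.25f);  // scale applies even with a silent sub
}
```

The last assertion mirrors the requirement that the scaling is applied whether or not the content has a subwoofer channel.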

The bass management mode that is used depends on the availability of a subwoofer and the bass-handling capability of main speakers. Table 2 summarizes which bass management mode applies in various scenarios. The six scenarios are numbered for later reference. FBM and RBM refer to forward and reverse bass management, respectively.

Table 2. Bass Management Modes
Main speakers                             Subwoofer is present (inverted or noninverted)   No subwoofer
All speakers are small                    FBM (Scenario 1)                                 Low-frequency protection or bass boost (Scenario 4)
The front left/right speakers are large   FBM (Scenario 2)                                 RBM and FBM (Scenario 5)
All speakers are large                    N/A (Scenario 3)                                 RBM (Scenario 6)
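Table 2 can be read as a lookup from speaker size and subwoofer presence to a set of modes. The following sketch encodes that lookup; the type and function names (MainSpeakers, BassPlan, planBassManagement) are hypothetical and are not part of the sAPO interfaces.

```cpp
#include <cassert>

enum class MainSpeakers { AllSmall, FrontLRBig, AllBig };

struct BassPlan {
    int  scenario;        // scenario number from Table 2
    bool forward;         // forward bass management (FBM)
    bool reverse;         // reverse bass management (RBM)
    bool lowFreqProtect;  // low-frequency protection, or bass boost if the user prefers it
};

// Hypothetical encoding of Table 2.
constexpr BassPlan planBassManagement(MainSpeakers speakers, bool subwooferPresent) {
    switch (speakers) {
    case MainSpeakers::AllSmall:
        return subwooferPresent ? BassPlan{1, true, false, false}
                                : BassPlan{4, false, false, true};
    case MainSpeakers::FrontLRBig:
        return subwooferPresent ? BassPlan{2, true, false, false}
                                : BassPlan{5, true, true, false};
    case MainSpeakers::AllBig:
        return subwooferPresent ? BassPlan{3, false, false, false}
                                : BassPlan{6, false, true, false};
    }
    return BassPlan{0, false, false, false};  // not reached for valid enum values
}

inline void exampleBassPlan() {
    const BassPlan plan = planBassManagement(MainSpeakers::FrontLRBig, false);
    assert(plan.scenario == 5 && plan.forward && plan.reverse);
}
```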

In all six scenarios, the user has at least the following choices:
• Turn off bass management completely.
• Turn on bass management, which causes the sAPO to automatically decide the appropriate bass management mode.


The following list is a case-by-case description of the six scenarios:
• Scenario 1: Forward bass management. The low-frequency portion of the signal for the speaker channels is redirected to the subwoofer.
• Scenario 2: Forward bass management. The low-frequency portion of the signal for the speaker channels is redirected as follows:
    • If the original channel is off center, the low-frequency signal is redirected to the front-left or front-right channel, depending on which of those two channels is on the same side as the original channel.
    • If the original channel is on the center axis, the low-frequency signal is redirected to the subwoofer channel.
• Scenario 3: No bass management.
• Scenario 4: Low-frequency protection. The low-frequency portion of each of the main channels is removed. The user can choose to turn on bass boost instead of low-frequency protection.
• Scenario 5: Both bass management modes are applied. There is no way to enable them separately.
    • Forward bass management. The low-frequency portion of each of the surround channels is redirected to the front-left or front-right channels, depending on which of those two channels is on the same side as the original channel. If the incoming channel is on the center axis, the low-frequency part of its signal is divided equally between the front-left and front-right channels.
    • Reverse bass management. The subwoofer signal is sent with equal gain to the front-left and front-right channels.
• Scenario 6: Reverse bass management. The subwoofer signal is sent with equal gain to each of the main output channels.

Note: In this context, the term surround refers to all main channels other than the front-left and front-right channels. It includes the front-center channel. The low-frequency portion refers to frequencies below a user-adjustable crossover frequency.

When a user turns on bass management, the sAPO uses the following logic to decide which bass management mode to enable:
• Enable reverse bass management if the content has a .1 channel and there is no subwoofer channel. The lack of a subwoofer channel is indicated by either of the following:
    • The GFX sAPO does not have a .1 channel.
    • The NoSub flag is set.
• Enable forward bass management if the subwoofer is present or either of the following is true:
    • The LRBig flag is set, indicating that the front-left and front-right speakers are large.
    • The content has main channels other than the front-left and front-right channels.

When the NoSub and LRBig flags are both set and the content has both surround and subwoofer channels, both bass management modes are called for.


Bass Management Settings
The following settings are used to define the speaker configuration programmatically.
• Crossover frequency. Only some speakers, such as the subwoofer, can support frequencies below the crossover frequency. The setting is used for forward bass management, low-frequency protection, and bass boost. Multiple crossover frequencies, such as different values for front and surround speakers, are not supported.
• Speaker size. For speakers other than the subwoofer, this setting has three values:
    • All big: All speakers can handle unlimited deep bass.
    • All small: No speakers can go below the crossover frequency.
    • Front LR big: The front left and right speakers are big, and the rest are small. This is referred to subsequently as LRBig.

LRBig allows, for example, forward bass management to work without an output subwoofer channel by redirecting deep bass signals from the surround/rear channels into the front channels. Otherwise, forward bass management requires an output subwoofer channel. Other modes of bass management also must know which main speakers are big.

A flag that is named NoSub is set to indicate that no subwoofer is connected even though the output format advertised by the audio device or GFX input may include a .1 channel. The NoSub flag indicates that the output configuration is effectively N.0 as far as bass management is concerned. Note that NoSub is an explicit setting, separate from the presence of a low-frequency effects (LFE) flag in the output channel mask that indicates a subwoofer. The output channel mask cannot be used to convey the presence or absence of a subwoofer because most drivers do not support N.0 channel masks for N > 4.

Bass Management Channel Mask Dependencies
Usually, at least some form of bass management is supported. The exception, in which no bass management applies (Scenario 3 in Table 2), arises only if all of the following conditions are met:
• NoSub is set to FALSE.
• The output channel mask includes an LFE flag.
• There are no small output speakers. This includes the case in which the speaker setup is LRBig, but stereo content is being played.
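Because NoSub is separate from the LFE flag in the channel mask, code that needs to know whether a subwoofer is effectively available has to combine the two. A minimal sketch follows; the function name subwooferAvailable is hypothetical, and the constant mirrors the value of SPEAKER_LOW_FREQUENCY.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kLowFrequencyBit = 0x8;  // same value as SPEAKER_LOW_FREQUENCY

// Hypothetical helper: a subwoofer is treated as available only if the output mask
// advertises an LFE channel and the explicit NoSub setting does not override it.
constexpr bool subwooferAvailable(std::uint32_t outputChannelMask, bool noSub) {
    return !noSub && (outputChannelMask & kLowFrequencyBit) != 0;
}

inline void exampleNoSub() {
    assert(subwooferAvailable(0x3F, false));   // 5.1 mask, subwoofer connected
    assert(!subwooferAvailable(0x3F, true));   // 5.1 sound card driving a 5.0 speaker set
    assert(!subwooferAvailable(0x33, false));  // 4.0 mask has no LFE flag
}
```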


Channel Conversion (LFX sAPO)


Channel conversion handles several tasks.

Headphone Virtualization
This effect is enabled if the channel format of the content being played back (N.x) is 2.0 or larger, where x can be 0 or 1. The output mask must be stereo (0x3). The input mask is limited to a few supported combinations, which are listed in Table 3.

Table 3. Headphone Virtualization Channel Masks
Name             Definition                                              Value
MASK_STEREO      MASK_FRONTLR                                            0x3
MASK_3_FRONT     (SPEAKER_FRONT_CENTER | MASK_FRONTLR)                   0x7
MASK_4_SQUARE    (MASK_FRONTLR | MASK_BACKLR)                            0x33
MASK_4_DIAMOND   (MASK_FRONTLR | MASK_FBCENTERS)                         0x107
MASK_5_BACK      (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER)     0x3F
MASK_5_SIDE      (MASK_FRONTLR | MASK_SIDELR | SPEAKER_FRONT_CENTER)     0x60F
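A check against Table 3 might look like the following sketch. The function name is hypothetical. The sketch assumes that the LFE bit (0x8) is ignored when comparing, because the text allows x to be 0 or 1 and the 5-channel values in the table include the LFE bit while their definitions do not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the input/output masks satisfy the headphone
// virtualization requirements described above.
// Assumption: the LFE bit is ignored when matching against Table 3.
constexpr bool headphoneVirtualizationMasksOk(std::uint32_t inputMask,
                                              std::uint32_t outputMask) {
    constexpr std::uint32_t lfe = 0x8;
    constexpr std::uint32_t stereo = 0x3;
    // Main-channel masks from the Table 3 definitions (LFE removed).
    constexpr std::uint32_t supported[] = {0x3, 0x7, 0x33, 0x107, 0x37, 0x607};

    if (outputMask != stereo) return false;
    const std::uint32_t mainChannels = inputMask & ~lfe;
    for (std::uint32_t mask : supported) {
        if (mask == mainChannels) return true;
    }
    return false;
}

inline void exampleHeadphoneMasks() {
    assert(headphoneVirtualizationMasksOk(0x3F, 0x3));    // 5.1 (back) to stereo
    assert(!headphoneVirtualizationMasksOk(0x63F, 0x3));  // 7.1 is not listed in Table 3
}
```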

Virtual Surround
This effect is also referred to as left/right (LTRT) folddown or left/right matrix encoding. It is used if the channel format of the content that is being played back (N.x) is 2.0 or larger, where x can be 0 or 1. LTRT folddown is normally 4.0 to 2.0. Any other input format is usually handled by first applying N.x to 4.0 generic folddown. However, in our implementation, LTRT folddown is natively 5.1 to 2.0. Any other input is handled by first applying N.x to 5.1 generic folddown. The output channel mask must be 0x3 (stereo) and the number of input channels, including the subwoofer if present, must be no more than eight.

Speaker Fill
This effect is used when the number of input channels (N) is less than the number of output channels (M). The effect fills N.x channels to M.x channels, where x can be either 0 or 1. The channel masks in Table 4 (ignoring the LFE channel) are supported for speaker fill. Speaker fill supports any combination of input or output subwoofer channel presence, so the values in the table are only examples. The actual configurations might or might not have a subwoofer.

Table 4. Speaker Fill Channel Masks
Name                Definition                                                            Value
MASK_STEREO         MASK_FRONTLR                                                          0x3
MASK_3_FRONT        (SPEAKER_FRONT_CENTER | MASK_FRONTLR)                                 0x7
MASK_4_SQUARE       (MASK_FRONTLR | MASK_BACKLR)                                          0x33
MASK_4_DIAMOND      (MASK_FRONTLR | MASK_FBCENTERS)                                       0x107
MASK_5_BACK         (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER)                   0x3F
MASK_5_SIDE         (MASK_FRONTLR | MASK_SIDELR | SPEAKER_FRONT_CENTER)                   0x60F
MASK_7_SIDE_BACK    (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER | MASK_SIDELR)     0x63F
MASK_7_FRONT_SIDE   (MASK_FRONTLR | MASK_SIDELR | SPEAKER_FRONT_CENTER | MASK_CENTERLR)   0x6CF
MASK_7_FRONT_BACK   (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER | MASK_CENTERLR)   0xFF


Speaker fill is not supported if any of the following is true:
• The input mask equals the output mask.
• The only difference between input and output is that one has side left/right channels; the other has back left/right channels.
• Input has more main channels than output has.
• The output mask includes the center left/right speakers, but the input mask does not.
• The set of channels in the output but not in the input does not include at least one of: front center, back left/right, or side left/right.

There is one exception to the second item on the list. If the only difference between input and output is that one has side left/right channels and the other has back left/right channels, speaker fill is supported if either format contains channels that would fall between sideLR and backLR in the channel mask bit order. There are three such channels:
• SPEAKER_FRONT_LEFT_OF_CENTER
• SPEAKER_FRONT_RIGHT_OF_CENTER
• SPEAKER_BACK_CENTER

If the input or output mask contains any of these three channels, speaker fill might be supported even though it does not meet the second condition on the list, but only if the other conditions are satisfied. For example, speaker fill from MASK_7_FRONT_BACK to or from MASK_7_FRONT_SIDE is supported for this reason.

Table 5 has the full list of channel values, for convenient reference.

Table 5. Channel Values
Name                            Value
SPEAKER_FRONT_LEFT              0x1
SPEAKER_FRONT_RIGHT             0x2
SPEAKER_FRONT_CENTER            0x4
SPEAKER_LOW_FREQUENCY           0x8
SPEAKER_BACK_LEFT               0x10
SPEAKER_BACK_RIGHT              0x20
SPEAKER_FRONT_LEFT_OF_CENTER    0x40
SPEAKER_FRONT_RIGHT_OF_CENTER   0x80
SPEAKER_BACK_CENTER             0x100
SPEAKER_SIDE_LEFT               0x200
SPEAKER_SIDE_RIGHT              0x400
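The speaker fill rules above, together with the Table 4 mask list, can be expressed as a predicate. The following is a sketch of one possible reading of those rules, not the sAPO's actual logic. The function names are hypothetical, and the sketch assumes that the LFE bit is ignored and that both masks must appear in Table 4.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kFrontCenter = 0x4, kLfeChannel = 0x8, kBackLR = 0x30,
                        kCenterLR = 0xC0, kBackCenter = 0x100, kSideLR = 0x600;

// Main-channel masks from the Table 4 definitions (LFE removed).
inline bool inTable4(std::uint32_t mainMask) {
    static const std::uint32_t masks[] = {0x3, 0x7, 0x33, 0x107, 0x37,
                                          0x607, 0x637, 0x6C7, 0xF7};
    for (std::uint32_t m : masks) {
        if (m == mainMask) return true;
    }
    return false;
}

// Hypothetical predicate following the five rules and the side/back exception.
inline bool speakerFillSupported(std::uint32_t inputMask, std::uint32_t outputMask) {
    const std::uint32_t in = inputMask & ~kLfeChannel;
    const std::uint32_t out = outputMask & ~kLfeChannel;
    if (!inTable4(in) || !inTable4(out)) return false;
    if (in == out) return false;                                   // rule 1
    const std::uint32_t sideBack = kBackLR | kSideLR;
    const bool swapOnly = (in ^ out) == sideBack &&
                          ((in & sideBack) == kBackLR || (in & sideBack) == kSideLR);
    const bool hasInBetween = ((in | out) & (kCenterLR | kBackCenter)) != 0;
    if (swapOnly && !hasInBetween) return false;                   // rule 2 and its exception
    if (std::bitset<32>(in).count() > std::bitset<32>(out).count()) return false;  // rule 3
    if ((out & kCenterLR) != 0 && (in & kCenterLR) == 0) return false;             // rule 4
    if (((out & ~in) & (kFrontCenter | kBackLR | kSideLR)) == 0) return false;     // rule 5
    return true;
}

inline void exampleSpeakerFill() {
    assert(speakerFillSupported(0x3, 0x3F));     // stereo to 5.1
    assert(speakerFillSupported(0xFF, 0x6CF));   // MASK_7_FRONT_BACK to MASK_7_FRONT_SIDE
    assert(!speakerFillSupported(0x3F, 0x60F));  // plain side/back swap
}
```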

Delays are used for channels in the output configurations that are "outside" the front-back range in the input configuration. Conversely, if a speaker in the output configuration is "between" some speakers in the input configuration in the front-back sense, the output for that speaker is generated by mixing some of the input channels on either side of the output channel.

Loudness Equalization (LFX sAPO)


Loudness equalization is a form of compression (dynamics processing) that is driven by a perceptual loudness metric.


Room Correction (GFX sAPO)


Room correction uses a profile that the Room Calibration Wizard generated. This profile is stored as a binary blob. The format of the blob is not currently published.

Speaker Phantoming (LFX sAPO)


To handle phantom channels correctly, a distinction must be made between two speaker masks:
• The output speaker mask is based on the speaker configuration as selected by the user, such as stereo, 5.1, 7.1, and so on.
• The physical speaker mask is based on the actual speaker configuration, with specific speakers phantomed. The physical speaker mask is available to the LFX and GFX sAPOs by reading the PKEY_AudioEndpoint_PhysicalSpeakerConfig from the endpoint property store.

There are exceptions when the physical speaker mask is ignored, including when:
• The physical mask is set to 0.
• The physical mask includes channels that are not present in the input mask.
• Either the output mask or the physical mask lacks left/right symmetry.
• The number of bits in the mask does not match the channel count for either the input or output format.
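The listed exceptions can be sketched as a single predicate. The names below are hypothetical, the list in the text is not exhaustive, and the last rule is interpreted here as comparing each format's own mask against its own channel count, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr int countBits(std::uint32_t v) {
    int n = 0;
    for (; v != 0; v &= v - 1) ++n;  // clear the lowest set bit on each pass
    return n;
}

constexpr bool pairBalanced(std::uint32_t mask, std::uint32_t left, std::uint32_t right) {
    return ((mask & left) != 0) == ((mask & right) != 0);
}

// Left/right pairs from Table 5: front, back, front-of-center, side.
constexpr bool leftRightSymmetric(std::uint32_t mask) {
    return pairBalanced(mask, 0x1, 0x2) && pairBalanced(mask, 0x10, 0x20) &&
           pairBalanced(mask, 0x40, 0x80) && pairBalanced(mask, 0x200, 0x400);
}

struct StreamFormat {
    std::uint32_t channelMask;
    int channelCount;
};

// Hypothetical predicate for the exceptions listed above.
constexpr bool physicalMaskIgnored(StreamFormat input, StreamFormat output,
                                   std::uint32_t physicalMask) {
    return physicalMask == 0
        || (physicalMask & ~input.channelMask) != 0   // channels missing from the input mask
        || !leftRightSymmetric(output.channelMask)
        || !leftRightSymmetric(physicalMask)
        || countBits(input.channelMask) != input.channelCount     // assumed interpretation
        || countBits(output.channelMask) != output.channelCount;  // assumed interpretation
}

inline void examplePhantomCenter() {
    // 5.1 content and output, with the center speaker phantomed (physical mask 0x3B).
    assert(!physicalMaskIgnored(StreamFormat{0x3F, 6}, StreamFormat{0x3F, 6}, 0x3B));
}
```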

Figure 11 shows a block diagram that captures the processing modifications that are used to support phantom speakers. The nature of the optional folddown is a function of the LFX that is enabled.

[Figure 11. Speaker phantoming. Block diagram with the stages LFX Processing, Fold Down (optional), Add Zeroes, GFX Processing, and Remove Channels, annotated with the Input Mask, Physical Mask, and Output Mask.]

Supported Formats
The Windows Vista custom audio system effects sAPOs support the following formats, sampling rates, and channel masks.

PCM Format
The Windows Vista sAPOs support only the pulse code modulation (PCM) 32-bit floating point format.


Sampling Rates
The Windows Vista sAPOs support the following common sample rates: 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000, 88200, 96000, 176400, and 192000.

Channel Masks
The Windows Vista sAPOs support the channel mask and number of channels as follows:
• If one channel mask is empty, it is assumed to be equal to the other channel mask.
• If both channel masks are empty, they are assumed to be equal to each other, but unknown. Both channel masks can be empty only if the number of input channels is equal to the number of output channels.
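A caller could validate a format against these rules before initializing the sAPO. The following sketch shows one way to do that; the names isSupportedSampleRate, ResolvedMasks, and resolveEmptyMasks are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sample rates listed above.
constexpr bool isSupportedSampleRate(std::uint32_t hz) {
    constexpr std::uint32_t rates[] = {8000, 11025, 16000, 22050, 24000, 32000,
                                       44100, 48000, 88200, 96000, 176400, 192000};
    for (std::uint32_t r : rates) {
        if (r == hz) return true;
    }
    return false;
}

struct ResolvedMasks {
    bool valid;                // false if the combination is rejected
    std::uint32_t inputMask;   // 0 means "unknown"
    std::uint32_t outputMask;  // 0 means "unknown"
};

// Hypothetical helper that applies the empty-mask rules above.
constexpr ResolvedMasks resolveEmptyMasks(std::uint32_t inMask, int inChannels,
                                          std::uint32_t outMask, int outChannels) {
    if (inMask == 0 && outMask == 0) {
        // Both empty: allowed only when the channel counts match.
        return ResolvedMasks{inChannels == outChannels, 0, 0};
    }
    if (inMask == 0) return ResolvedMasks{true, outMask, outMask};
    if (outMask == 0) return ResolvedMasks{true, inMask, inMask};
    return ResolvedMasks{true, inMask, outMask};
}

inline void exampleFormatChecks() {
    assert(isSupportedSampleRate(44100));
    assert(resolveEmptyMasks(0, 6, 0x3F, 6).inputMask == 0x3F);
    assert(!resolveEmptyMasks(0, 2, 0, 6).valid);
}
```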

Except for cases of empty channel masks, all four of the possible LFE channel combinations are supported regardless of the rest of the channel mask:
• No in or out LFE channel
• In and out LFE channel
• In LFE but no out LFE channel
• Out LFE but no in LFE channel

Two categories of channel masks are supported only when both the channel masks and the number of channels match exactly between input and output, except possibly in the subwoofer channel. Main channel manipulation features are disabled in that case.
• Channel masks lacking left-right symmetry
• Channel masks with bits beyond the low 11 bits

If the input channel mask differs from the output channel mask in more than subwoofer presence, the combination is supported only if there is a channel manipulation feature (such as speaker fill, virtual surround, or headphone virtualization) that can translate one channel mask to the other. That feature must be enabled when the APO is initialized.

There is an exception to the rule in the previous paragraph if both of the following conditions are true:
• The only difference between the input and output formats is that one has the side left/right channels whereas the other has the back left/right channels instead of the side channels.
• Neither format has any channels whose position in the channel mask flag order would fall between sideLR and backLR.

If these two criteria are satisfied, then the channel masks are treated as equivalent and conversion between them happens automatically by using memcpy. If, however, either format has any channels that would fall between sideLR and backLR in the channel mask order, simple conversion by using memcpy is not possible, because the channel positions would be corrupted, and the conversion must be performed with speaker fill.


For example:
• 0x60F <=> 0x3F is supported without speaker fill.
• 0xFF <=> 0x6CF requires speaker fill.
• 0x13F <=> 0x70F would be possible by using speaker fill if it supported 6.1. However, because speaker fill does not currently support 6.1, 0x13F <=> 0x70F is not supported.
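The side/back equivalence rule and the examples above can be checked mechanically. In the following sketch the names MaskRelation and relateMasks are hypothetical, and the LFE bit is ignored on the assumption that subwoofer presence is handled separately.

```cpp
#include <cassert>
#include <cstdint>

enum class MaskRelation {
    SameMainChannels,    // masks match, except possibly in the LFE bit
    MemcpyEquivalent,    // plain side/back substitution; channel order is unchanged
    NeedsChannelFeature  // speaker fill, virtual surround, or headphone virtualization
};

// Hypothetical classifier for an input/output channel mask pair.
constexpr MaskRelation relateMasks(std::uint32_t inputMask, std::uint32_t outputMask) {
    constexpr std::uint32_t lfe = 0x8, backLR = 0x30, sideLR = 0x600;
    constexpr std::uint32_t between = 0x1C0;  // bits that fall between backLR and sideLR
    const std::uint32_t in = inputMask & ~lfe;
    const std::uint32_t out = outputMask & ~lfe;
    if (in == out) return MaskRelation::SameMainChannels;
    const std::uint32_t sideBack = backLR | sideLR;
    const bool swapOnly = (in ^ out) == sideBack &&
                          ((in & sideBack) == backLR || (in & sideBack) == sideLR);
    if (swapOnly && ((in | out) & between) == 0) return MaskRelation::MemcpyEquivalent;
    return MaskRelation::NeedsChannelFeature;
}

inline void exampleMaskRelations() {
    assert(relateMasks(0x60F, 0x3F) == MaskRelation::MemcpyEquivalent);
    assert(relateMasks(0xFF, 0x6CF) == MaskRelation::NeedsChannelFeature);
}
```

A result of NeedsChannelFeature does not by itself mean that the pair is supported; as the 6.1 example shows, the required feature must also accept both masks.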

Supported IPropertyStore Properties


All of the following properties are currently supported by both sAPOs, although each sAPO uses only a subset of the properties. All properties and enumeration types are defined in wmcodecdsp.h. Default values for these properties are not documented because they are subject to change. However, default values can be discovered programmatically by instantiating the sAPO and calling its IPropertyStore::GetValue method.

Most properties take effect only if they are set before LockForProcess is called. If a property is set before LockForProcess is called, IPropertyStore::SetValue usually succeeds whether or not the value can actually be honored. The reason is that most features have at least some dependencies on the input and output formats, which are not known until LockForProcess is called. If SetValue succeeds but LockForProcess later discovers that the value cannot be honored for the specified input or output format, the property on the APO's property store is automatically reset inside LockForProcess.

To have any effect for the current playback session, the following properties must be set before calling LockForProcess. Changing them after LockForProcess has returned (during playback, for example) does not affect the current playback session. However, the change will take effect the next time LockForProcess is called.

The following sections describe the various property keys. The text in parentheses after the property key name is the PROPVARIANT type.

MFPKEY_BASSMGMT_CROSSOVER_FREQ (VT_I4)
This value is the low-pass/high-pass filter cutoff frequency for all bass management filters, including forward bass management, low-frequency protection, and bass boost. The exact range of acceptable values has not yet been determined, but values around 100 will definitely be supported.

MFPKEY_BASSMGMT_SPKRBASSCONFIG (VT_I4)
This value is used to determine which types of bass management can be enabled and whether to treat the front-left/front-right speakers differently. It has three possible values:
• AllMainSmall. This value precludes reverse bass management.
• AllMainBig. This value precludes forward bass management and low-frequency protection.
• FrontLRBig. The meaning of this value depends on whether a subwoofer is present:
    • If a subwoofer is present, FrontLRBig enables only forward bass management.
    • If a subwoofer is not present, FrontLRBig enables both forward and reverse bass management.


MFPKEY_BASSMGMT_BIGROOM (VT_BOOL)
If this value is set to TRUE, assume that the bass outputs of various speakers arrive at the listener with randomized phase and add by power. FALSE means they arrive in phase and add algebraically. For forward bass management, this affects the filter design choice; it should be complementary for small rooms and noncomplementary for large rooms. For reverse bass management, this property should arguably affect how the gain with which the subwoofer channel is added to the bass-capable main channels depends on the number of such channels. With N such channels, the gain is normally 1/sqrt(N) for large rooms and inverse-linear (1/N) for small rooms. However, the distinction between big and small rooms is not currently implemented, and reverse bass management always uses the small-room (inverse-linear) rule, regardless of the value of this property.

MFPKEY_BASSMGMT_NO_SUB (VT_BOOL)
If this value is set to TRUE, no subwoofer is connected to the system even if the output channel mask includes an LFE channel. An example of such a scenario is a 5.1 sound card that is connected to a 5.0 speaker set.

MFPKEY_BASSMGMT_INVERT_SUB (VT_BOOL)
If this value is set to TRUE, the subwoofer channel output is inverted. The reason for this property is that subwoofers are sometimes wired backwards.

MFPKEY_BASS_BOOST_AMOUNT (VT_I4)
This value controls the amount of boost that is applied under MFPKEY_CORR_BASS_MANAGEMENT_MODE==2 ("boost"). The range of valid values includes [0..3].

MFPKEY_CORR_HEADPHONE (VT_BOOL)
If this value is set to TRUE, the speaker configuration indicates a headphone. Setting this property to TRUE does not enable headphone virtualization. It is used only to determine whether headphone virtualization or room correction is possible. The two effects are mutually exclusive.

MFPKEY_AUVRHP_ROOMMODEL (VT_I4)
This value specifies the room model. Valid values, defined in wmcodecdsp.h, are:
• VRHP_SMALLROOM
• VRHP_MEDIUMROOM
• VRHP_BIGROOM
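The two gain rules mentioned under MFPKEY_BASSMGMT_BIGROOM can be compared with a short sketch. The function name reverseBassGain is hypothetical. As noted above, the Windows Vista sAPO currently applies the small-room rule regardless of the property value, so the large-room branch only illustrates the intended behavior.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: gain with which the subwoofer channel is added to each of
// N bass-capable main channels (N must be at least 1).
//   small room: signals add algebraically, so use 1/N (inverse-linear)
//   large room: signals add by power, so use 1/sqrt(N)
inline double reverseBassGain(int bassCapableChannels, bool bigRoom) {
    const double n = static_cast<double>(bassCapableChannels);
    return bigRoom ? 1.0 / std::sqrt(n) : 1.0 / n;
}

inline void exampleReverseBassGain() {
    assert(reverseBassGain(4, false) == 0.25);  // small room, four channels
    assert(reverseBassGain(4, true) == 0.5);    // large room, four channels
}
```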

MFPKEY_CORR_MULTICHANNEL_MODE (VT_I4)
This value specifies the multichannel mode. There are five modes:
• Normal: No main channel manipulations.
• Passthru: Do not use. This value is obsolete and will be removed.
• SpkrFill: Speaker fill.
• HPVR: Headphone virtualization. Headphone virtualization is possible only if MFPKEY_CORR_HEADPHONE is set to TRUE.
• LTRT: Virtual surround.


MFPKEY_CORR_BASS_MANAGEMENT_MODE (VT_I4)
This value specifies the bass management mode. There are three modes:
• None: No bass management or bass boost.
• Management: Bass management only. The sAPO turns on whichever type of bass management is possible by using logic that was described earlier in this paper.
• Boost: Bass boost only. The sAPO enables the low-frequency protection form of bass management.

MFPKEY_ROOMCORR_PROFILE (VT_BLOB)
This value is used to store the binary contents of the room profile file. The file extension is .rmp.

The following properties are expected to be of no use in the scenarios that are described in this document. However, the GFX sAPO does use the MFPKEY_CORR_HEADPHONE property. Setting that property to TRUE disables room correction.

MFPKEY_CORR_ROOM_CORRECTION_ENABLED (VT_BOOL)
If this value is set to TRUE, it turns on room correction, if possible. It is possible if both of these conditions are met:
• A room profile was successfully loaded and parsed earlier through MFPKEY_ROOMCORR_PROFILE.
• The output channel mask set on the GFX APO is a subset of the channel mask in the room profile.

MFPKEY_CORR_LOUDNESS_EQUALIZATION_ON (VT_BOOL)
If this value is set to TRUE, it turns on loudness equalization. This is always possible on the LFX sAPO.

MFPKEY_LOUDNESS_EQUALIZATION_RELEASE (VT_I4)
This value controls the release time of the compressor that is used for loudness equalization. Higher numbers mean slower release:
• 0: As fast as possible (instant)
• 1: Extreme (practically instant)
• 2: Aggressive (~1s)
• 3: Reasonable (~3s)
• 4: Conservative (~7s)
• 5: Slow (~15s)
• 6: Very slow (~half a minute)
• 7: Extremely slow (~minute)

The audio effects UI for the release setting has six slider positions that correspond to values 2 through 7.
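A custom UI that mirrors this slider needs to map slider positions to release values. The following sketch shows one possible mapping; the function names are hypothetical, and the zero-based slider index is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstring>

constexpr int kFirstSliderRelease = 2;  // lowest release value exposed by the slider
constexpr int kSliderPositions = 6;     // positions map to release values 2 through 7

// Hypothetical mapping: zero-based slider position to release value, or -1 if the
// position is out of range.
constexpr int releaseForSliderPosition(int position) {
    return (position >= 0 && position < kSliderPositions)
               ? position + kFirstSliderRelease
               : -1;
}

// Descriptions from the MFPKEY_LOUDNESS_EQUALIZATION_RELEASE list above.
inline const char* describeRelease(int value) {
    static const char* const names[] = {
        "As fast as possible (instant)", "Extreme (practically instant)",
        "Aggressive (~1s)",              "Reasonable (~3s)",
        "Conservative (~7s)",            "Slow (~15s)",
        "Very slow (~half a minute)",    "Extremely slow (~minute)"};
    return (value >= 0 && value <= 7) ? names[value] : "unknown";
}

inline void exampleReleaseSlider() {
    assert(releaseForSliderPosition(0) == 2);
    assert(std::strcmp(describeRelease(releaseForSliderPosition(1)), "Reasonable (~3s)") == 0);
}
```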


Mutual Exclusion and Feature Interactions


Table 6 shows how the custom effects interact and which are mutually exclusive. Table 6. Mutual Exclusion and Feature Interactions
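Key:
v    Order does not matter
<    Left feature comes before bottom feature
<<   Left feature must come before bottom feature
>    Left feature comes after bottom feature
>>   Left feature must come after bottom feature
x    Mutually exclusive
I    Implied (one feature implies the other)

Left feature                        FBM   RBM   SFill   LEQ   RCorr   VRHP   LtRt   LProt
Reverse bass management             v
Speaker fill                        v     v
Loudness equalization               >     >     <
Room correction                     >>    >>    >>      v
Headphone virtualization            x     x     x       <     x
Virtual surround matrix encoding    x     >>    x       <     x       x
Low-frequency protection            I     v     >>      <     v       v      v
Bass boost                          x     x     >>      <     <<      x      <<     I

The column headings are the bottom features: FBM (forward bass management), RBM (reverse bass management), SFill (speaker fill), LEQ (loudness equalization), RCorr (room correction), VRHP (headphone virtualization), LtRt (virtual surround matrix encoding), and LProt (low-frequency protection).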

Sample Code for Strategy A


This example shows how to use the Windows LFX sAPO for forward bass management, speaker filling, and loudness equalization. It might contain sAPO interface shortcuts that work with the Windows Vista LFX and GFX sAPOs but not with other APOs. For code that works with generic sAPOs, refer to proper documentation and samples. Note: The example assumes that CoInitialize has already been called.


#include #include #include #include #include

<objbase.h> <audioenginebaseapo.h> <stdio.h> <stdlib.h> <wmcodecdsp.h>

APO_CONNECTION_DESCRIPTOR inDesc = { APO_CONNECTION_BUFFER_TYPE_EXTERNAL, // APO_CONNECTION_BUFFER_TYPE NULL, 0, // u32MaxFrameCount NULL, APO_CONNECTION_DESCRIPTOR_SIGNATURE }, *pInDesc = &inDesc, outDesc = { APO_CONNECTION_BUFFER_TYPE_EXTERNAL, // APO_CONNECTION_BUFFER_TYPE NULL, 0, NULL, APO_CONNECTION_DESCRIPTOR_SIGNATURE }, *pOutDesc = &outDesc; APO_CONNECTION_PROPERTY inConn = { NULL, 0, // frame count BUFFER_VALID, APO_CONNECTION_PROPERTY_SIGNATURE }, *pInConn = &inConn, outConn = { NULL, 0, // frame count BUFFER_INVALID, APO_CONNECTION_PROPERTY_SIGNATURE }, *pOutConn = &outConn; #define CHECKHR(x) hr = x; if (FAILED(hr)) {printf("%d: %08X\n", __LINE__, hr); goto exit;} #define SET_I4(pkey,val) pv.vt = VT_I4; pv.lVal = val; CHECKHR(pPS>SetValue(pkey, &pv)); #define SET_BOOL(pkey,val) pv.vt = VT_BOOL; pv.boolVal = val ? VARIANT_TRUE : VARIANT_FALSE; CHECKHR(pPS->SetValue(pkey, &pv)); void useAPO() { IUnknown* pUnk = NULL; IAudioProcessingObjectRT* pRT = NULL; IAudioProcessingObjectConfiguration* pConfig = NULL; IPropertyStore* pPS = NULL; WAVEFORMATEXTENSIBLE wfx; IAudioMediaType* pAMTIn = NULL, *pAMTOut = NULL; PROPVARIANT pv; HRESULT hr; CHECKHR(CoCreateInstance(CLSID_CWMAudioLFXAPO, NULL, CLSCTX_INPROC_SERVER, IID_IUnknown, (void**)&pUnk)); CHECKHR(pUnk->QueryInterface(__uuidof(IAudioProcessingObjectRT), (void**)&pRT)); CHECKHR(pUnk>QueryInterface(__uuidof(IAudioProcessingObjectConfiguration), (void**)&pConfig)); CHECKHR(pUnk->QueryInterface(IID_IPropertyStore, (void**)&pPS));

2006 Microsoft Corporation. All rights reserved.

Reusing Windows Vista Audio System Effects - 31

    SET_I4(MFPKEY_CORR_MULTICHANNEL_MODE, 2);              // turn on speaker filling
    SET_I4(MFPKEY_CORR_BASS_MANAGEMENT_MODE, 1);           // turn on bass management
    SET_I4(MFPKEY_BASSMGMT_SPKRBASSCONFIG, 0);             // all main speakers are small
    SET_I4(MFPKEY_BASSMGMT_CROSSOVER_FREQ, 120);
    SET_BOOL(MFPKEY_CORR_LOUDNESS_EQUALIZATION_ON, TRUE);  // turn on loudness equalization
    SET_BOOL(MFPKEY_BASSMGMT_BIGROOM, TRUE);               // affects bass management filter

    // initialize WAVEFORMATEXTENSIBLE for the input format
    wfx.Format.wFormatTag = WAVE_FORMAT_EXTENSIBLE;
    wfx.Format.nChannels = 2;
    wfx.Format.nSamplesPerSec = 44100;
    wfx.Format.wBitsPerSample = 32;
    wfx.Format.nBlockAlign = wfx.Format.wBitsPerSample / 8 * wfx.Format.nChannels;
    wfx.Format.nAvgBytesPerSec = wfx.Format.nSamplesPerSec * wfx.Format.nBlockAlign;
    wfx.Format.cbSize = 22;
    wfx.Samples.wValidBitsPerSample = 32;
    wfx.dwChannelMask = 3; // stereo
    wfx.SubFormat.Data1 = WAVE_FORMAT_IEEE_FLOAT;
    wfx.SubFormat.Data2 = 0x0000;
    wfx.SubFormat.Data3 = 0x0010;
    wfx.SubFormat.Data4[0] = 0x80;
    wfx.SubFormat.Data4[1] = 0x00;
    wfx.SubFormat.Data4[2] = 0x00;
    wfx.SubFormat.Data4[3] = 0xaa;
    wfx.SubFormat.Data4[4] = 0x00;
    wfx.SubFormat.Data4[5] = 0x38;
    wfx.SubFormat.Data4[6] = 0x9b;
    wfx.SubFormat.Data4[7] = 0x71;
    CHECKHR(CreateAudioMediaType(&wfx.Format, &pAMTIn));

    // modify WAVEFORMATEXTENSIBLE for the output format
    wfx.Format.nChannels = 6;
    wfx.Format.nBlockAlign = wfx.Format.wBitsPerSample / 8 * wfx.Format.nChannels;
    wfx.Format.nAvgBytesPerSec = wfx.Format.nSamplesPerSec * wfx.Format.nBlockAlign;
    wfx.dwChannelMask = 0x3f; // one 5.1 flavor
    CHECKHR(CreateAudioMediaType(&wfx.Format, &pAMTOut));

    pInDesc->pFormat = pAMTIn;
    pOutDesc->pFormat = pAMTOut;
    CHECKHR(pConfig->LockForProcess(1, &pInDesc, 1, &pOutDesc));

    while (0/*have data to process*/)
    {
        pInConn->u32ValidFrameCount = 0; // sample count
        pInConn->pBuffer = 0;            // input buffer
        pOutConn->pBuffer = 0;           // output buffer
        pOutConn->u32BufferFlags = BUFFER_INVALID;
        pRT->APOProcess(1, &pInConn, 1, &pOutConn);
        // do something with the output buffer
    }
    pConfig->UnlockForProcess();


exit:
#define SAFE_RELEASE(p) if (p) p->Release();
    SAFE_RELEASE(pAMTIn);
    SAFE_RELEASE(pAMTOut);
    SAFE_RELEASE(pUnk);
    SAFE_RELEASE(pRT);
    SAFE_RELEASE(pConfig);
    SAFE_RELEASE(pPS);
}

Detailed Guidelines for Strategy B


This section contains detailed guidelines for implementing custom audio system effects sAPOs, based on Strategy B, described in "How to Combine Custom and Windows Vista sAPOs" earlier in this paper. To summarize, Strategy B implements a sAPO that is largely a thin wrapper around the corresponding Windows Vista sAPO. Throughout this section, custom sAPO refers to the IHV's implementation of the sAPO.

Programming Information
This section covers the general programming issues that must be addressed when using Strategy B to implement custom sAPOs.

Both LFX and GFX custom audio system effects sAPOs have the following general characteristics:
• They must be registered as COM in-process server objects that can be instantiated by using CoCreateInstance. The CLSIDs are CLSID_CWMAudioLFXAPO and CLSID_CWMAudioGFXAPO for the LFX and GFX sAPOs, respectively. The CLSIDs are declared in wmcodecdsp.h and defined in wmcodecdspuuid.lib.
• They must support COM aggregation. However, aggregation is not expected to be used in custom audio system effects scenarios, so it should pose no significant problems.

Initialization
A custom sAPO must initialize the Windows Vista sAPO by calling its IAudioSystemEffects::Initialize method. This is typically done from the custom sAPO's Initialize method. Any arguments that are passed to the custom sAPO's Initialize method should be passed directly to the Windows Vista sAPO's Initialize method. This allows the Windows Vista sAPO to fetch its settings from the endpoint and Fx property stores in the APOInitSystemEffects structure. It is possible to have the custom sAPO fetch the settings and selectively pass them to the Windows Vista sAPO, but that is essentially Strategy A.

If the custom sAPO replaces a Windows Vista feature, it is generally advisable to turn off the corresponding feature on the Windows Vista sAPO. However, turning off the Windows Vista feature might not be strictly necessary, depending on how the feature works. To turn off a feature, query the Windows Vista sAPO for its IPropertyStore interface and call IPropertyStore::SetValue. The properties that are supported by the Windows Vista sAPO's property store are described in "Supported IPropertyStore Properties" later in this paper. For examples of how to communicate with the Windows Vista audio system effects sAPO's property store, see the "compress" and the "spkrfill" samples.
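
The initialization flow described above can be sketched in portable C++. This is a minimal illustration, not the real API: InnerSapo, CustomSapo, HResult, and the "SpeakerFill" property name are hypothetical stand-ins for the Windows Vista sAPO, the IHV wrapper, HRESULT, and an MFPKEY_ property. The sketch only shows the two steps: forward the Initialize arguments unchanged, then turn off the feature that the wrapper replaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins; the real code would use the COM interfaces.
using HResult = std::int32_t;
constexpr HResult kOk = 0;

struct InnerSapo {  // plays the role of the Windows Vista sAPO
    bool initialized = false;
    std::map<std::string, int> props{{"SpeakerFill", 1}};  // illustrative name

    HResult Initialize(std::size_t cbData, const std::uint8_t* data) {
        initialized = (data != nullptr && cbData > 0);
        return initialized ? kOk : -1;
    }
    HResult SetValue(const std::string& key, int value) {
        props[key] = value;
        return kOk;
    }
};

class CustomSapo {
public:
    explicit CustomSapo(InnerSapo& inner) : inner_(inner) {}

    HResult Initialize(std::size_t cbData, const std::uint8_t* data) {
        // Pass the caller's arguments through unchanged.
        HResult hr = inner_.Initialize(cbData, data);
        if (hr < 0) return hr;
        // Turn off the wrapped feature that this custom sAPO replaces.
        return inner_.SetValue("SpeakerFill", 0);
    }

private:
    InnerSapo& inner_;
};
```

Forwarding the arguments untouched keeps the wrapper on the Strategy B side of the line; parsing them and passing settings selectively would amount to Strategy A.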


Query Windows Vista sAPO's Feature State


If a custom sAPO merely replaces a Windows Vista audio effects feature and does not have its own configuration UI or settings store, it might have to determine what features are enabled on the corresponding Windows Vista sAPO. There are two ways to get this information:
• Option A: By directly querying the Fx property store.
• Option B: Indirectly, by instantiating the Windows Vista sAPO and using its IPropertyStore interface to query the property store.

Option A
This option has the advantage that it can be done without instantiating a Windows Vista sAPO. Also, if a custom sAPO wants to monitor the Fx property store, Option A is the only way to receive on-the-fly property change notifications. For an example of Option A, see the "compress" sample.

With Option A, the custom sAPO queries the main endpoint property store (not Fx) for PKEY_AudioEngine_DeviceFormat. It then uses the channel mask from that format as the PID for the property key that is used to query the Fx property store. The GUID (fmtid) for the property key that is used to query the Fx property store is one of the XXX_XXX_KEY_GUID values from wmcodecdsp.h. The _KEY_GUID names correspond in obvious ways to the MFPKEY_ names that were discussed earlier in this paper. For examples of this approach, see the Initialize code from the "compress" and "spkrfill" samples.

Option B
This option has the advantage that it can correctly handle the possibility that the Windows sAPO could eventually have some of its features enabled by default if the corresponding property in the Fx property store does not exist.

With Option B, the custom sAPO simply queries the Windows Vista sAPO for its IPropertyStore interface and calls IPropertyStore::GetValue by using one of the MFPKEY_XXX keys that were discussed earlier in this paper. None of the samples does this, although the "spkrfill" sample could use this approach.
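
The Option A key construction can be illustrated with a small portable sketch. Everything here is a stand-in: FxStore models the Fx property store as a map, the string "LOUDNESS_KEY_GUID" is a placeholder for one of the XXX_XXX_KEY_GUID values, and MakeFxKey and QueryFx are hypothetical helper names. The only point shown is that the fmtid comes from the feature GUID and the PID comes from the device format's channel mask.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// (fmtid, pid) pair standing in for a PROPERTYKEY.
using FxKey = std::pair<std::string, std::uint32_t>;
// Toy model of the Fx property store.
using FxStore = std::map<FxKey, int>;

// The PID is the channel mask taken from the device format.
FxKey MakeFxKey(const std::string& keyGuid, std::uint32_t deviceChannelMask) {
    return FxKey(keyGuid, deviceChannelMask);
}

// Returns false when the property does not exist, so the caller
// can decide which default to assume.
bool QueryFx(const FxStore& fx, const std::string& keyGuid,
             std::uint32_t deviceChannelMask, int* value) {
    FxStore::const_iterator it = fx.find(MakeFxKey(keyGuid, deviceChannelMask));
    if (it == fx.end()) return false;
    *value = it->second;
    return true;
}
```

A missing property is reported separately from a stored value because, as noted for Option B, a direct query cannot know what default the Windows sAPO would apply when the property is absent.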

Format Negotiation
When implementing a custom LFX sAPO that wraps the Windows Vista LFX sAPO, do not specify APO_FLAG_FRAMESPERSECOND_MUST_MATCH in the custom sAPO's registration properties. This rule should be followed whether or not the custom sAPO can change the channel format. If the custom LFX sAPO were to specify this flag, it would prevent the corresponding Windows Vista LFX sAPO from doing speaker filling, headphone virtualization, or virtual surround.

A custom LFX sAPO implementation must implement or override IAudioProcessingObject::IsInputFormatSupported. The base class IsInputFormatSupported implementation is unlikely to accurately reflect the set of possible channel conversions that are implemented by the custom LFX sAPO and the Windows Vista LFX sAPO.

The custom LFX sAPO's IsInputFormatSupported method should call the corresponding Windows Vista sAPO's IsInputFormatSupported. This ensures that the Windows Vista LFX sAPO handles any channel conversions that are not handled by the custom LFX sAPO. Note that the Windows Vista LFX sAPO might be updated to support more conversions in future Windows releases. Calling the Windows Vista sAPO's IsInputFormatSupported method is one way to ensure that the set of channel conversions that are supported by the custom sAPO completely contains the set of channel conversions that are supported by the Windows Vista LFX sAPO.

What the custom sAPO should do with the return value from the Windows Vista LFX sAPO's IsInputFormatSupported method depends on what channel conversions, if any, the custom LFX sAPO supports:
• If the custom LFX sAPO does not support any of its own channel conversions, its IsInputFormatSupported method can return the value that was returned by the Windows Vista LFX sAPO's IsInputFormatSupported method directly to the caller. For an example, see the "swap" and "compress" samples.
• If the custom LFX sAPO supports its own channel conversions, then a negative return value (including S_FALSE) from the Windows Vista LFX sAPO's IsInputFormatSupported method does not necessarily translate into a negative return value to the caller. The custom LFX sAPO could, for example, support channel conversions that are not supported by the corresponding Windows Vista sAPO. In that case, the custom LFX sAPO must combine the return value from the Windows Vista LFX sAPO's IsInputFormatSupported method with its own logic for determining supported inputs. For an example, see the "spkrfill" sample's IsInputFormatSupported implementation. Note that the optimal meaning of "combine" depends on which type of channel conversion should take precedence. It might be appropriate to deviate from the "spkrfill" sample, depending on the exact design of the custom implementation.

The IsOutputFormatSupported method on an LFX sAPO is uninteresting because an LFX sAPO's output format is the device's mix format. This format is based on external considerations and cannot be affected by an LFX sAPO or its input format. For that reason, the samples do not attempt to implement correct logic for IsOutputFormatSupported.

The above considerations do not apply to GFX sAPOs because the Windows Vista GFX sAPO does not implement any features that require or imply changing the channel format. For that reason, the GFX sample does nothing special for either IsInputFormatSupported or IsOutputFormatSupported. The format negotiation logic of a custom GFX sAPO is not affected by the fact that it is wrapping the Windows Vista GFX sAPO.
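
One possible way to "combine" the two answers is sketched below in portable C++. This is only an illustration of a policy in which the Windows Vista sAPO's answer takes precedence; it is not taken from the "spkrfill" sample. Support, Conversion, InnerIsInputFormatSupported, and CustomHandles are hypothetical names, and the conversion rules inside them are invented for the example. Support::Suggested plays the role of S_FALSE.

```cpp
// Hypothetical result type; Suggested corresponds to S_FALSE.
enum class Support { Supported, Suggested, Unsupported };

struct Conversion {
    int inChannels;
    int outChannels;
};

// Stand-in for the Windows Vista LFX sAPO (invented rules).
Support InnerIsInputFormatSupported(const Conversion& c) {
    if (c.inChannels == c.outChannels) return Support::Supported;
    if (c.inChannels == 2 && c.outChannels == 6) return Support::Supported;
    return Support::Suggested;
}

// Stand-in for the custom sAPO's own conversion (invented: 2 -> 4).
bool CustomHandles(const Conversion& c) {
    return c.inChannels == 2 && c.outChannels == 4;
}

// Assumed policy: the wrapped sAPO wins when it accepts the format;
// otherwise the custom conversion can still turn a negative answer
// into a positive one.
Support IsInputFormatSupported(const Conversion& c) {
    const Support inner = InnerIsInputFormatSupported(c);
    if (inner == Support::Supported) return inner;
    if (CustomHandles(c)) return Support::Supported;
    return inner;
}
```

A design in which the custom conversion should take precedence would test CustomHandles first instead.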

LockForProcess/UnlockForProcess
The custom sAPO's IAudioProcessingObjectConfiguration::LockForProcess method should call the corresponding method on the Windows Vista sAPO. LockForProcess is a good place to decide the order in which the various processing stages should happen. For example, it can decide whether to apply the custom sAPO's processing or the Windows Vista sAPO's processing first. All three samples provide examples of such decision logic, and the comments in the samples provide some background. However, it is impossible to provide completely general guidance on that subject in this document because it would require knowledge of the specific features of the custom sAPO and how they might interact with the Windows Vista sAPO's features.
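
As a rough sketch of such decision logic, the following portable C++ forwards the lock call, picks a stage order, and sizes an intermediate buffer. The types Format, Order, InnerSapo, and CustomSapo are hypothetical stand-ins (the real method takes connection descriptors), and the ordering rule is an assumed policy chosen only for illustration: the custom stage keeps the channel count, so it runs first when the wrapped sAPO widens the stream.

```cpp
#include <cstddef>
#include <vector>

struct Format {
    int channels;
    int sampleRate;
};

enum class Order { CustomFirst, WindowsFirst };

struct InnerSapo {  // plays the role of the Windows Vista sAPO
    bool LockForProcess(const Format& in, const Format& out) {
        return in.sampleRate == out.sampleRate;  // invented acceptance rule
    }
};

class CustomSapo {
public:
    bool LockForProcess(const Format& in, const Format& out,
                        std::size_t maxFrames) {
        // Forward the call to the wrapped sAPO first.
        if (!inner_.LockForProcess(in, out)) return false;
        // Assumed policy: run the custom stage on the narrower stream.
        order_ = (out.channels > in.channels) ? Order::CustomFirst
                                              : Order::WindowsFirst;
        // The intermediate buffer has the input layout when the custom
        // stage runs first, and the output layout otherwise.
        const int midChannels =
            (order_ == Order::CustomFirst) ? in.channels : out.channels;
        scratch_.assign(maxFrames * static_cast<std::size_t>(midChannels), 0.0f);
        return true;
    }
    Order order() const { return order_; }
    std::size_t scratchSize() const { return scratch_.size(); }

private:
    InnerSapo inner_;
    Order order_ = Order::WindowsFirst;
    std::vector<float> scratch_;
};
```

Allocating the intermediate buffer at lock time keeps allocation out of the per-buffer processing call.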

GetLatency
The custom sAPO's IAudioProcessingObject::GetLatency implementation should call GetLatency on the Windows Vista sAPO that is being wrapped. If the custom sAPO processing incurs latency, it should add it to the result that was returned by the Windows Vista sAPO before returning the value to the caller.
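
The latency addition is simple arithmetic, sketched here under stated assumptions: latency is expressed in 100-nanosecond units, the custom stage's delay is known in frames, and InnerSapo, FramesToHns, and CustomGetLatency are hypothetical names used only for this example.

```cpp
#include <cstdint>

using HnsTime = std::int64_t;  // 100-nanosecond units

// Converts a delay in frames to 100-ns units (10,000,000 per second).
constexpr HnsTime FramesToHns(std::int64_t frames, std::int64_t sampleRate) {
    return frames * 10000000LL / sampleRate;
}

struct InnerSapo {  // plays the role of the Windows Vista sAPO
    HnsTime latency;
    HnsTime GetLatency() const { return latency; }
};

// Wrapped latency plus the custom stage's own delay.
HnsTime CustomGetLatency(const InnerSapo& inner,
                         std::int64_t customDelayFrames,
                         std::int64_t sampleRate) {
    return inner.GetLatency() + FramesToHns(customDelayFrames, sampleRate);
}
```

For example, a custom delay of 48 frames at 48 kHz is 1 ms, or 10,000 units, which is added to whatever the wrapped sAPO reports.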


APOProcess
The custom sAPO's IAudioProcessingObjectRT::APOProcess method should call the Windows Vista sAPO's APOProcess method before, after, or even during its own processing. The decision on when to call APOProcess should be made in LockForProcess, so that any necessary intermediate buffers can be allocated.

The Windows Vista sAPOs support in-place processing whenever their input and output formats are identical. In that case, the custom sAPO can pass the same APO_CONNECTION_PROPERTY as both the input and output connection property for the Windows Vista sAPO. The custom sAPO should not, however, use the custom sAPO's input connection property as the output connection property for the Windows Vista sAPO. In general, sAPOs should not modify their input buffer.
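
The buffer rule can be shown with a small sketch that uses raw float buffers instead of APO_CONNECTION_PROPERTY structures. InnerProcess, CustomStage, and WrapperProcess are hypothetical names, and the arithmetic in each stage is a placeholder. The custom stage writes from the input buffer into the output buffer, and the wrapped stage then runs in place on the output buffer, so the input buffer is never written.

```cpp
#include <cstddef>

// Stand-in for the Windows Vista sAPO; tolerates in == out, as the
// text describes for identical input and output formats.
void InnerProcess(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * 0.5f;
}

// Placeholder custom effect.
void CustomStage(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] + 1.0f;
}

// Custom stage first, then the wrapped stage in place on the output.
void WrapperProcess(const float* input, float* output, std::size_t n) {
    CustomStage(input, output, n);    // input -> output
    InnerProcess(output, output, n);  // in place; input stays untouched
}
```

When the two stages have different formats, an intermediate buffer allocated in LockForProcess would take the place of the in-place step.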

Handling Windows Vista sAPO errors


If a Windows Vista sAPO returns an error to the corresponding custom sAPO, the custom sAPO should act from that point on as if there is no Windows Vista sAPO. The samples treat all Windows Vista sAPO errors as equivalent to CoCreateInstance failing to create the sAPO. Optionally, the custom sAPO can limit the effect of errors from the Windows Vista sAPO's LockForProcess method to the current session. In other words, the custom sAPO does not use the Windows Vista sAPO during subsequent calls to its APOProcess method. However, the custom sAPO could try using the Windows Vista sAPO again if there is another LockForProcess call later, with different formats.
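
The optional per-session fallback can be sketched as follows. InnerSapo and CustomSapo are hypothetical stand-ins, and the failure rule (refusing anything wider than stereo) is invented so that the example has an error to react to. After a failed inner lock, the wrapper behaves as if no Windows Vista sAPO exists until the next LockForProcess call.

```cpp
#include <vector>

struct InnerSapo {  // plays the role of the Windows Vista sAPO
    // Invented failure rule for illustration only.
    bool LockForProcess(int channels) { return channels <= 2; }
    void Process(std::vector<float>& buf) const {
        for (float& s : buf) s *= 0.5f;
    }
};

class CustomSapo {
public:
    bool LockForProcess(int channels) {
        // A failure disables the wrapped sAPO for this session only;
        // the next LockForProcess call makes a fresh attempt.
        useInner_ = inner_.LockForProcess(channels);
        return true;  // the custom stage can still run on its own
    }
    void Process(std::vector<float>& buf) const {
        for (float& s : buf) s += 1.0f;      // placeholder custom stage
        if (useInner_) inner_.Process(buf);  // skipped after an inner error
    }
    bool usesInner() const { return useInner_; }

private:
    InnerSapo inner_;
    bool useInner_ = false;
};
```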

Compilation and Linking


To use the Windows Vista sAPO CLSID and property key definitions, include wmcodecdsp.h and link with wmcodecdspuuid.lib.

There are three sample audio system effects replacement implementations of varying complexity. All three preserve the Windows Vista audio system effects sAPO's functionality by hosting it inside a custom sAPO. All three samples assume that the Windows audio effects UI is not replaced. Table 7 lists the characteristics of the three samples: Swap, Compress, and Spkrfill.

Table 7. Sample Features
Feature                                                          Swap   Compress   Spkrfill
Replaces LFX                                                     Yes    Yes        Yes
Replaces GFX                                                     Yes    No         No
Resembles the audio engine custom audio system effects sample    Yes    No         No
Adds a new custom audio system effects feature                   Yes    No         No
Assumes a separate UI                                            Yes    No         No
Replaces Windows Vista custom audio system effects features      No     Yes        Yes
Reads Windows Vista custom audio system effects properties       No     Yes        Yes
Uses variable processing ordering                                Yes    No         Yes
Shows Microsoft effect ordering                                  No     No         Yes
Changes the number of channels                                   No     No         Yes
Uses complex format negotiation                                  No     No         Yes
Uses dynamic on/off                                              Yes    Yes        No


General Guidelines for Custom Audio System Effects


The following are some guidelines that IHVs should follow when implementing custom audio system effects sAPOs:
• All audio system effects should provide on/off options. Users should not be forced to use an audio system effect.
• Interactions between features in the LFX and GFX sAPO should be mediated by the sAPOs and their related UI.
• Features that are specified as LFX or GFX here can be moved between LFX and GFX in custom implementations. However, this should be done with the understanding that the on/off options should exist and that the accessibility and appropriateness of the options should not be compromised.
• Implementers should remember that the LFX can have different input and output channel masks. The GFX sAPO must have the same input and output channel masks.

Resources
White Papers:
A Wave Port Driver for Real-Time Audio Streaming
    http://www.microsoft.com/whdc/device/audio/wavertport.mspx
Audio Device Technologies for Windows
    http://www.microsoft.com/whdc/device/audio/default.mspx
Custom Audio Effects in Windows Vista
    http://www.microsoft.com/whdc/device/audio/sysfx.mspx
Device Finish-Install Actions in Windows Vista
    http://www.microsoft.com/whdc/driver/install/Finish_Install.mspx
Pin Configuration Guidelines for High Definition Audio Devices
    http://www.microsoft.com/whdc/device/audio/PinConfig.mspx
Plug and Play Guidelines for High Definition Audio Devices
    http://www.microsoft.com/whdc/device/audio/HD-aud_PnP.mspx
UAA Hardware Design Guidelines
    http://www.microsoft.com/whdc/device/audio/UAA_HWdesign.mspx
Universal Audio Architecture
    http://www.microsoft.com/whdc/device/audio/uaa.mspx
Windows Driver Kit (WDK)
    http://www.microsoft.com/whdc/driver/WDK/

Software Development Kits:
APO SDK (available Summer 2006)
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sdkintro/sdkintro/devdoc_platform_software_development_kit_start_page.asp
Platform SDK
    http://msdn.microsoft.com/library/en-us/sdkintro/sdkintro/devdoc_platform_software_development_kit_start_page.asp
Windows Vista SDK
    http://msdn.microsoft.com/windowsvista/


Appendix. Run-Time Considerations When Reusing Windows Vista sAPOs


This section contains some additional information that IHVs and OEMs may find useful when implementing their custom audio system effects. A custom sAPO implementation:
• Uses CoCreateInstance to instantiate one or more instances of the Windows Vista audio system effects sAPOs.
• Configures each instance to enable the desired set of features.
• Inserts each instance into an appropriate place within the custom sAPO's internal pipeline.

Why one or more instances? To avoid undesirable interactions, most features require a certain relative ordering. Because Windows Vista sAPOs implement multiple features inside a single sAPO, multiple instances of that sAPO might be required to ensure correct ordering. For example, assume that three enabled features (A, B, and C) must be ordered ABC. The custom implementation handles B but delegates A and C to the Windows sAPO. A and C must then be in separate instances of the Microsoft sAPO so that the custom implementation of B can happen between them.

Windows Vista implements room correction in the GFX sAPO, which means it is a separate COM object from the LFX sAPO. A custom implementation could choose to delegate room correction to the Windows implementation but place it in a custom LFX sAPO. The custom LFX implementation might then need to delegate some processing to the Windows Vista LFX sAPO implementation and other processing to the Windows Vista GFX sAPO implementation.
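
The A, B, C example can be sketched as a pipeline with two separately configured instances of a stand-in sAPO. InnerSapo and RunPipeline are hypothetical names, and each stage only records its name in a trace so that the resulting order is visible; no audio is processed.

```cpp
#include <string>
#include <vector>

// Stand-in for one instance of the Windows sAPO with a single
// feature enabled.
struct InnerSapo {
    std::string enabledFeature;
    void Process(std::vector<std::string>& trace) const {
        trace.push_back(enabledFeature);
    }
};

// Runs A (first instance), then the custom B, then C (second instance).
std::vector<std::string> RunPipeline() {
    const InnerSapo first{"A"};   // instance configured for feature A only
    const InnerSapo second{"C"};  // instance configured for feature C only
    std::vector<std::string> trace;
    first.Process(trace);
    trace.push_back("B");         // custom implementation of feature B
    second.Process(trace);
    return trace;
}
```

With a single instance that had both A and C enabled, the custom stage B could only run before or after both of them.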

Handling the Limitations of Different Input-Output Format Combinations


Many features, especially bass management, do not work in certain cases. For example, forward bass management is undefined if the bass speaker configuration property is "AllSmall" or "AllLarge" and the output format does not include a subwoofer channel or the NoSub flag is set.

It is not always possible to detect the failure during the IPropertyStore::SetValue call. The method attempts to enable the feature, but the input and output formats are not known at that time because LockForProcess must happen after all property manipulations. This means that it is possible to enable a feature, see it apparently succeed, but not have the corresponding processing take place.

Two strategies are available for dealing with such situations:
• Carefully study the feature-specific sections of this document to be able to predict exactly when a given feature will or will not succeed.
• Call IPropertyStore::GetValue after LockForProcess is called to check the state of important properties. When LockForProcess determines that a particular feature cannot be enabled (because of the input and output formats or the value of some other property), LockForProcess updates the value of the corresponding property in the property store.
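
The second strategy (read the property back after locking) can be sketched with a toy model. InnerSapo, SetBassMode, GetBassMode, and BassManagementActive are hypothetical names, and the rule that drops the feature is deliberately simplified: it only checks for the low-frequency bit in the output channel mask, which is not the full set of conditions described in this paper.

```cpp
#include <cstdint>

// Same bit value as SPEAKER_LOW_FREQUENCY in a channel mask.
constexpr std::uint32_t kLowFrequencyBit = 0x8;

struct InnerSapo {  // toy model of the Windows Vista sAPO
    int bassMode = 0;

    // Formats are not known yet, so setting the property "succeeds".
    bool SetBassMode(int mode) {
        bassMode = mode;
        return true;
    }
    // Simplified rule: the feature is dropped when the output format
    // has no subwoofer channel, and the property is updated to match.
    void LockForProcess(std::uint32_t outputChannelMask) {
        if (bassMode != 0 && (outputChannelMask & kLowFrequencyBit) == 0) {
            bassMode = 0;
        }
    }
    int GetBassMode() const { return bassMode; }
};

// Enables the feature, locks, then reads the property back.
bool BassManagementActive(InnerSapo& sapo, std::uint32_t outputChannelMask) {
    sapo.SetBassMode(1);
    sapo.LockForProcess(outputChannelMask);
    return sapo.GetBassMode() != 0;
}
```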


Interaction between Speaker Fill and Bass Management


When speaker fill is on and a subwoofer is connected, forward bass management must occur before speaker fill to avoid comb filtering of the low-frequency signal by the speaker fill's surround delay.

When speaker fill is enabled and no subwoofer is connected, two types of forward bass management are possible:
• If the front left/right speakers are big, forward bass management routes the low-frequency portion of the surround and center channels into the front left/right speakers. Forward bass management must come after speaker fill in this case.
• If all speakers are small, forward bass management becomes low-frequency protection for all main speakers. This can occur either before or after speaker fill. However, for performance reasons, it is better to have forward bass management before speaker fill.

The Windows Vista sAPO implements certain common speaker fill configurations, such as 2.0 => 5.1, with special optimized code that handles reverse bass management in the same step as speaker fill.
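
The ordering rules in this section reduce to a small decision function, sketched here for a custom pipeline that has to place the two stages itself. Order and ChooseOrder are hypothetical names; the function covers only the three cases that the text describes.

```cpp
enum class Order { BassManagementFirst, SpeakerFillFirst };

// frontLarge is consulted only when no subwoofer is connected.
Order ChooseOrder(bool hasSubwoofer, bool frontLarge) {
    // With a subwoofer, bass management must run first to avoid comb
    // filtering by the speaker fill's surround delay.
    if (hasSubwoofer) return Order::BassManagementFirst;
    // Big front speakers receive the bass of the filled channels, so
    // bass management must run after speaker fill.
    if (frontLarge) return Order::SpeakerFillFirst;
    // All speakers small: either order works; bass management first
    // is preferred for performance reasons.
    return Order::BassManagementFirst;
}
```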

Interaction between Folddown and Bass Management


Headphone virtualization supports only reverse bass management:
• Forward bass management does not make sense with headphone virtualization.
• For implementation simplicity, low-frequency protection and bass boost are not supported.

When any of the headphone virtualization, virtual surround encoding, or speaker fill effects are on, reverse bass management is handled during that step. Reverse bass management is still controlled via the sAPO's reverse bass management property as if it were a separate feature. In these cases, reverse bass management simply controls the folddown coefficients for the .1 input channel. One open issue is that reverse bass management cannot be disabled when LTRT is on. In that case, reverse bass management uses an unconventional subwoofer channel gain.

The Windows audio system effects sAPOs apply some minor processing (gain and delay) even when no features are enabled. The goal of such processing is to ensure that the gain and delay parameters do not change when a feature is enabled on the fly. The reason is that delay is inherent in the implementation of some features, and a gain <1 is applied by some features to avoid excessively high output in certain situations. The set of available features depends on the input-output formats and certain properties, and so does the cumulative normalization gain and delay.

If features will not be turned on or off on the fly, normalization gain can be disabled by setting the MFPKEY_CORR_NORMALIZATION_GAIN property to FALSE by calling IPropertyStore::SetValue. The property might be TRUE by default. There is no mechanism to disable the normalization delay because it is presumed less likely to be objectionable than normalization gain. If normalization delay is objectionable, simply bypass the sAPO in question.
