Vous êtes sur la page 1sur 167

Spectral Analysis, Editing, and Resynthesis:

Methods and Applications

Michael Kateley Klingbeil

Submitted in partial fulfillment of the


requirements for the degree
of Doctor of Musical Arts
in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2009
© 2008
Michael Kateley Klingbeil
All Rights Reserved
ABSTRACT

Spectral Analysis, Editing, and Resynthesis: Methods and Applications

Michael Kateley Klingbeil

This document describes the design, development, and applications of cross-platform

software for audio analysis, editing, and resynthesis. Analysis is accomplished using a

variation of the McAulay-Quatieri technique of peak interpolation and partial tracking.

Linear prediction of the partial amplitudes and frequencies is used to determine the

best continuations for sinusoidal tracks. A high performance user interface supports

flexible selection and immediate manipulation of analysis data, cut and paste, and un-

limited undo/redo. Hundreds of simultaneous partials can be synthesized in real-time

and documents may contain thousands of individual partials dispersed in time without

degrading performance. A variety of standard file formats, including the Sound Descrip-

tion Interchange Format (SDIF), are supported for import and export of analysis data.

Specific musical and compositional applications, including those in works by the author,

are discussed.
Contents

List of Figures iv

List of Tables vi

Acknowledgements vii

1 Introduction 1

1.1 Timbre and the Computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Synthesis Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Theory Versus Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.1 Implementation History . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.2 Present Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Spectral Modeling 13

2.1 The Frequency Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.1 Sinusoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.2 The Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.3 The Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . 16

2.1.4 The Short-Time Fourier Transform . . . . . . . . . . . . . . . . . . . . 18

2.2 STFT Analysis and the Phase Vocoder . . . . . . . . . . . . . . . . . . . . . . 22

2.2.1 Putting Phase to Good Use . . . . . . . . . . . . . . . . . . . . . . . . 24

2.2.2 Resynthesis and Transformation . . . . . . . . . . . . . . . . . . . . . 25

2.2.3 Time Expansion/Compression . . . . . . . . . . . . . . . . . . . . . . 26

2.2.4 Transposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.5 Refinements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.3 Spectral Envelopes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

i
2.4 Sinusoidal Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.4.1 STFT Frequency Interpolation . . . . . . . . . . . . . . . . . . . . . . 32

2.4.2 Partial Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.4.3 Improved Partial Tracking . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.4.4 Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.4.5 Limitations and Extensions . . . . . . . . . . . . . . . . . . . . . . . . 41

3 Sinusoidal Analysis and Resynthesis 42

3.1 Analysis Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.1.1 Analysis Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.1.2 Amplitude Thresholds . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.2 Partial Tracking Using Linear Prediction . . . . . . . . . . . . . . . . . . . . 46

3.3 Resynthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.3.1 Inverse FFT Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.3.2 IFFT Synthesis Implementation . . . . . . . . . . . . . . . . . . . . . . 52

3.3.3 Oscillator Bank Resynthesis . . . . . . . . . . . . . . . . . . . . . . . . 53

3.3.4 Phase Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.4 Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3.4.1 Time-span Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.5 Transient Sharpening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.6 Noise Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.6.1 Bandwidth Enhanced Sinusoids . . . . . . . . . . . . . . . . . . . . . 62

4 Editing and Transformation 64

4.1 Graphical User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.2 Selection Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.2.1 Rule-based Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.3 Editing Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.4 Time Expansion and Contraction . . . . . . . . . . . . . . . . . . . . . . . . . 70

4.5 Frequency Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

ii
4.6 Data Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5 Compositional Applications 78

5.1 Spectral Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.2 Sonic Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.2.1 Timbre Hybridization . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.2.2 Spectral Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5.3 Software Composition Environments . . . . . . . . . . . . . . . . . . . . . . 85

5.3.1 OpenMusic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.3.2 Max/MSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.3.3 Common Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.4 Compositional Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.5 Performance Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

6 Conclusions 97

References 101

A Tear of the Clouds score 107

A.1 Program Note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

A.2 Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

iii
List of Figures

N
2.1 DFT magnitude spectrum of a sinusoid with period 4 . . . . . . . . . . . . 18
N
2.2 DFT magnitude spectrum of a sinusoid with period 4.21 . . . . . . . . . . . 19

2.3 Sinusoids of 4.21 and 4.0 cycles per N samples . . . . . . . . . . . . . . . . . 19

2.4 DFT magnitude spectrum and the interpolated underlying magnitude spec-
N
trum of a sinusoid with period 4.21 . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5 DFT magnitude spectrum and the interpolated underlying magnitude spec-
N
trum of a sinusoid with period 4 . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 Rectangular window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.7 Triangular window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.8 Hann window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.9 Blackman window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.10 Overlap-adding the product of Blackman analysis and synthesis windows

with varying overlap factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.11 Spectral envelope of a soprano voice . . . . . . . . . . . . . . . . . . . . . . . 30

2.12 Phase vocoder analysis of a frequency sweep . . . . . . . . . . . . . . . . . . 32

2.13 Sinusoidal modeling analysis of a frequency sweep . . . . . . . . . . . . . . 32

2.14 Estimating a peak in the spectrum based on a parabolic curve fitting . . . . 33

2.15 Connecting STFT peaks into sinusoidal tracks . . . . . . . . . . . . . . . . . 35

2.16 Partial tracking conflicts due to glissandi . . . . . . . . . . . . . . . . . . . . 37

3.1 Magnitude spectrum of the main lobes of two Blackman windows with

complete separation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.2 Magnitude spectrum of the main lobes of two Blackman windows with the

minimum allowable frequency separation . . . . . . . . . . . . . . . . . . . . 44

iv
3.3 Frequency dependent threshold curve with the default values . . . . . . . . 46

3.4 Comparison of sinusoids generated via IFFT synthesis . . . . . . . . . . . . 52

3.5 Family of cubic phase interpolation functions with different values of M . . 55

3.6 Data structure for division of partials into time-span frames . . . . . . . . . 57

3.7 Breakpoints from the analysis of a castanet click using the standard model 60

3.8 Breakpoints from the analysis of a castanet click using time reassignment . 60

4.1 Zoomed-out display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.2 Zoomed-in display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.3 The SPEAR tool palette . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.4 Playback control sliders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.5 Different breakpoint storage models . . . . . . . . . . . . . . . . . . . . . . . 69

4.6 Original partials prior to time expansion . . . . . . . . . . . . . . . . . . . . 71

4.7 Partials time expanded in independent mode . . . . . . . . . . . . . . . . . . 71

4.8 Partials time expanded in proportional mode . . . . . . . . . . . . . . . . . . 71

4.9 Dialog boxes for transposition and frequency shifting . . . . . . . . . . . . . 72

4.10 Frequency flipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.11 The par-text-frame-format specification . . . . . . . . . . . . . . . . . . . . 76

4.12 Sample data as par-text-frame-format . . . . . . . . . . . . . . . . . . . . . 76

4.13 The par-text-partials-format specification . . . . . . . . . . . . . . . . . . 77

4.14 Sample data as par-text-partials-format . . . . . . . . . . . . . . . . . . . 77

5.1 OpenMusic patch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.2 OpenMusic notation editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.3 OpenMusic patch that reads and writes SPEAR data . . . . . . . . . . . . . 88

5.4 Max/MSP patch from the CNMAT Spectral Synthesis Tutorials . . . . . . . 89

5.5 Dynamic retuning with crossing partials . . . . . . . . . . . . . . . . . . . . 95

v
List of Tables

1.1 Digital synthesis techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4.1 Some of the standard SDIF matrix types . . . . . . . . . . . . . . . . . . . . . 74

4.2 Supported file formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.1 Transformation functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.2 Frequency adjustment functions . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.3 Spectral envelope functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

vi
Acknowledgements

I wish to thank my advisor Tristan Murail and committee members Brad Garton and Fred

Lerdahl for their insight and support throughout my time at Columbia University (and

beyond). Thanks also goes to James Beauchamp for introducing me to the details of signal

processing, to Rick Taube for revealing the realm of the metalevel, and to Karl Klingbeil

for opening my ears to the world of contemporary music. Special gratitude goes to Anita

Buckmaster for her unwavering encouragement and assistance.

This dissertation is dedicated to my parents, Laura Kateley and Jack Klingbeil.

September 17, 2008

vii
1

1. Introduction

1.1 Timbre and the Computer

Considerable energy and effort has been applied to further the understanding of timbre,

one of music’s most elusive features. The American National Standards Institute defines

timbre in a purely exclusionary manner: “. . . that attribute of sensation in terms of which

a listener can judge that two sounds having the same loudness and pitch are dissimilar.”

(ANSI S1.1-1960(R1976)-12.9) This rather unsatisfying description essentially relegates all

aspects of sound other than pitch and loudness to a single perceptual category. Since at

least the beginning of the twentieth-century, the conception of timbre has taken on an

increasingly important role in Western compositional practice. In 1922 Arnold Schoenberg

asserted, “The evaluation of tone color, the second dimension of tone, is in a much less

cultivated, much less organized state than is the aesthetic evaluation of pitch. Nevertheless,

we go right on boldly connecting sounds with one another, contrasting them with one

another, simply by feeling. . . ” (Schoenberg 1983). More than eight decades later, this

statement still seems regrettably accurate. Schoenberg goes on to call for a theory of timbre,

analogous to pitch theories, that will enable the composition of “tone color melodies.”

While falling far short of a unified theory, scientific and artistic efforts of recent decades

have achieved a greatly expanded understanding of timbre. For example, while Schoenberg

describes tone color as a singular “second dimension” of tone, there is now strong evidence

suggesting that timbre is in some sense multi-dimensional (Grey 1977). It is now theorized

that sensations of pitch and timbre are closely intertwined, and that we likely experience

different classes of pitch sensation (spectral pitches and virtual pitches) (Terhardt et al.
1. Introduction 2

1982). What is undeniable, whether a composer grounds her work in theory, or spirituality,

or chooses to work “simply by feeling,” is that technological resources have had a profound

impact on recent musical practice, and that an understanding of timbre is essential to

effectively harnessing these resources.

The application of the digital computer, in particular, has done much to expand the

practical range of sonic possibilities and also to enable the theoretical and empirical study

of timbre. In a prophetic lecture given in 1936, Edgard Varèse proposed a vision of music

made possible by machine:

. . . the new musical apparatus I envisage, able to emit sounds of any number
of frequencies, will extend the limits of the lowest and highest registers . . .
Not only will the harmonic possibilities of the overtones be revealed in all
their splendor, but the use of certain interferences created by the partials
will represent an appreciable contribution. . . . The time will come when
the composer, after he has graphically realized his score, will see this score
automatically put on a machine that will faithfully transmit the musical content
to the listener. (Varèse and Chou 1966, 12)

Varèse was hardly the first, nor last, to make such predictions, and yet the strength of

his conviction and the precision of his forecast remain striking. By 1953, when Varèse

would begin his first compositional experiments with magnetic tape, numerous composers,

technicians, and researchers were already at work in studios throughout the world ex-

ploring the musical applications of electronic instruments. By 1957, with the first musical

applications of general purpose digital computers at Bell Labs, the stage was set for the

eventual realization of Varèse’s predictions.

It was in fact the advent of digital sound synthesis that made this vision possible.

Computer music pioneer Max V. Matthews understood the power and generality of digital

synthesis which he described in his 1963 article in Science: “With the aid of suitable output

equipment, the numbers which a modern digital computer generates can be directly

converted to sound waves. The process is completely general, any perceivable sound can be so

produced.1 ” (Matthews 1963, 553). For the composer seeking to explore a world of timbral

possibilities, the notion is seductive. And yet the computer music composer, whether
1 My emphasis
1. Introduction 3

working in 1967 or forty years later, still faces a number of formidable challenges. We can

summarize these particular problems as follows: the time/cost problem, the synthesis

problem, the control problem, and the compositional problem. Let us briefly consider each

of these issues in turn.

Digital music production distinguished itself from all previous musical-mechanical

approaches in that sound creation could occur on a time scale completely distinct from the

temporal evolution of the sound itself. At first these computations were ponderous (many

orders of magnitude slower than real-time), but as computational power increased over the

decades, reasonably complex and credible sounds could be computed faster than real-time.

The implication of this temporal decoupling is that sound complexity is no longer bound

by space constraints (the number of musicians one can fit on a stage, or the number of

inputs on a mixing board, for example) but by limitations of time (and/or expense). The

composer now has the capability, as Varèse envisioned, to “emit sounds of any number

of frequencies,” and yet at the same time now grapples with a continual “compromise

between interest, cost, and work” (Matthews 1963, 555). One could argue that this problem

is being tackled with continual efficiency improvements in computer hardware. Compared

to the IBM 704 of 1957, the laptop computer of the early twenty-first century provides

a five-hundred-thousand fold increase in processing power at at fifteen-hundred times

less cost (Sandowsky 2001). Nevertheless, computer musicians seem to have a particular

penchant for using spare processing power.

With decreased costs and staggering increases in computing speed, significant addi-

tional challenges still remain. Although “any conceivable sound” is certainly possible, in

reality, the range of potential sounds is constrained by the development of appropriate

synthesis algorithms. The ability to formulate these procedures is severely limited by our

understanding of timbre and psychoacoustics (Matthews 1963). This can be classified as

“the synthesis problem.”

The control problem is a closely associated concern. Although digital synthesis “allows

composers freedom of expression at the timbral level” it also entails the “additional

burden of specifying the microstructure of sound, which was formerly the domain of the
1. Introduction 4

performing musician” (Loy 1989, 326). Because interesting musical sounds are rarely static,

nor entirely stochastic, the precise control parameters must be specified at every instant.

If this control information is not a natural result of the synthesis procedure in use, the

result is an unwieldy data explosion with separate detailed control streams for each sonic

parameter of interest.

Finally, the composer is faced with all of the traditional problems of composition

itself — matters of intention and aesthetics, as well as possible questions of melody, coun-

terpoint, rhythm, harmony, and/or form.

The following list summarizes the challenges and some possible approaches:

Challenge Possible Approaches


I. Time and Cost faster computers, higher processing density, more
efficient algorithms
II. Synthesis problem new synthesis methods, flexible software systems
III. Control problem new user interfaces (visual, gestural), computer
assisted composition, machine learning/listening
IV. Compositional problem continued exploration, theory

It is furthermore recognized that these problems are inherently interrelated. A given

composition aesthetic, one that involves indeterminacy, for example, would invite a

particular approach to the specification of control information. Particularly with regard to

the compositional “problem,” it is certain that no ultimate “solution” exists. If it did, such

a discovery would spell the end of creative artistic production and individual expression.

The act of composition must be seen as a continual search, for which each artist may be

free to define their own goals and strategies.

The chapters that follow are concerned in large part with particular approaches to the

synthesis problem, with a discussion of the resulting implications for the cost and control

problems. The motivation will always be oriented towards musical goals, which will be

addressed in the penultimate chapter.


1. Introduction 5

1.2 Synthesis Methods

Regardless of compositional approach the practitioner of computer music always faces

the same question: What sounds should I produce? Although current computer music

systems offer tremendous flexibility, the choices are constrained by current theories and

their implementations. In order to contextualize the work described in subsequent chapters,

it may prove instructive to consider a particular taxonomy of digital synthesis techniques

developed by Smith (1992). Table 1.1 maintains the original categories while adding some

more recent methods. A significant development in the past decade is the emergence of

hybrid approaches.

Time Domain Spectral Models Physical Models Abstract Methods


Methods (Frequency Domain)
Sampling Additive Waveguide FM
Concrète Phase Vocoder Finite Element AM
Granular Sinusoidal (MQ/PARSHL) (CORDIS-ANIMA) Waveshaping
Scanned Synthesis SMS (Smith & Serra) Karplus-Strong Analog simulation
Subtractive Modal (VCO, VCA, VCF)
Source-filter (LPC) Chaotic systems
UPIC (Xenakis)
FOF/FOG/PSOLA
Wavetable
Particles (Roads)
Concatenative
Audio Mosaicing (MPEG-7)
Physically Informed Sonic Modeling (PhISM)

Table 1.1: Taxonomy of current digital synthesis techniques (after Smith 1992).

Let us first consider some of the most commonly used synthesis methods. Since the

early 1990s, sampling synthesis has been the predominant sound production method

used in commodity hardware and software. Sampling offers very high fidelity results

(limited only by the quality of the input), but is restricted in terms of expressive and

transformational possibilities. A typical sampling synthesis architecture limits transfor-

mations to amplitude envelopes, filtering, and loop control. Commercial developers are

attacking the limitations of sampling in a brute force manner. It is not uncommon to

find multi-gigabyte sample libraries containing complex mappings, cross-faded overlays,


1. Introduction 6

and keyswitch programs. For example, a complex violin bank might include samples

for legato, marcato, spiccato, staccato, and tremolo bowings all recorded at a variety of

different dynamic levels. Fast secondary storage allows these massive sample banks to be

streamed from disk in real-time.

Granular synthesis may be viewed as a particular specialization of sampling synthesis.

The granular approach overlaps many short sound segments (generally less than 50 msec.

duration) which may be derived from recorded sources or produced via other synthetic

means. Usually, each sound segment is modulated by a corresponding amplitude envelope

that quickly fades-in, reaches a steady state, and then fades out. Granular synthesis offers

the possibility to decouple the time and frequency evolution of sound, as well as impart

particular characteristics modulating between rough, smooth, or stochastic textures.

Wavetable synthesis, one of the oldest techniques, can be viewed as a type of hybrid

approach. The time domain view concentrates on the particular wave shape of the table,

which may be either derived mathematically, drawn interactively, or (most commonly)

extracted from a recorded sample. In the frequency domain perspective, the contents of

the wavetable are specified in terms of the amplitude, frequency, and phase of a set of

harmonically related partials. With both approaches, wavetable interpolation is necessary

to produce lively and dynamic sounds.

Particle synthesis may be viewed as a hybrid extension of granular synthesis that

attempts to build up specific frequency domain content using a variety of elementary sound

types which might include grains, impulses (trainlet or pulsar synthesis), or chirps (glisson

synthesis). Convolution may be used to impart specific frequency domain characteristics

to the “pulslet” texture. For a more detailed discussion of these methods, consult Roads’s

Microsound (2001, ch. 4).

Frequency domain models are concerned first and foremost with the spectral content

of sounds. This is a particularly powerful technique because the perception of timbre

is closely related to spectral content. Auditory research shows that the ear behaves as a

sort of spectrum analyzer with precise physical positions along the basilar membrane of

the inner ear attuned to specific frequencies. Frequency domain methods may thus be
1. Introduction 7

seen as way to model synthesis on the processes of audition. Frequency domain models

usually follow an analysis/synthesis paradigm in which control parameters are derived by

analyzing pre-existing recordings.

Additive synthesis is one of the oldest and still most flexible of these frequency domain

methods. Oscillators tuned to specific frequencies, each with time varying amplitudes, are

mixed (summed) together to form a composite timbre. Typically, each oscillator produces

a sine-like waveform. In some applications the frequency and/or phase2 of each oscillator

may also be time variant. While theoretically capable of generating almost any type of

sound, additive synthesis is particularly unattractive in light of the control problem. A

complex timbre may easily require 100 individual oscillators that, in turn, each require

their own amplitude and frequency functions. This problem is typically solved by deriving

synthesis data from an analysis stage. The major advantage of additive synthesis, and

frequency domain methods in general, is that they are highly amenable to transformations.

The phase vocoder has been particularly favored for its support for independent time

and frequency modification. Andy Moorer’s Lions Are Growing, a work from 1978 that

time stretches, compresses, and transposes speech sounds,3 offers a clear demonstration

of the artistic potential of this approach. Beginning in the early 1990s, the technique

became possible on commodity computing platforms and received widespread use in

both commercial sound design and music composition. Spectral models, with particular

emphasis on sinusoidal models, will be taken up in further detail in subsequent chapters.

Concatenative synthesis, a category that encompasses a variety of specific techniques,

may be viewed as a hybrid of spectral and time domain methods. For example, attack

transients might be most effectively handled with sampling, while steady states could

be executed with a spectral model. By carefully concatenating4 sound segments derived

from a large database, more convincing and dynamic note-to-note transitions and timbral

modulations can be achieved than with sampling or spectral synthesis alone. A number of
2 The phase of any non-zero frequency oscillator is by definition time varying — what is emphasized here is
that phase may be modulated directly rather than being a result of a fixed or time varying frequency.
3 The source sound is a reading of Richard Brautigan’s poem of the same name. The work employs

LPC-style resynthesis coupled with phase vocoding.


4 Generally trying to preserve phase continuity
1. Introduction 8

recently released commercial products such as Synful (orchestral instrument synthesis)

(Lindemann 2007) and Vocaloid (singing voice synthesis) (Bonada and Serra 2007) have

demonstrated the viability of this technique. A more detailed overview of concatenative

methods may be found in Schwarz (2005).

In the early 1990s, Julius O. Smith predicted “abstract-algorithm synthesis seems

destined to diminish in importance,” (Smith 1992) and although this proved true to some

extent, a resurgent interest in classic analog and digital hardware has demonstrated that

these methods retain a strong following. As a result, considerable efforts in both the

commercial and academic sectors have gone toward the software emulation of classical

voltage controlled oscillators and filters.5 In recent years the “vintage emulation” trend

has extended to early commercial digital synthesizers, including the venerable Yamaha DX

series which has been replicated in a number of software packages.6

It seems certain that no particular technique will ever disappear entirely from the

toolbox, rather the collection will expand with the exigencies of fashion dictating the

most popular methods of the time. The flexibility and modularity of current software

systems suggests that hybrid synthesis methods will become increasingly important. The

somewhat arbitrary boundaries between approaches will blur as practitioners explore

possible combinations and concatenations. After all, sampling, wavetable, granular, and

pulsar synthesis differ solely on their operative time scales (and approach to periodicity)

and thus could be subsumed under a single methodology.

Physical modeling represents a considerably different approach to sound production.

While sampling, spectral, and abstract models are all understood in terms of their effect

on the sound consumer (the ear), physical modeling is concerned with the details of the

sound producer (the instrument) (Smith 1992). The physics of musical instruments serve

as the starting point. Generally a set of differential equations are used to describe the

possible motion (vibrations) in response to physical input (such as the motion of a bow or

the stroke of a mallet). A virtual sound pressure wave can then be computed. A significant
5 The interested reader might wish to consult Välimäki and Huovilainen (2006) as well as Stilson (2006).
6 Most notably with Native Instruments FM-7 series (now FM-8 in its 2007 incarnation)
1. Introduction 9

benefit is that a very rich and sonically convincing output can result from simple control

inputs.

Despite many compelling advantages, physical modeling has yet to become perva-

sive in computer music composition. The lack of generality in many physical modeling

implementations may be one impediment to their acceptance. Even with a pre-designed

algorithm, the parameter space may be undesirably abstract and difficult to control. For

example, in a woodwind model the output pitch may be a result of multiple inputs. Partic-

ular physical modeling methods (for example waveguides) may be limited to modeling

certain restricted classes of instruments (for more detail see Smith 2004). Finite-difference

time-domain methods (Cadoz et al. 1993) represent a more general approach to physical

modeling, although the computational demands are still somewhat prohibitive for many

systems of musical interest. It is certain, however, that with continued research and

development and inevitable increases in processing speed, both this and other physical

modeling methods will find a following. After all, one must consider that even in the

relatively short history of computer music, these methods are still fairly new.

1.3 Theory Versus Implementation

Regardless of the technique of interest, the computer music practitioner is always limited

by the tools and implementations at hand. In some cases the constraints of the tools

may prove compositionally beneficial in their circumscription of the available musical

materials. At other times, the realities of the tools (missing features, bugs, awkward

interface, sluggish operation) may prove merely frustrating. Too often one finds a lack of

available implementations of some of the most compelling synthesis and signal processing

techniques described in the literature. In such situations computer music composers may

find it necessary to build their own tools — be it in software or hardware. Computer

music is an inherently multi-disciplinary endeavor involving researchers, mathematicians,

programmers, engineers, theoreticians, composers, improvisers, and performers, and in

most cases one individual may be closely tied to a number of these disciplines.
1. Introduction 10

Over the past several decades, many such individuals have devoted considerable

attention to the area of spectral modeling. The proliferation of various spectral modeling

implementations attests to its appeal. Some of the technique’s advantages (noted above)

are especially attractive to composers — in particular the possibility to explore timbre

in a manner that is directly related to real-world experience (unlike abstract algorithm

synthesis). The transformative potential of spectral modeling synthesis (largely unavailable

with sampling) opens up a vast space between purely mimetic or purely synthetic sounds.

Sinusoidal modeling has a proven track record of high quality resynthesis while offering

numerous possibilities for novel sonic transformations (Pampin 2004). The technique

(which will be described in detail in section 2.4) derives a detailed additive synthesis

model from an input sampled sound signal. Something that closely resembles the original

input sound (a resynthesis) can be generated by adding together a sufficient number of

individual sinusoidal waves. In most cases the resynthesis will not be identical to the

original sound, although it is possible to get very close and, with certain extensions to the

basic technique, it is possible to recreate the original exactly.

1.3.1 Implementation History

A complete implementation of sinusoidal modeling was first detailed in paper by Robert J.

McAulay and Thomas F. Quatieri (1986), who were working at MIT on the analysis

and synthesis of speech signals. The basic technique is often referred to as the MQ

method. This work was closely followed by the development of the PARSHL system for

analysis-resynthesis of musical tones by Julius O. Smith and Xavier Serra (1987) at Stanford

University. Serra extended this into the “Spectral Modeling Synthesis” (SMS) method

which added residual noise modeling to the basic sinusoidal analysis (Serra 1989). At the

same time Robert C. Maher (1989), working with James Beauchamp at the University of

Illinois, was using sinusoidal modeling for polyphonic voice separation. Much of Maher’s

code was subsequently integrated into the SNDAN suite (Beauchamp 1993).
1. Introduction 11

Maher’s work also served as the basis for Lemur by Kelly Fitz, Lippold Haken, and

Bryan Holloway (Fitz et al. 1995). Lemur, which ran on the Macintosh, was one of the

first graphical user interfaces (GUIs) available on commodity hardware for interactively

viewing and editing sinusoidal models. Its release was closely followed by the availability

of other GUI editors including SMSTools from Xavier Serra (running on Windows), InSpect

and ReSpect from Sylvain Marchand and Roberty Strandh (running on UNIX/X Windows),

and a number of tools from IRCAM including Diphone (Rodet and Lefèvre 1997), Sview,

and XSpect. Sadly, Lemur was never updated for newer MacOS versions. The original

version of SMSTools was discontinued and then relaunched as part of the cross-platform

CLAM project (Amatriain et al. 2002). Some of the IRCAM tools were made commercially

available as part of the IRCAM Forum software suite. In 1999, Juan Pampin developed the

ATS system which implemented sinusoidal and noise modeling in Common Lisp. A GUI

editor, ATSH by Oscar Pablo Di Liscia and Pete Moss soon followed.

1.3.2 Present Work

These packages are important predecessors of the software that will be detailed in this

document: SPEAR (Sinusoidal Partial Editing Analysis and Resynthesis). SPEAR was

created out of a desire for software that was faster, more flexible, and easier to install and

use than other options.

The following goals were kept in mind throughout the design and development phases:

editing should be as fast and as easy to understand as in a time domain waveform ed-

itor; listening to transformations should be possible with no intermediate synthesis or

processing stage; and high quality analyses should require only a few parameter set-

tings. Additionally, it was desired to have interoperability with other analysis-resynthesis

software by providing both SDIF (Wright et al. 1999a) and native format data exchange.

In order to offer a familiar and comfortable interface, SPEAR was written to run using

the native graphics of the host operating system. Development was founded on the

principle that the interface should be as close as possible to that of a first-class native
1. Introduction 12

application, with little or no evidence of compatibility or GUI emulation layers. Portability

is made possible using wxWidgets (http://www.wxwidgets.org), a C++ library that allows

the software to be compiled for MacOS, Windows, and GTK Linux (Zeitlin 2001). SPEAR

was first demonstrated at the Columbia University Computer Music Center on February

23, 2004. As of this writing, builds have been created MacOS 9, MacOS X, and Windows.

The following chapters will detail the design and evolution of this software. Chapter

2 will discuss the basic techniques of spectral and sinusoidal modeling, while chapter 3

will focus specifically on sinusoidal modeling and its implementation in SPEAR. Chapter

4 details the user interface design and implementation, and chapter 5 explores some

composition applications of sinusoidal modeling.


13

2. Spectral Modeling

2.1 The Frequency Domain

Spectral modeling is concerned primarily with the frequency domain content of audio

signals, as opposed to the time domain content. In the time domain, an audio signal may

be viewed as a function of time, x (t), with instantaneous amplitude as the function’s

range. Typically such a function is graphed with time on the abscissa (horizontal axis)

and amplitude on the ordinate (vertical axis), which results in the familiar waveform

representation. In the frequency domain, the same signal is represented as a new function,

X ( f ), where f corresponds to frequency. Given any frequency value f k , the value X ( f k )

can be used to determine relative strength of that frequency in the given signal.1 The

function X ( f ) defines the spectrum of the sound.

2.1.1 Sinusoids

In order to study frequency domain representations in more detail, the notion of frequency

must be made more explicit. Any function x (t) that has a precisely repeating shape is said

to be periodic. In musical applications, and signal processing applications in general, the

periodic functions of greatest interest are the trigonometric functions sine and cosine. A

sine wave signal with a frequency of f hertz is given by the following: x (t) = sin(2π f t)

(where t is in seconds).

Because of their relation to the unit circle, the sine and cosine functions have a period-

icity of 2π. Furthermore, the cosine function may be viewed as a phase shifted form of the
1 As we will define it, X ( f k ) is a complex value, so it does not directly give the amplitude of component
with frequency f k . However it is straightforward to derive it from the complex value.
2. Spectral Modeling 14

sine function. We use the term sinusoid to denote any any signal that can be described by a

function x (t) of the following form:

x (t) = A sin(2π f t + φ) (2.1)

where A = amplitude, f = frequency (in hertz), t = time (in seconds), and φ = initial phase
1
(in radians). Note that this function is periodic with a period of seconds.
f

2.1.2 The Fourier Transform

The Fourier transform provides an explicit method for converting from the time domain

to the frequency domain. The following defines the Fourier transform of x (t):

Z∞
X( f ) = x (t)e−i2π f t dt (2.2)
−∞

A complete examination of the Fourier transform is beyond the scope of this document,

but we shall consider the transform with an eye toward an intuitive and pragmatic

understanding.

Recall that x (t) represents a given time domain signal, and that evaluating the integral

for any desired frequency value f allows us to find the contribution of frequency value f in

x (t). In order to understand how frequency is defined, we must explore more closely the

inner portion of the integral, e−i2π f t . By definition, e is the base of the natural logarithm

(= 2.7182818 . . .) and i is the imaginary number −1. Euler’s relation is a remarkable

mathematical identity that relates these constants to the sine and cosine functions:

eiθ = cos θ + i sin θ (2.3)

Note that cos θ + i sin θ is a complex value (it has both real and imaginary parts). The

expression e−i2π f t is a complex exponential that defines a complex sinusoid with frequency f .

More generally, a complex sinusoid is given by the following (compare with the definition

of a real sinusoid in equation 2.1):

x (t) = Aei(2π f t+φ) (2.4)


2. Spectral Modeling 15

Complex sinusoids are particularly useful because they are able to encode phase shifts in

a manner that is compact and easy to manipulate (Puckette 2007, 189). For example, given

the complex sinusoid of equation 2.4, the properties of exponents allow us to encode both

the amplitude and the initial phase shift φ as a single complex amplitude (also known as a

phasor). Any complex amplitude A0 can take the following form (for some constant phase

value φ):

A0 = Aeiφ (2.5)

and our complex sinusoid now simplifies as follows:

x (t) = Aei(2π f t+φ)

= Aei2π f t eiφ

= Aei(2π f t+φ)

= A0 ei2π f t

Other important results of Euler’s formula are the following relations:

eiθ − e−iθ eiθ + e−iθ


sin θ = cos θ =
2i 2

This tells us that real valued sinusoids are in fact a sum of two complex sinusoids, one

with positive frequency and the other with negative frequency. Complex sinusoids are

therefore simpler to manipulate because they have only a single frequency (either positive

or negative).

Given Euler’s identity, we may now re-express the Fourier transform as:

Z∞
X( f ) = x (t)(cos 2π f t − i sin 2π f t)dt (2.6)
−∞

To gain some intuitive sense of what this implies, consider the evaluation of X ( f ) for some

value f k . Informally, we may consider cos 2π f k t and sin 2π f k t to be “reference” or “basis”

sinusoids (each with freqeuncy f k ) that are multiplied by the time domain signal. The

multiplication of x (t) with one of the reference function results in a new function of time,

which can be plotted as a curve. We can think of the integral as giving the total area under
2. Spectral Modeling 16

the curve. The area indicates the amount of the reference sinusoid present in the signal.

X ( f k ) is in fact a complex amplitude that encodes amplitude and initial phase (as shown

in equation 2.5).

2.1.3 The Discrete Fourier Transform

In the given definition of the Fourier transform, the limits of integration are ±∞, implying

signals of infinite duration. In real-world situations, we work with audio signals that, by

necessity, are of finite duration and are discretely sampled. Rather than working with the

continuous Fourier transform, we can calculate a discretely sampled frequency spectrum

using the discrete Fourier transform (DFT). Given a sampled signal x (n) with a duration

of N samples, the DFT of x (n) is:


N −1

k
X (k) = x (n)e−i2π N n k = 0, 1, 2, . . . , N − 1 (2.7)
n =0
N −1     
k k
= ∑ x (n) cos 2π n − i sin 2π n
N N
(2.8)
n =0

Note that frequency is defined in terms of k, which takes on values from 0 to N − 1. We

can think of k as measuring the number of sinusoidal cycles per N samples. As a concrete

example, consider a signal that is sampled 44100 times per second. Let N (the length of

our signal) be 1024. If k = 1, then we have 1 cycle per 1024 samples. Converting to cycles

per second we have:

1 cycle 44100 cycles


f1 = × ≈ 43.066 Hertz (2.9)
1024 samples 1 second

Successive values of k correspond to 0 Hz, 43.07 Hz, 86.13 Hz, 129.20 Hz, etc. Thus, the

frequency spectrum is sampled every 43.07 Hz. Each value of k is said to refer to a specific

frequency bin. Note that the DFT gives us signal X (k ) that, like the time domain signal

x (n), is N samples long.

One very useful property of the Fourier transform and the DFT is that the transform

is invertible. If we have the DFT X (k ), we may recover the time domain signal x (n) as
2. Spectral Modeling 17

follows:
N −1
1

k
x (n) = X (k )ei2π N n n = 0, 1, 2, . . . , N − 1 (2.10)
N k =0

Note that the inverse DFT is very similar to the DFT, differing only in a change of sign
1
and a scaling factor of .
N
As noted above, X (k ) is complex valued. In general, the time domain and frequency

domain signals may be either real or complex valued. In audio applications the time

domain signal is typically real valued only, which yields complex values in the frequency

domain. Each complex amplitude X (k ) = Ak encodes both the amplitude and initial

phase shift φk . From our definition of complex amplitude in equation 2.5 and from Euler’s

identity (equation 2.3), we have the following:

X (k ) = Ak eiφk

= Ak cos φk + iAk sin φk

Since Ak and φk are constants, we can represent our complex amplitude as a complex

number of the form a + bi where a = Ak cos φk and b = Ak sin φk . We can visualize this as

a point ( a, b) on the complex plane. The amplitude Ak of the sinusoid is the distance from

the origin to the point ( a, b), and the phase φk is the angle of rotation about the origin. The

amplitude Ak is defined as the magnitude of the complex point.

p
Ak = | X (k )| = a2 + b2 (2.11)

The phase angle φk is given by the inverse tangent function.

b
φk = tan−1 (2.12)
a

Equations 2.11 and 2.12 give an explicit method to convert from a complex frequency bin

value to magnitude (amplitude) and phase. Note that for a real valued signal, phase is

measured relative to a cosine function. Phases may be expressed in sine phase via a phase

shift of − π2 radians.
2. Spectral Modeling 18

2.1.4 The Short-Time Fourier Transform

While almost all musical signals have spectra that vary significantly over time, many

sounds exhibit spectra that are locally stable. Such signals may be well-represented by a

series of spectral “snapshots” or “frames” that capture the average spectral content and, in

sequence, represent the time varying spectrum of the sound. Such an analysis is called a

short-time Fourier transform (STFT). Each spectral frame is computed by means of a DFT

of length N and is centered on a particular sample n. Successive frames are computed by

advancing some number of samples H (the hop size) and computing another DFT.

As noted above, the DFT is a discrete sampling of the frequency spectrum. When a

signal consists of a sinusoid with a period that is an integer factor of the DFT length N,

then the sinusoid’s frequency is precisely aligned with one of the DFT frequency bins.

Figure 2.1 shows the magnitude spectrum of a DFT (length N = 16) of a sinusoid with
N
period 4. Note that the magnitude spectrum consists of a single spike at bin 4.

N
Figure 2.1: DFT magnitude spectrum of a sinusoid with period 4, ( N = 32).

Since such precisely constructed signals are unlikely to be found in real-world digital
N
recordings, let us now consider the case of a sinusoid with a period of 4.21 samples, and

its corresponding DFT magnitude spectrum (figure 2.2). In this case, there are significant

components present in bins 0 through 16. Since the signal is N samples long, but has a

sinusoidal period of 4.21 cycles, it cuts off abruptly creating a significant discontinuity.
2. Spectral Modeling 19

N
Figure 2.2: DFT magnitude spectrum of a sinusoid with period 4.21 , ( N = 32).

N N
Figure 2.3 compares two sinusoids, one with period 4 and the other with period 4.21 . Note
N
that the sinusoid with period of 4 arrives smoothly at 0 at sample index 32.

Figure 2.3: Sinusoids of 4.21 cycles and 4.0 cycles per N samples, ( N = 32).

Intuitively, one can interpret such discontinuities as additional high frequency com-

ponents. These additional components are often described as spectral distortion or spectral

leakage. These distortions are a significant issue when one is attempting to detect multiple

sinusoids at different frequencies: it may be impossible to differentiate spectral leakage

from sinusoids. By applying a smoothing window function, h(m), in the time domain, these

edge discontinuities can be reduced and the spectral leakage attenuated. The STFT of
2. Spectral Modeling 20

length N beginning at sample n and with smoothing window h(m) is defined as follows:
N −1

k
Xn ( k ) = x (m + n)h(m)e−i2π N m (2.13)
m =0

A number of different window functions are commonly used in audio applications.

The most effective are generally bell-shaped curves. Note that the apparent absence of a

smooth window function in fact implies a rectangular window function. The rectangular

window function is defined to be 1 on the interval 0 . . . N − 1 and 0 otherwise. See Harris

(1978) and Nuttall (1981) for a complete discussion of window functions, their definitions,

and properties.

Before continuing further with our discussion of windowing, we must introduce one

other important signal processing concept. The convolution operation ∗ for two signals

x (n) and y(n) is defined as follows:


N −1
f (n) = x (n) ∗ y(n) = ∑ x (m)y(n − m) (2.14)
m =0

The convolution of x (n) and y(n) can be thought of as a new signal, f (n), where each

sample of x is multiplied by a time shifted copy of all the samples in y and the results

summed. Thus, f (n) consists of the scaled shifted sums of copies of y(n).

The convolution theorem (which we will not prove here) states that the spectrum of the

convolution of two signals is equivalent to the multiplication of the signals each expressed

in the frequency domain:

DFT [ x (n) ∗ y(n)] = DFT [ x (n)] DFT [y(n)]

= X ( n )Y ( n )

The converse is also true: the spectrum of the multiplication of two signals in the time

domain is equivalent to the convolution of the signals each expressed in the frequency

domain:

DFT [ x (n)y(n)] = DFT [ x (n)] ∗ DFT [y(n)]

= X (n) ∗ Y (n)
2. Spectral Modeling 21

By the convolution theorem, the multiplication of a signal x (n) and a window h(n) is

equivalent to their convolution in the frequency domain: DFT [ x (n)h(n)] = X (n) ∗ H (n).

Since the spectrum of a sinusoid is a single impulse, it follows that the spectrum of a

windowed sinusoid will be a scaled copy of the spectrum of the window function, centered

on the frequency of the sinusoid. Returning to the example of a sinusoid with a period of

4.21 cycles per window, we can get a sense of the effects of windowing by inspecting the

underlying continuous spectrum.2

Figure 2.4: DFT magnitude spectrum and the interpolated underlying magnitude spectrum
N
of a sinusoid with period 4.21 , ( N = 32).

The dotted curve in figure 2.4 shows the continuous underlying spectrum which

consists of a main lobe centered at the sinusoid frequency and many side lobes of lower

amplitude arranged around the main lobe. Different window functions exhibit unique

main lobe shapes and side lobe levels. The case of a sinusoid with an integer period

and a rectangular window is quite special. Figure 2.5 shows that the although there are

significant side lobes in the underlying spectrum, the DFT samples align precisely with

nulls in the continuous spectrum. The result is a single impulse in frequecy bin 4.

Examining the spectra of various window functions can provide a good sense of their

practical capabilities. Figures 2.6 through 2.9 show various window functions in both

the time domain and frequency domains. Note that as the center area of the window
2 The continuous spectrum can be approximated by zero-padding the signal and then performing the DFT.

The result is a spectrum that is oversampled by the zero-padding factor.


2. Spectral Modeling 22

Figure 2.5: DFT magnitude spectrum and the interpolated underlying magnitude spectrum
of a sinusoid with period N4 , ( N = 32).

function becomes narrower in the time domain, the main lobe of the window spectrum

becomes wider. This demonstrates an important tradeoff in the design and use of window

functions.

By using the STFT and appropriate windowing, we can successfully model many types

of musical signals and more significantly, we can use these models to effect a number of

different sonic transformations.

2.2 STFT Analysis and the Phase Vocoder

The phase vocoder is one of the most widely utilized STFT techniques for time and

frequency scale modification. The technique makes use of phase information in successive

analysis frames to make a precise estimate of the sinusoidal frequency present at a

particular analysis bin. By default, the frequency resolution of the STFT is limited by

the bin spacing, which in turn is controlled by the DFT length N. For example, as was

observed in equation 2.9, a DFT of length 1024 at a sampling rate of 44.1 kHz results in

a bin resolution of 43.07 Hz. Practical DFTs are computed by means of the fast Fourier

transform which limits N to powers of 2. Therefore, there are some significant restrictions

on the frequency resolution. Moreover, the center frequencies of each STFT bin will be

limited to harmonics of the center frequency of bin 1.


2. Spectral Modeling 23

Figure 2.6: Time domain plot and magnitude spectrum of the rectangular window.

Figure 2.7: Time domain plot and magnitude spectrum of the triangular window (N = 31).

Figure 2.8: Time domain plot and magnitude spectrum of the Hann window.

Figure 2.9: Time domain plot and magnitude spectrum of the Blackman window.
2. Spectral Modeling 24

Purely harmonic sounds may be analyzed by resampling the signal in the time domain

so that the length of the fundamental period is an integer factor of the DFT length N

(Beauchamp 1993). This assures that each harmonic in the input will be aligned with the

center of an analysis bin. However, this does not accommodate inharmonic sounds, nor

does it allow the measurement of frequency deviations (including vibrato) that are present

in almost all real-world acoustic signals.

2.2.1 Putting Phase to Good Use

Thus far we have been looking at the magnitude spectrum, but each analysis frame also

has a phase spectrum and this information is useful. Since frequency is defined as the

amount of phase change per time unit, the change in phase between successive analysis

frames can be used directly to estimate frequency. We define the phase deviation for

frequency bin k and analysis frame n as follows:


 
2πk∆ f
∆φk,n = φk,n − φk,n−1 + H (2.15)
fs

where φk,n is the phase for bin k, frame n, H is the analysis hop size in samples, ∆ f is the

frequency bin spacing in hertz, and f s is the sampling rate. Equation 2.15 expresses the
2πk∆ f
idea of phase unwrapping, where the term φk,n−1 + H fs represents the unwrapped

phase of a sinusoid with frequency centered precisely on analysis bin k. Positive phase

deviation ∆φk,n indicates a sinusoid with a frequency higher than the bin center frequency,

negative phase deviation indicates lower frequency.

In practice, the phase deviations must be re-wrapped to the half-open interval [−π, π )

which constrains the allowable frequency deviation and locates the estimated frequency

around analysis bin center frequency. If Θφk,n represents the re-wrapped phase deviation,

then the estimated frequency (in hertz) for bin k and analysis frame n is

Θφk,n
  
fs
f k,n = + k∆ f (2.16)
2π H
2. Spectral Modeling 25

Note that decreasing the hop size increases the range of possible frequency deviation. Small

hop sizes will allow the phase vocoder to more closely track the rapid phase deviations of

non-stationary (noisy) spectral components.

For quasi-stationary (sinusoidal) components, we note that there is an important

relationship between the choice of window function and hop size. As mentioned earlier, a

single sinusoidal component will result in a spectrum that is a scaled translation of the

spectrum of the analysis window. As such, we desire that all bins in the main lobe of

the window should contribute to the energy of the sinusoid, i.e. their frequencies should

“lock” to the frequency at the center of the main lobe. For a main lobe width of W bins and
M
window size of M, a hop size H such that H ≤ W will be sufficient (Puckette 1995). Thus,

the wider the main lobe, the smaller the allowable hop size. This also makes intuitive sense

from a time domain perspective. Since wider main lobes result from narrower bell-like

window shapes in the time domain, the decreased width of the window shape must be

compensated for by more dense sampling in the time domain.

2.2.2 Resynthesis and Transformation

Following analysis, the phase vocoder data can be used to perform a resynthesis of the

original sound. Depending on the desired result, the analysis data may be modified to

effect different sonic transformations. Two different resynthesis methods are in common

use.

The first is the STFT overlap-add technique. With this method, the inverse DFT is

applied to the data in each analysis frame. Since the analysis hop size Ha is typically less

than the DFT length N, a time domain overlapping of each DFT window results. Provided

that the analysis window function sums to unity upon overlap (which can be assured

by choosing the appropriate window function and normalization factor), the original

signal can be recovered exactly. Setting the hop size so that Hs 6= Ha gives a time warped

resynthesis.
2. Spectral Modeling 26

The second resynthesis method is the summing oscillator bank. Each bin of the

phase vocoder data is associated with a sinusoidal oscillator and each phase vocoder

frame provides frequency, amplitude, and phase information that can be used to drive

the oscillators. Since each analysis frame is separated by a hop of Ha samples, linear

interpolation of frequency and amplitude is often used to fill in intermediate samples. With

linearly interpolated frequency, the phase function becomes the integral of instantaneous

interpolated frequency. As such, the resynthesized phases cannot be guaranteed to match

the original phases, and thus resynthesis with the oscillator bank will not recover the

original signal. In practice, different frequency interpolation strategies can be used to

achieve fairly high fidelity results. For example, the cubic phase interpolation method

commonly used in sinusoidal modeling can be used to match phase at each frame (see

section 3.3.4 for details).

Typically one is interested in transforming the analysis data in order to achieve one or

more of the following results:

1. frequency filtering (boosting or cutting certain frequency bands)

2. time expansion or compression

3. transposition (frequency scaling)

Of these three, frequency filtering is the most straightforward, and can easily be imple-

mented by selectively scaling the amplitudes in the desired analysis bins. Time varying

filtering may be used in applications such as broadband noise reduction.

2.2.3 Time Expansion/Compression

Time expansion/compression may be achieved in the overlap add case by adjusting Hs


Hs
and/or Ha . The expansion ratio is given by α = . When performing time expansion or
Ha
compression, care must be taken to properly update the phases in each synthesis frame.

Let φk0 (n) denote the resynthesis phase for bin k in frame n. Phase is propagated forward
2. Spectral Modeling 27

from each previous frame according to the analysis frequencies as follows:

φk0 (0) = αφk (0) (2.17)


f k (n)2π
φk0 (n) = φk0 (n − 1) + Hs (2.18)
fs

Note that the initial phase is multiplied by the expansion ratio α. In their 1997 study,

Laroche and Dolson show that scaling the initial phases will preserve the original phase

relationships when expanding by an integer factor.

Because time expansion/compression no longer guarantees the same phase relation-

ships as the original sound, an additional smoothing synthesis window function is gen-

erally applied after each inverse DFT. This serves to reduce the effect of any phase or

amplitude discontinuities that result from frequency domain transformations. Since the

application of a synthesis window amplitude modulates the signal, proper care must be

taken to assure enough overlap to reduce amplitude modulation artifacts.

To explore this issue in more detail, it will prove useful to examine the amplitude

behavior of several analysis and synthesis windows under different overlap-add conditions.
N
First let us define the overlap factor β = . When β = 1 there is no overlap, and the
Hs
output consists of a signal that is amplitude modulated by the product of the synthesis

window with the analysis window. (The original analysis window was recovered from

the inverse DFT although the shape may be distorted due to operations performed in

the frequency domain.) Figure 2.10 shows the overlap-adding of Blackman analysis and

synthesis windows with β values of 1, 3, 4, and 6. Note that un-modulated gain is achieved

when β = 6. In general, windows with a narrower time domain peak will require more

overlap.

Additional steps must be taken to assure proper amplitude normalization. For a

synthesis window of w0 of length N, the normalization factor γ is


N −1
γ = Hs ∑ w0 (i ) (2.19)
i =0
2. Spectral Modeling 28

Figure 2.10: Overlap-adding the product of Blackman analysis and synthesis windows
with varying overlap factors (β).

Time varying expansion and contraction factors imply a variable synthesis hop size

with a corresponding variable normalization factor. Due to the possibility of amplitude

modulation artifacts, care must be taken not to increase Hs beyond the point of constant

gain. In practice it is best to fix Hs and vary Ha (or linearly interpolate frequency and

amplitude from the analysis spectrum) to achieve the desired expansion ratio.

2.2.4 Transposition

Transposition (frequency scaling) with the phase vocoder may be handled in a number of

different ways. A simple, though computationally expensive, approach synthesizes each

phase vocoder band with a sinusoidal oscillator. Frequency and amplitude may be linearly

interpolated from frame to frame. If F̂k (n) and Âk (n) are the piecewise interpolated

frequency and amplitude functions for bin k, then the resynthesis is


N/2−1  

x (n) = ∑ Âk (n) cos nP F̂k (n)
fs
+ φk (2.20)
k =0

where P is the transposition ratio, and φk is the initial phase of bin k.


2. Spectral Modeling 29

A less computationally expensive approach is to resynthesize with the overlap-add

method with a time expansion/compression ratio that is the inverse of the transposition

ratio. The transposition is then achieved via resampling in the time domain. Another

approach is to scale the spectrum in the frequency domain as described in Laroche and

Dolson (1999). The spectral magnitude peaks are scaled according to the transposition

ratio, and the phases are adjusted to reflect the new transposed frequencies.

2.2.5 Refinements

A number of additional refinements can be added to the basic phase vocoder technique.

One drawback that has been observed with time expansion/compression is the problem of

“phasiness.” This can best be described as a loss of presence in the synthesized signal; the

result is less crisp and there is sense of greater reverberation. Much of this can be attributed

to a lack of phase coherence between adjacent frequency bins. Although the standard

phase vocoder technique interpolates coherent phase between adjacent analysis frames

(“horizontal” phase), it does nothing to maintain “vertical” phase alignment. Puckette

(1995) proposed a method of phase locking that proceeds by first finding amplitude peaks

in each frame. Under the assumption that frequency bins adjacent to the peak contribute

to the same sinusoid, the phases of adjacent bins are set equal to the phase of the peak bin.

Laroche and Dolson (1997) describe a number of different phase maintenance strategies (all

based on this peak detection method) that can provide significant improvements in sound

quality. Phase relationships are particularly important for preserving transient and micro-

transient events. Robel (2003) describes a technique for reinitializing phases at transients

that significantly reduces phasiness and transient smearing under time scale modifications.

Although the phase vocoder has its limitations, when carefully implemented it can offer

excellent fidelity.
2. Spectral Modeling 30

2.3 Spectral Envelopes

Up to this point we have been discussing the precise frequency domain content of audio

signals. It has been shown that the ear functions similarly to Fourier analysis, with specific

locations along the basilar membrane of the inner-ear corresponding to specific frequencies.

However, in addition to frequency content, we are also simultaneously sensitive to the

overall general shape of the spectrum. We call the curve that defines the shape of the

magnitude spectrum the spectral envelope. The dotted curve in figure 2.11 shows one

possible spectral envelope for the given spectrum.

Figure 2.11: Spectral envelope of a soprano voice.

Many vocal and instrumental sounds can be understood in terms of a source-filter model,

where the resonant body of the instrument or of the vocal tract shape (filter) the input (the

source). In voiced human speech, the source is provided by the periodic vibration of the

vocal folds. The spectral envelope can be taken as a model of the resonances of the vocal

tract. For vowels, the peaks in the spectral envelope trace the formants. For any particular

speaker and vowel, the spectral envelope stays fixed regardless of pitch.

If a sound is transposed using the phase vocoder, for example, the spectral envelope

shape will change along with the transposed spectrum. For an instrumental or vocal
2. Spectral Modeling 31

sound, the perceived result is a change in size of the resonating body. Thus there is

a change in timbre as well as in pitch. The timbre can be preserved by dividing out

the spectral envelope prior to transposition (this process of flattening the spectrum is

sometimes referred to as whitening), and then filtering the transposed magnitude spectrum

by the original spectral envelope.

Common methods for determining the spectral envelope include linear predictive

coding (LPC), cepstrum, discrete cepstrum, and true envelope methods (Robel and Rodet

2005). For a thorough discussion of spectral envelope estimation methods and applications

see Schwarz (1998).

2.4 Sinusoidal Modeling

Like the phase vocoder, sinusoidal modeling is an STFT based analysis/resynthesis

technique. It attempts to represent an audio signal as a generalized collection of sinusoids

with time varying frequency, amplitude, and phase. Unlike the phase vocoder, the number

of sinusoids present at a given time is not fixed by the DFT size. As a consequence, the

sinusoids need not necessarily be centered around a fixed frequency band, and sinusoids

may vary arbitrarily in frequency. The following plots of a frequency sweep (chirp) first

analyzed with the phase vocoder (figure 2.12) and then with sinusoidal modeling (figure

2.13), clearly illustrate the difference between the two approaches.

The phase vocoder represents the chirp with many frequency bands that each lock

to the chirp frequency as it sweeps upward. The result is highly overspecified, with

many bands per sinusoidal feature. Manipulation of the chirp (transposition for example)

requires the adjustment of many frequency bands. As a corollary, a local adjustment to

any one frequency band will degrade the integrity of the frequency sweep.

In contrast, a sinusoidal model represents the chirp as a single breakpoint function

with time varying frequency. The chirp can be freely transposed and reshaped by the

manipulation of a single frequency/amplitude trajectory. Although in most cases perfect


2. Spectral Modeling 32

Figure 2.12: Phase vocoder analysis of a frequency sweep from 100 Hz to 500 Hz.

Figure 2.13: Sinusoidal modeling analysis of a frequency sweep from 100 Hz to 500 Hz.

reconstruction is not possible with a purely sinusoidal model, the benefits of the technique

may outweigh this drawback.

2.4.1 STFT Frequency Interpolation

The basic method of sinusoidal modeling begins with a STFT. As observed in section

2.1.4, the short-time spectrum of a signal is the convolution of the spectrum of the window

function with the spectrum of signal x (n). If the signal in question is in fact a single

sinusoid at frequency f k , then its magnitude spectrum is a single impulse peak at frequency

f k , and its windowed magnitude spectrum is simply the spectrum of the window function
2. Spectral Modeling 33

with its main-lobe centered at frequency f k . In the DFT magnitude spectrum, a sinusoid

appears as a local maximum near the true maximum of the underlying continuous

spectrum. Parabolic interpolation may then be used to approximate the position of the

peak of the main lobe.

estimated peak
β γ
α

Figure 2.14: Estimating a peak in the spectrum based on a parabolic curve fitting.

Given the DFT X (k ), the magnitude spectrum can be searched for bins that are local

maxima. Bin n is considered a local maximum if its magnitude is greater than its two

neighboring bins, or more precisely | X (k n−1 )| < | X (k n )| > | X (k n+1 )|. We can then fit

a parabola that passes through the magnitude values of bins n − 1, n, and n + 1. The

parabola takes the following form:

y ( x ) = a ( x − p )2 + b

The known points on the curve are y(−1), y(0), and y(1) and are given by the the bin

magnitudes (measured in dB). According to Smith and Serra (1987), frequency estimates

are found to be more accurate when interpolating using dB rather than linear magnitude.

y(−1) = α = 20 log10 | X (k n−1 )|

y(0) = β = 20 log10 | X (k n )|

y(1) = γ = 20 log10 | X (k n+1 )|


2. Spectral Modeling 34

With three equations and three unknowns (a, p, and b) we can solve for the parabola peak

location p.
α−γ
 
1
p= (2.21)
2 α − 2β + γ
The center frequency (measured in bins) will be n + p. The parabola peak location p can

then be used to solve y( p) for both the real and imaginary parts of the complex spectrum.

For the real part, we set α< , β < , and γ< to the real values at bins n − 1, n, and n + 1,

α< = <e[ X (k n−1 )]

β < = <e[ X (k n )]

γ< = <e[ X (k n+1 )]

and for the imaginary part

α= = =m[ X (k n−1 )]

β = = =m[ X (k n )]

γ= = =m[ X (k n+1 )]

Solving y( p) for the real part a and imaginary part b gives

1
a = y ( p ) < = β < − p ( α < − γ< )
4
1
b = y ( p ) = = β = − p ( α = − γ= )
4

According to equations 2.11 and 2.12 the magnitude wil be a2 + b2 and the phase will be
b
tan−1 .
a
This frequency, phase, and magnitude interpolation method means that the analysis

procedure can accurately detect sinusoidal frequencies that are non-integer multiples

of the analysis bin spacing frequency. This makes it ideal for analyzing sounds that

have inharmonic components, have variable frequency content, or exhibit limited non-

stationary behavior (a sum of different harmonic complex tones, sounds with wide vibrato

or glissandi, etc.)
2. Spectral Modeling 35

2.4.2 Partial Tracking

Once all of the sinusoidal peaks in a particular STFT frame have been detected, the

peaks must be organized such that they form continuous, time-varying sinusoidal tracks

(partials). This is accomplished by matching the peaks in the current frame with peaks

present in previous STFT frames. During resynthesis the frequency, amplitude, and phase

can be smoothly interpolated from a peak to its matched successor in the next frame.

k0 k1 k2 k3 k4 k5

unmatched peak
frequency

track end (death)

track start (birth) frequency threshold

time

Figure 2.15: Connecting STFT peaks into sinusoidal tracks.

Typically, some peaks that fall below a certain amplitude threshold will be discarded

prior to matching. This means that each frame may contain different numbers of peaks and

that, over time, individual partials may vanish or appear depending on their amplitude.

Amplitude thresholding techniques and tradeoffs will be discussed in more detail in the

next chapter. The major implication for this type of sinusoidal model is that the number

of sinusoids present at any one instant is variable. By eliminating redundancies and low

amplitude components, the result is a model that is efficient to store and manipulate.

Several different techniques have been proposed for partial tracking. The simplest

method is a locally optimal greedy algorithm that matches each peak to the peak in

the next frame that is closest in frequency. Typically there is a distance threshold that

constrains the maximum allowable frequency jump. Each peak is defined in terms of the

following data structure:


2. Spectral Modeling 36

structure Peak
Freq
Amp
Phase
Forwardmatch
Backmatch
end structure

Note that peaks keep track of both their successor (Forwardmatch) and predecessor

(Backmatch). Forwardmatch and Backmatch are initialized to Nil, indicating the absence

of a matching peak. Listing 2.1 illustrates the greedy peak matching algorithm.

1: for prevpeak ∈ Pk−1 do


2: for curpeak ∈ Pk do
3: distance ← | prevpeak.Freq − curpeak.Freq|
4: if distance < f req_threshold then
5: if curpeak.Backmatch 6= Nil then
6: existing_distance ← |curpeak.Backmatch.Freq − curpeak.Freq|
7: else
8: existing_distance ← f req_threshold
9: end if
10: if distance < existing_distance then
11: curpeak.Backmatch.Forwardmatch ← Nil
12: curpeak.Backmatch ← prevpeak
13: prevpeak.Forwardmatch ← curpeak
14: end if
15: end if
16: end for
17: end for
Algorithm 2.1: Greedy peak matching from frame k − 1 to frame k.

Pk is the collection of peaks in frame k (the current frame) and Pk−1 is the collection of

peaks in the previous frame. The outer loop iterates through each of the peaks in frame

k − 1. The inner loop looks for the best match in the current frame by iterating through

each of frame k’s peaks. If the frequency distance between any two peaks is less than

distance_threshold (line 4), then it is a candidate for matching. Before connecting the peaks,

the algorithm checks to see if the candidate peak in frame k + 1 has already been claimed

(line 5). If so, the new distance must be less than distance of the current match (line 10).
2. Spectral Modeling 37

Following matching, the Forwardmatch and Backmatch fields indicate the complete

sinusoidal tracks. If Backmatch = Nil, a new partial is starting up. If Forwardmatch = Nil,

a partial is ending. When a partial begins or ends, a short amplitude fade is usually added

in order to avoid a discontinuity.

Given that the greedy algorithm has no explicit model of the underlying sound

structure, nor any context beyond the peaks of the current and previous frames, it actually

performs reasonably well in practice. However, the greedy algorithm can be easily confused

by spurious peaks or significant frequency deviations such as vibrato or glissandi (see

figure 2.16). A number of minor refinements can be made, such as incorporating amplitude

into the measure of peak proximity or measuring frequency distance on a logarithmic

scale.

k0 k1 k2 k3 k4 k5

partial tracking conflicts


frequency

time

Figure 2.16: Partial tracking conflicts due to glissandi.

2.4.3 Improved Partial Tracking

A number of other improved partial tracking strategies have been proposed. Under the

assumption that the sound being analyzed contains relatively stable frequency trajectories

(such as stable harmonic or inharmonic partials of a single fundamental), peaks can be

matched to a set of frequency guides (Serra 1989). Guide based partial tracking proceeds

as follows: at frame k we have a set of N peaks Pk with frequencies f 0 , f 1 , f 2 . . . f N −1 ,

and a set of M frequency guides G with frequencies f 00 , f 10 , f 20 . . . f M


0
−1 . As in the greedy
2. Spectral Modeling 38

algorithm, each guide attempts to claim the peak in Pk that is closest in frequency to the

guide frequency (| f m0 − f n | is minimized for all combinations, subject to constraints on

the maximum allowable frequency jump). When a match is found between partial n and

guide m, the partial trajectory is continued and a new guide frequency fˆm is computed

according to

fˆm = α( f n − f m0 ) + f m0 for α ∈ [0, 1]

The parameter α is a simple lowpass filter on the guide frequency. When α is zero, the

guide frequencies remain constant. As α approaches 1, the guide frequencies more closely

track the frequency of the evolving partial. Guides that fail to find a match are set to a

dormant state. They can “wake up” in future frames if they find a peak appropriately

proximate in frequency. If a guide remains dormant for too long, it is removed from G.

Peaks that are not matched to a guide may (optionally) spawn new guides. The number of

maximum allowable guides may be limited, if desired.

For harmonic sound, the guide frequencies may be fixed to multiples of the fundamen-

tal frequency f 0 . In this case the number of guides can remain constant throughout the

evolution of the sound. Note that the tracking method works well only if the fundamental

frequency estimation is reliable. Guide based tracking has two distinct advantages: it can

fill temporal gaps in the frequency trajectory, and it can provide a clean representation

for harmonic sounds (with one partial per harmonic). Guide based tracking has been

implemented in a number of sinusoidal modeling packages including SMS, Loris, and

ATS.

For sounds with non-stationary behavior (noise, microtransients, rapid frequency

fluctuations, etc.) guide based tracking offers few advantages in comparison to the basic

greedy algorithm. Depalle, García, and Rodet proposed a tracking method based on

a combinatorial Hidden Markov Model (HMM) that provides more globally optimal

frequency trajectories (Depalle et al. 1993). A Hidden Markov Model consists of a set

of related states, observations, and parameters. In a typical HMM problem, the goal is

to determine a temporal sequence of states based solely on observations that have some
2. Spectral Modeling 39

probabilistic connection to the states. The states are the “hidden” part of the model,

and the parameters determine the probability of a state (or state transition) given certain

observations.

In the application of HMMs to partial tracking, a state consists of a set of connections

from frame k − 1 to k. The goal is to find the most likely sequence of states (connections)

given the observations. In this case the observation is simply the number of peaks present

in frame k − 1 and in frame k. If we let Sk model the set of peak connections from frame

k − 1 to k then next state, Sk+1 will specify the connections from frame k to k + 1. Thus,

a state transition involves the peak trajectories of three frames: k − 1, k, and k + 1. The

parameters of the HMM specify the probability of a transition from state to state. The

parameters are based on a cost function that favors peak connections with continuous

frequency slopes. Note that actual frequency distance is not considered. For this reason

the HMM method can be used to track crossing partials. Given the parameters of the

model (the probability of a particular state transition), the Viterbi algorithm is used to find

the most likely sequence of states. The optimal sequence of transitions will be that which

minimizes frequency discontinuities.

The computational cost of the HMM method is high since it must consider all com-

binations of peak connections across all frames. In practice, the number of frames (and

therefore the number of state transitions) is limited to some window of T frames. The

number of possible peak connections under consideration is further constrained by fixing

the number of partials alive during any interval of T. Other options could further constrain

the combinations, such as limiting maximum allowable frequency slopes or limiting the

number of partial crossings.

A particular weakness of the HMM method described by Depalle et al. is that it cannot

fill temporal gaps in the partial trajectories. Gaps may occur (especially in low amplitude

partials) due to bin contamination from noise, or if one is analyzing material that has

undergone lossy compression (a situation that may be common given current audio

distribution methods). Two new promising partial tracking approaches include future

trajectory exploration (Lagrange et al. 2004) and tracking using linear prediction (Lagrange
2. Spectral Modeling 40

et al. 2003). SPEAR has implemented the linear prediction method which will be detailed

in section 3.2.

2.4.4 Synthesis

The standard resynthesis method for a sinusoidal model is the summing oscillator bank.

Each frequency trajectory computed via partial tracking is represented as a breakpoint

envelope where where point k of envelope i is represented by the time, frequency, ampli-

tude, and phase values (tik , f ik , aik , φik ). Linear interpolation of amplitude and frequency

can be used to resynthesize each breakpoint segment. Fik (n) and Aik (n) are the linear

interpolation frequency and amplitude functions for the segment from breakpoint i to

breakpoint i + 1.
n k
!
f s − ti
 
Fik (n) = f ik + f ik+1 − f ik for f s tik ≤ n < f s tik+1 (2.22)
tik+1 − tik

n k
!
f s − ti
 
Aik (n) = aik + aik+1 − aik for f s tik ≤ n < f s tik+1 (2.23)
tik+1 − tik

where n is the output sample index, f s is the sampling rate, and tik is breakpoint time in

seconds. If we let F̂i (n) and Âi (n) be the piecewise amplitude and frequency functions for

partial i defined for all n, then the final resynthesis is given by


N −1  

x (n) = ∑ Âi (n) cos n F̂i (n)
fs
+ φi (2.24)
i =0

where φi is the initial phase of partial i.

Note that the instantaneous phase is the integral of the frequency function. For linear

frequency interpolation this implies a quadratic phase function. The phase values at all

but the first breakpoint are ignored. If we desire a resynthesis that is phase accurate, a

different interpolation strategy must be used. McAulay and Quatieri (1986) developed a

cubic phase interpolation method that matches frequency and phase at each breakpoint.

This will be discussed in detail in section 3.3.4.


2. Spectral Modeling 41

2.4.5 Limitations and Extensions

Sinusoidal modeling makes the assumption that the audio signal can be closely approxi-

mated as a sum of sinusoids. Although this may hold true for certain classes of signals

(harmonic sounds with clear tonal components and slowly varying frequency content)

many types of signals are not well represented by sinusoids — broadband noise or transient

events such a snare drum hit, for example. In many cases, sounds of interest will exhibit

both tonal and noisy components. For example, a flute tone will contain quite a bit of

broadband breath noise in addition to a clear fundamental.

Partial tracking becomes particularly difficult in the presence of noise, reverberation,

or dense polyphony. Recent research continues to suggest new approaches for partial

tracking, noise modeling, and transient preservation. A few of these methods have been

implemented in SPEAR and will be discussed in the next chapter.

Even with extensions to the basic sinusoidal modeling method, perfect signal recon-

struction is not guaranteed. However, when considering compositional goals, perfect

reconstruction is rarely the intent. If lossless reproduction is required, sampling synthesis

accomplishes this with far greater ease and efficiency. The power of a sinusoidal model

lies in its potential to allow independent transformations in both time and frequency. One

can begin with models that closely mimic real-world acoustic events and then, through

a series of transformations or interpolations, explore a vast space of synthetic variations.

Clearly this is fertile territory for creative musical exploration.


42

3. Sinusoidal Analysis and Resynthesis

SPEAR provides two different spectral analysis modes — the phase vocoder and sinusoidal

modeling. The phase vocoder follows the standard method described in the preceding chap-

ter. This chapter will focus on specific implementation of sinusoidal analysis/resynthesis.

Several extensions to the basic techniques, as described in the preceding chapter, will be

discussed.

3.1 Analysis Parameters

Sinusoidal analysis requires a number of input parameters including window type, window

size, DFT size, analysis hop size, amplitude thresholds, and partial tracking constraints.

All of these parameters can have a significant effect on the quality of the analysis. The

interaction of amplitude thresholds and partial tracking can be particularly complex. The

goal is to provide defaults for all of these parameters that work reasonably well for a

wide variety of musical signals. However, even with careful choices, some user-selected

parameter inputs are almost always required. Typically, the user might begin analysis

using the default settings and then adjust accordingly after viewing the analysis and

auditioning the resynthesis.

3.1.1 Analysis Window

One of the most crucial parameter choices is the length of the analysis window. This will

control width of the main lobe in the frequency domain and, as a result, will determine the

frequency resolution of the analysis. Rather than specifying the window length in samples,

the user inputs the frequency resolution f a in hertz. Adjacent sinusoids that are at least f a
3. Sinusoidal Analysis and Resynthesis 43

hertz apart can be properly resolved in the analysis. Additionally, no significant sinusoid

with a frequency less than f a hertz should be present in the input. For a harmonic sound,

f a should correspond to the frequency of the fundamental.

Given f a , we must determine the appropriate window size and DFT size. In order to

resolve sinusoids using the peak picking and quadratic interpolation method, the window

size must be large enough to insure separation of the window main lobes. The window

length M is given by
 
fs
M = ∆s (3.1)
fa
where f s is the sampling rate and ∆s is the desired window separation in bins.

SPEAR uses a Blackman window which has a main lobe width of 6 bins. If we require

complete separation of the main lobes as shown in figure 3.1, then ∆s = 6. Note that as

∆s increases, the window length grows. Since large windows introduce an undesirable

loss of temporal resolution, we would like to use a smaller value of ∆s that still allows

for the resolution of closely spaced sinusoids. Abe and Smith (2004) made an extensive

study of the minimum allowable frequency separation (MAFS) for various window types.

The MAFS is the smallest separation that allows for peak detection and does not introduce

significant bias in the peak interpolation due to interference from a neighboring peak. With

a spectral oversampling (zero padding) factor of 2, the MAFS for the Blackman window is

3.53 bins (Abe and Smith 2004, 8, table 9). This is shown in figure 3.2. SPEAR uses the

slightly more conservative separation of 4 bins. Given a window length M, the FFT size is

N = 2dlg Me+1 , which results in spectrum oversampling by a minimum factor of 2.

With a peak separation of 4 f a and spectrum oversampling by a minimum factor of

2, each main lobe peak will be sampled by at least 7 DFT bins. To aid the rejection of

spurious peaks we may conservatively impose the restriction that the three magnitudes

| X (k n−1 )|, | X (k n )|, | X (k n+1 )| of a parabolic peak must also all exceed the magnitude of
either neighboring bin: | X (k n−1 )| > | X (k n−2 )| or | X (k n+1 )| > | X (k n+2 )|.
3. Sinusoidal Analysis and Resynthesis 44

Figure 3.1: Magnitude spectrum of the main lobes of two Blackman windows with a
frequency separation ∆s = 6.

Figure 3.2: Magnitude spectrum of the main lobes of two Blackman windows with the
minimum allowable frequency separation ∆s = 3.53.

3.1.2 Amplitude Thresholds

Amplitude thresholds are used to limit the number of peaks in each analysis frame. The

desire is to detect and track only the most perceptually significant partials. Because

many sounds of interest exhibit a high frequency roll-off, it is helpful to equalize the

spectrum. This results in an analysis that gives equal weight to both high and low

frequency components and avoids resynthesis that sounds dull and heavily lowpass

filtered.
3. Sinusoidal Analysis and Resynthesis 45

Equalization can be achieved by applying a pre-emphasis high-pass filter in the time

domain (Maher 1989), or by applying an emphasis curve to the STFT data (Serra 1989).

One particular disadvantage of the applying a filter in the time domain is an alteration

of the magnitude and phase spectra, the effects of which can only be easily undone by

applying a complementary de-emphasis filter following resynthesis.

SPEAR achieves the effect of pre-emphasis by using a frequency dependent amplitude

threshold curve. SPEAR establishes two amplitude thresholds: Tb , which is a variable

threshold that must be exceeded to begin a new partial track, and Td , which is fixed lowest

threshold (default value is −90 dB). If peaks are found that exceed Td , then they may be

used to continue an existing partial track. If no appropriate track continuation is found,

the partial ends — so Td may be considered a “death” threshold for possible continuation

of a partial.

Tb is computed using a frequency dependent threshold curve, At ( f p ), in combination

with the maximum bin amplitude of the analysis frame, amax


n . Incorporating the maximum

bin amplitude allows Tb to track the overall signal level. The threshold curve is given by
   
aR aR
At ( f p ) = a T + a L + − b f p /20000 (3.2)
b−1 b−1

where f p is the frequency in hertz, b is a parameter controlling the shape of the curve, a T

is user controllable threshold in dB, a L is the amplitude offset at 0 kHz in dB, and a R is

the range of the curve in positive dB from 0 to 20 kHz. Figure 3.3 shows At ( f k ) with the

default values b = 0.0075, a T = −60, a L = 26, and a R = 32.

Given the maximum bin amplitude in dB for a frame, amax


n and the frequency f p in Hz

of the peak under consideration, the threshold Tb is given by

Tb = amax
n + At ( f p ) (3.3)

Tb constitutes a “birth” threshold for potential new partials. The birth threshold requires

low frequency peaks to exceed a greater threshold than high frequency ones. In addition

to helping find more high frequency partial tracks, this aids in the rejection of spurious low

frequency, low amplitude partial tracks. Although the high frequency emphasis afforded
3. Sinusoidal Analysis and Resynthesis 46

Figure 3.3: Frequency dependent threshold curve with the default values b = 0.0075,
a T = −60, a L = 26, and a R = 32.

by the variable threshold provides increased resynthesis fidelity for many real world

sounds, it can result in many low amplitude, high frequency partials. In many cases, the

signal energy above 5 kHz could be more efficiently and robustly modeled as noise bands,

rather than clusters of low amplitude sinusoids. Possible approaches to modeling noise

will be discussed in section 3.6.

3.2 Partial Tracking Using Linear Prediction

Peaks that are retained following thresholding are joined into breakpoint functions rep-

resenting individual sinusoidal partials. The linear prediction (LP) method of Lagrange

et al. (2003) is used to connect the peaks into tracks. Linear prediction treats each evolving

sinusoidal track as a signal. The current signal sample, x (n), is approximated as a linear

combination of past samples:


K
x̂ (n) = ∑ a(k) x (n − k) (3.4)
k =1

There are K linear prediction coefficients, a(k ), that are calculated by minimizing the error

between the predicted value, x̂ (n), and the actual value x (n). The K coefficients can then

be used to predict successive values x̂ (n + 1), x̂ (n + 2), etc. Several different algorithms,
3. Sinusoidal Analysis and Resynthesis 47

including the Burg method, autocorrelation, and covariance can be used to compute a(k ).

Lagrange et al. show that the Burg method is most suitable in this application.

LP can be used to compute possible future values for both frequency and amplitude.

The frequencies and amplitudes of newly detected peaks are compared to the predicted

values, and the closest matches are used to extend the sinusoidal tracks. The method is

implemented as follows: at frame k we have detected N candidate peaks and we have

M active sinusoidal tracks which extend to at most frame k − 1. For each of the M

active sinusoidal tracks, the Burg method is used to compute two sets of linear prediction

coefficients, one on frequency and one on amplitude. An LP model of order 6 computed

from a maximum of 64 previous values has worked well in practice. It is critical that

enough points are used to capture periodic features such as amplitude modulation or

vibrato. For a more thorough discussion of the choice of LP parameters, see Lagrange et al.

(2007).
pr pr
The LP coefficients for track m are used to predict frequency f m and amplitude am

values for frame k. When a new sinusoidal track starts and there are not enough values to

compute the LP coefficients, the mean frequency and amplitude are used as the predicted

values. The error between predicted values for track m and peak n are given by a measure

of Euclidean distance
s 2 2
f nobs aobs
  
n
Em,n = 12 lg pr + α 20 log10 pr (3.5)
fm am

where f nobs and aobs


n are the observed frequency and amplitude values for peak n. The first

term measures distance in semitones and the second in dB with a scale factor α. Informal
1
tests have shown α = 12 to offer a reasonable weighting between prediction errors in

frequency versus those in amplitude.

Each track m selects the best continuation peak n over all N peaks subject to constraints

on the maximum allowable difference in predicted versus actual frequency. More precisely,
pr
| f nobs − f n | < ∆ f max where ∆ f max is proportional to the analysis frequency f a . For harmonic
sounds, ∆ f max = 3
4 fa is a reasonable default value. ∆ f max is a user-controllable parameter
3. Sinusoidal Analysis and Resynthesis 48

and can be reduced for sounds with very stable partials or increased for sounds with

partials that exhibit wide frequency excursions.

As with guide based tracking methods, linear prediction partial tracking can span

temporal gaps in the analysis data. Tracks that fail to find a match either become inactive

or lie dormant for several successive frames. In the case of a track that has been dormant

for j frames, linear prediction is used to predict the j + 1th values for matching to candidate

peaks.

3.3 Resynthesis

SPEAR offers several different synthesis methods. For real-time playback, the inverse

FFT method (abbreviated as IFFT or FFT−1 ) offers excellent efficiency and very good

quality (Rodet and Depalle 1992). Although most synthesis artifacts of the standard IFFT

method can be minimized with the use of appropriate overlap-add windows, extremely

rapid modulations of frequency or amplitude may not be adequately reproduced. For the

highest quality sound, SPEAR can perform oscillator bank synthesis with optional cubic

phase interpolation as in the classical McAulay-Quatieri method. SPEAR also supports the

resynthesis of Loris RBEP files which include a noise component for each breakpoint (Fitz

1999). These so-called bandwidth enhanced partials can be synthesized with either the

IFFT method or noise modulated oscillators.

3.3.1 Inverse FFT Synthesis

Since the DFT is invertible, it is possible to synthesize any desired time domain waveform

by constructing the complex spectrum of the desired signal and applying the inverse DFT.

In this case we are interested in the synthesis of sinusoids which, as detailed in section

2.1.4, have well understood spectra. A time domain windowed sinusoid can be synthesized

by the following procedure:

1. shift the complex spectrum of the window function so that it is centered on the bin

frequency of the desired sinusoidal frequency


3. Sinusoidal Analysis and Resynthesis 49

2. scale the spectrum of the window function according to the desired complex ampli-

tude (magnitude and phase)

3. accumulate the scaled shifted window function into the DFT buffer

4. perform the inverse DFT

Steps 1–3 correspond to convolving the window spectrum with the spectrum of a sinusoid

(a line spectrum). Because convolution is a linear operation, steps 1–3 can be performed

for each sinusoid to be synthesized. The inverse DFT can then computed as a final step.

The final DFT only adds a constant amount of work regardless of the number of sinusoids.

Provided that steps 1–3 can be carried out efficiently, we have the basis for an algorithm to

quickly synthesize a sum of many sinusoids.

Step 1 can be performed efficiently by pre-computing an oversampled spectrum of the

window function. For a DFT/IDFT of length N, we first compute a time domain window

function of h(n) of length N. The window function is then zero padded (zeros appended)

so that it has length M = N × R, where R is the oversampling factor (R is typically chosen

to be a power of 2). Call this zero padded window h0 (n).



 h(n) for 0 ≤ n < N

0
h (n) = (3.6)
 0 for n < 0 or n ≥ N

The DFT of the zero padded window gives the oversampled interpolated spectrum of the

window H (k ).
N −1

k
H (k) = h0 (n)e−i2π N n k = 0, 1, 2, . . . , N − 1 (3.7)
n =0

The window spectrum can then be shifted to any fractional bin position that is a multiple
1 1
of . For bin frequencies that are not a multiple of , the window value may be
R R
approximated by index truncation, index rounding, linear interpolation, or some higher

order interpolation scheme.

Steps 2–3 — scaling and accumulating the complex spectrum of the window func-

tion — can be performed with various degrees of efficiency depending on the precision

required. To precisely synthesize the desired sinusoid, all N values of the window spec-
3. Sinusoidal Analysis and Resynthesis 50

trum must be scaled and accumulated into the DFT buffer. Equation 3.8 summarizes

the shifting, scaling, and accumulation of the window spectrum W (n) for a sinusoid of

amplitude An and bin frequency of bn . Note that to synthesize a real valued-sinusoid,

we must represent half of the amplitude as a complex sinusoid, and half as its complex

conjugate.

An An
X (k) = H (b(k − bn ) Rc)eiφ + H (b( N − k − bn ) Rc)e−iφ (3.8)
2 2

The symmetry of the DFT buffer means that X (k ) can be constructed by first accumulating

N/2 values in the positive frequency half (0 ≤ n ≤ N/2), and then reflecting their complex

conjugate in the negative frequency half (N > n > N/2).

The procedure implies one table lookup (for the window function), and one complex
N
multiply (four scaler multiplies) for each of the 2 + 1 values. Compared to synthesis via a
table lookup oscillator, this method is actually less efficient. The oscillator requires one

table lookup and one scaler multiply for each of the N samples. However, significant

savings can be achieved if we approximate the scaled and shifted window function

spectrum.

Although the complete window function spectrum has length N, it should be noted

that many of the values are close to zero. Therefore, we can approximate steps 2–3 by

scaling and shifting only J window samples immediately surrounding the main lobe

(typically between 7–15 samples). Laroche (2000) calls the restricted set of samples the

“spectral motif.” Accumulating only the spectral motif (rather than the entire window

spectrum) reduces the number of table lookups and complex multiplies to a constant J,

regardless of the DFT buffer size.

Although any window function can be used, for computational efficiency we desire a

window function with a reasonably narrow mainlobe and low sidelobes. When the DFT

buffer is filled, only the values of the main lobe and the first few side lobes need to be

computed.

The result of the inverse DFT is a sum of sinusoids (each with constant amplitude

and frequency) that have been multiplied by the time domain window function h(n).
3. Sinusoidal Analysis and Resynthesis 51

To synthesize time varying amplitudes and frequencies, an overlap-add process can be


1
used. We begin by multiplying each sample in the DFT buffer by h(n)
. This effectively

removes the time domain effect of the window, leaving a constant amplitude sinusoid. We

then apply a triangular synthesis window s(n) which gives the desired linear amplitude

interpolation when overlap-added together.

Since frequency is constant across each DFT buffer, there will be some modulation

artifacts resulting from overlap-adding. These effects can be reduced by carefully matching

the phase of each sinusoid from one buffer to the next. For an overlap factor of 2, the

phase should be matched at the midpoint of the triangular slope (Rodet and Depalle 1992).

An important implementation detail of the IFFT method concerns the imprecision

introduced by the approximation of the window spectrum. By only using J samples

centered on the main lobe, the window spectrum has been band-limited. Effectively the

window spectrum has been multiplied by a rectangular window. Since multiplication in

the frequency domain results in convolution in the time domain, the time domain window

is no longer limited to N samples. The result is time domain aliasing which introduces

distortion in the resynthesized sinusoid. This distortion is particularly evident at the edges

of the time domain buffer where the window function approaches its minimum. When
1
the window function is divided out (multiplied by h(n)
), these distortions are dramatically

amplified. It is particularly evident for window functions that taper to zero at the edges

(windows in the Blackman-Harris family, for example). Figure 3.4 compares a sinusoid

with IFFT synthesized versions using different values of J. Note that even with large values

of J, significant error is still present. To avoid this problem, some number of samples D at

the beginning and end of the DFT buffer must be discarded. D is typically on the order of
N
, so the reduction in synthesis efficiency is negligible.
8
Inverse FFT synthesis appears to offer some open areas for continued research. More

detailed studies of signal distortion in inverse FFT synthesis would be welcome. It would

be helpful to have a more precise analysis of the effects of different synthesis windows,

window oversampling and interpolation strategies, the choice of J (controlling the size
3. Sinusoidal Analysis and Resynthesis 52

Figure 3.4: Comparison of sinusoids generated via IFFT synthesis with different values of
J (the spectral motif length) showing the first 16 samples of the time domain signal where
N = 64, frequency = 1.5 cycles per N samples. Window type is Blackman-Harris.

of the spectral motif), and the effect of D on signal to noise ratio. Jean Laroche’s (2000)

investigations provide some important preliminary insights in this area.

3.3.2 IFFT Synthesis Implementation


N − 2D
With an overlap add factor of 2, each inverse DFT computes new output samples.
2
N
In the current implementation, SPEAR performs IFFT synthesis with N = 512, D = .
8
These parameters allowed for real-time resynthesis of hundreds of partials on relatively

modest hardware.1 Future versions of SPEAR may allow the user to fine tune the resyn-

thesis parameters. In particular, it may be desirable to reduce N. Because frequency and

amplitude are constant across each DFT buffer, rapid frequency changes may not be well

represented with larger N.

In the current implementation, the spectral motif length, J, is fixed at 9. Because of

the problem of distortion at edge of the DFT buffer, the Hamming window, as defined

in equation 3.9, was chosen for the synthesis window. Since we divide by the window

function, windows that taper to zero will amplify distortions rather dramatically as the
1 PowerPC G3 class processor at 400 MHz — circa 2003
3. Sinusoidal Analysis and Resynthesis 53

inverse tends toward infinity. The minimum value of the Hamming window at the buffer
2
edge is .
23  
25 21 2πn

 46 − 46 cos N − 1 for 0 ≤ n < N



h(n) = (3.9)


for n < 0 or n ≥ N

 0

Real time resynthesis proceeds according to a time counter t that indicates the current

location in the analysis data. Whenever a new output buffer is required, the synthesis

callback checks to value of t to determine the set of sinusoids that need to be synthesized.

The center of each DFT synthesis window is represented by t. Pn is the set of partials

active at time tn . The amplitude and frequency for each partial in Pn is determined by

linear interpolation. At time tn+1 , we have a new set of partials Pn+1 . All partials in
T
Pn Pn+1 must have phase continuity at the overlap-add midpoint. This is achieved by

maintaining a set of virtual oscillators that store the previous phases. Each oscillator

structure is associated with an active partial by means of a hash table (Cormen et al. 1990).

Thus, it is reasonably efficient to lookup phase as partials turn on and off. Moreover t

can proceed at any rate through the analysis data. This facilitates the ability to manually

“scrub“ forward and backward through the sound. The analysis data can also be freely

transformed concurrent with resynthesis. SPEAR does appropriate data locking to make

sure the analysis data structures remain consistent between the concurrent resynthesis and

user interface threads.

3.3.3 Oscillator Bank Resynthesis

SPEAR can optionally perform a non-realtime oscillator bank resynthesis with linear

interpolation of frequency and amplitude as shown in equation 2.24 (page 40). This

provides a much higher fidelity result than the IFFT method.


3. Sinusoidal Analysis and Resynthesis 54

3.3.4 Phase Management

As noted in section 2.4.4, linear interpolation of frequency will, in general, only retain

the phase of the first breakpoint of a partial. Although in many situations the phase

information may not be perceptually important, there are cases where temporal phase

coherence between partials is quite important, for example in the reproduction of transients.

We also seem to be particularly attuned to the phase relationships in speech signals.

McAulay and Quatieri (1986) developed a frequency/phase interpolation method that

maintains phase and frequency at each breakpoint. Consider two successive breakpoints i

and i + 1 with phases φi and φi+1 and frequencies ωi and ωi+1 (here we are expressing

frequency in radians per sample rather than cycles per second). Since frequency is the

derivative of phase, we desire a phase interpolation function with endpoints equal to φi

and φi+1 and first derivatives at the endpoints equal to ωi and ωi+1 . Independent control

of the slope at two points requires a cubic function, so we begin by defining a cubic phase

of the form

θ (n) = ζ + γn + αn2 + βn3 for n = 0 . . . S (3.10)

where n is the sample index and S is the number of samples from breakpoint i to breakpoint

i + 1. The first derivative (the frequency function) is

θ 0 (n) = γ + 2αn + 3βn2 for n = 0 . . . S (3.11)

The endpoints at sample 0 and sample S are

θ (0) = ζ = φi (3.12)

θ 0 (0) = γ = ωi (3.13)

θ (S) = ζ + γS + αS2 + βS3 = φi+1 (3.14)

θ 0 (S) = γ + 2αS + 3βS2 = ωi+1 (3.15)

With four equations we can solve for the four unknowns. ζ is equal to the initial phase, and

γ is equal to the initial frequency, which leaves only to solve for α and β. One complication
3. Sinusoidal Analysis and Resynthesis 55

9π 9π

φi+1 + 8π M=4
7π 7π

θ(n) = φi + ωin + α(M)n2 + β(M)n3 φi+1 + 6π M=3


5π 5π
phase

φi+1 + 4π M=2
3π 3π

φi+1 + 2π M=1
φi π π

0 slope = ωi+1
slope = ωi
M=0
-π φi+1

n=0 n=S
sample index

Figure 3.5: Family of cubic phase interpolation functions with different values of M (after
McAulay and Quatieri (1986)).

is that the final phase is measured modulo 2π, so in fact θ (S) should be expressed as

θ (S) = ζ + γS + αS2 + βS3 = φi+1 + 2πM (3.16)

for some integer M. M defines a family of cubic phase functions with different amounts

of phase unwrapping (see figure 3.5).

The solutions for α and β are parameterized in terms of M as follows:

3 1
α( M) = (φi+1 − φi − ωi S + 2πM) − (ωi+1 − ωi ) (3.17)
S2 S
−2 1
β( M) = 2 (φi+1 − φi − ωi S + 2πM) + 2 (ωi+1 − ωi ) (3.18)
S S

McAulay and Quatieri show that the optional choice of M is that which makes the phase

curve maximally smooth (the maximally smooth choice for figure 3.5 is M = 2). This can

be defined in terms of minimizing the second derivative of θ (n) with respect to sample
3. Sinusoidal Analysis and Resynthesis 56

index. The result is that the maximally smooth M is the closest integer to x, where
 
1 S
x= (φi + ωi S − φi+1 ) + (ωi+1 − ωi ) (3.19)
2π 2

The complete phase function is

θ (n) = φi + ωi n + α( M)n2 + β( M)n3 for n = 0 . . . S (3.20)

It is important to note that cubic phase interpolation will generally only work for unmodified

resynthesis. Partials that have undergone time dilation, transposition, or frequency shifting

cannot be synthesized reliably with this method since the resulting phase functions are no

longer “smooth” enough to avoid frequency modulation artifacts. Methods for preserving

phase under modifications remain an interesting open problem. It may prove effective to

resynchronize phase only at certain moments (as suggested by the work in Robel (2003)

and Fitz and Haken (2002)), or slightly modify the breakpoint frequencies to smooth out

the phase function. Phase interpolation in the unmodified case is important for recovering

the residual signal, which is discussed in section 3.6.

3.4 Data Storage

A popular data storage model in analysis-synthesis applications is a sorted list of time

frames. Each frame contains a list of peaks (amplitude, frequency, and phase) that are

connected to successive time frames via either index numbers or pointers.

Rather than represent the analysis data as a sorted list of time frames, SPEAR uses

a list of partials that are represented by breakpoint functions of time versus amplitude,

frequency, and phase. With this storage model the implementation of cut, copy, and paste

operations, of both entire partials or segments, is straightforward. Additionally, partials

can be individually shifted, stretched, or compressed in time without resampling to fixed

frame time points. A further advantage is that the storage model can easily support

multirate analysis data (Levine 1998) or time reassigned breakpoints (Fitz 1999).
3. Sinusoidal Analysis and Resynthesis 57

3.4.1 Time-span Frames

A common operation, particularly during synthesis, is determining the list of partials (and

breakpoint segments) crossing a particular time point. In the frame based storage model,

a binary search of sorted frames quickly determines the partials that are active at any

particular time.

With a list of breakpoint partials, this query requires an iteration through all the

partials in the data set. For a short sound, this may only require looking at several hundred

elements, but for longer sounds there are likely to be many partials of short duration.

For example, the total number of partials for a one minute sound could be well over

twenty-thousand. During synthesis, continued iteration over thousands of partials is too

inefficient. A solution is to maintain a parallel data structure of time-span frames. For

each frame, a list is kept of all partials that have breakpoints within the frame’s time

span — typically on the order of 0.125 seconds. Figure 3.6 shows an example of ten partials

segmented into five frames where each frame contains a list of its active partials. The

active partial lists maintain a relatively constant size as partials turn on and off. For a

typical sound, this reduces active partial lookups to an iteration over only several hundred

elements.

3
5 7 8
2
9
4

0 6

t0 t1 t2 t3 t4 t5
0, 1, 2, 0, 1, 2, 0, 1, 3, 0, 1, 3, 8 0, 1, 8, 9
3, 4 3, 4, 5 4, 6, 7

Figure 3.6: Data structure for division of partials into time-span frames. Comma delimited
lists for each frame indicate the active partials.
3. Sinusoidal Analysis and Resynthesis 58

3.5 Transient Sharpening

As observed in section 3.1.1, the choice of window length directly affects both the frequency

and temporal resolution of the analysis. For polyphonic or inharmonic sounds in particular,

we desire a long analysis window so as to be able to resolve closely spaced partials.

However, as the window length increases, the temporal resolution decreases. This is a

particular problem for sounds that contain transients. All signal energy is spread across

the window and assigned to a breakpoint located at the temporal center of the analysis

window rather than at the precise location of the transient. Moreover, since analysis

windows overlap, the energy will be distributed across multiple windows and assigned to

multiple breakpoints. This results in the familiar pre- and post-echo effects of traditional

STFT analysis/synthesis.

Fitz (1999) shows how the method of reassignment can be used to improve the temporal

resolution and reduce pre- and post-echo. The method of reassignment begins by noting

that traditional STFT analysis assigns all energy included in the window to the temporal

center of the window. When there is significant energy in a particular frequency band that

is located near the edge of a window, this energy should instead be assigned to the center

of gravity (time-frequency centroid) of the window. This time-frequency coordinate can be

computed from the derivatives of the phase spectrum. The time-frequency centroid will

be located at the point where phase is changing most slowly (where the partial derivative

of phase with respect to time and phase with respect to frequency is zero).

Auger and Flandrin (1995) and Plante et al. (1998) show how to efficiently compute the

point of reassignment as a ratio of Fourier transforms. The reassigned time for frequency

bin k is given by ( )
Xht (k) Xh (k)
t(k) = t + <e (3.21)
| Xh (k)|2
where t is the original center of the analysis window (in samples). Xh (k ) is the DFT of

the signal using the usual analysis window h(n). Xht (k) is the DFT of the signal using a

special time-weighted analysis window ht(n). ht(n) is defined as the product of h(n) and
3. Sinusoidal Analysis and Resynthesis 59

a ramp function from − M2−1 to M −1


2 , where M is the window length.

M−1
 
ht(n) = h(n) n − n = 0, 1, 2, . . . , M − 1 (3.22)
2

The real portion of this ratio represents the time reassignment in samples.

The reassigned frequency for bin k is given by


( )
Xhω (k ) Xh (k )
ω ( k ) = ωk − = m (3.23)
| Xw (k)|2

where ωk is the bin frequency, and Xhω (k ) is the DFT of the signal using a frequency-

weighted analysis window hω (n). hω (n) can be computed by taking the DFT of h(n),

multiplying by a frequency ramp, and then taking the inverse DFT. Note that the time-

frequency center of gravity is computed independently for each frequency bin.2

Pre- and post-echo can be dramatically reduced by removing breakpoints with large

time corrections (Fitz 1999; Fitz and Haken 2002). Specifically, if the time correction is

larger than analysis hop size (and the hop size is less than the window length — which is

always the case in this implementation), a breakpoint can be safely removed since it will

be detected in a neighboring frame. Figures 3.7 and 3.8 show the analysis of a castanet

click that occurs at about 0.868 seconds.

Note that the transient event is well localized, and pre- and post-echo are dramatically

reduced. In order to avoid a complete amplitude discontinuity at the onset of a partial,

a zero amplitude breakpoint is joined to the head of each partial. Informal observations

indicate that although the reassignment method does an excellent job of localizing sharp

transients, there is some loss of energy, likely due to the removal of the breakpoints with

large time corrections. The addition of a zero amplitude breakpoint creates a short fade

into each transient which has the additional benefit of raising the average signal level

around the transient event. Further experimentation should be conducted to determine

whether transient breakpoints would benefit from an additional amplitude boost.


2 Note that equations 3.21 and 3.23 differ from those in Auger and Flandrin (1995), Plante et al. (1998), and

Fitz (1999) in that we add the reassignment term for time reassignment and subtract for frequency reassignment.
The sign depends on whether or not the STFT is defined with a time reversed window. We use the definition in
3. Sinusoidal Analysis and Resynthesis 60

Figure 3.7: Breakpoints from the analysis of a castanet click using the standard sinusoidal
model.

Figure 3.8: Breakpoints from the analysis of a castanet click using the time reassignment
method. Breakpoints with large time corrections have been removed.
3. Sinusoidal Analysis and Resynthesis 61

3.6 Noise Modeling

Serra and Smith were the first to develop extensions to sinusoidal modeling for representing

noise. The SMS (Spectral Modeling Synthesis) system offers two different ways to model

non-sinusoidal components. In the residual model, the sound is first analyzed using

sinusoidal modeling. Next, the signal is resynthesized using the oscillator bank method

with cubic phase interpolation as described in section 3.3.4:


N −1
x̂ (n) = ∑ Âi (n)cos(θ̂i (n)) (3.24)
i =0

The residual signal e(n) is then computed by subtracting the resynthesized signal x̂ (n)

from the original x (n).

e(n) = x (n) − x̂ (n) (3.25)

Because phase is preserved in the resynthesis, this subtraction can take place in the time

domain. The residual e(n) will typically consist of noise, attack transients, and pre- and

post-echo. The original signal can of course be perfectly reconstructed by adding the

residual to the resynthesis. However, typically we wish to make some sonic transformations

prior to resynthesis. In this case a more flexible modeling of the residual is required.

The stochastic model represents the residual in the frequency domain. For each frame l

of the analysis, we compute a residual magnitude spectrum from the magnitude spectrum

of the original signal and the magnitude spectrum of the phase accurate resynthesis.

| El (k)| = | Xl (k)| − | X̂l (k)| (3.26)

Xl (k ) is the DFT of the original signal at frame l and X̂l (k ) is the DFT of the sinusoidal

resynthesis. Assuming that e(n) is a stochastic signal, it can be modeled as a set of

magnitude spectra with random phase. Time stretching or transposition of the residual

can be accomplished by interpolating new magnitude spectra | Êl (k )|, setting the phase

equation 2.13 (page 20) in which the window is not time reversed. Auger and Flandrin and Plante et al. define
their STFT with a time reversed window. Thanks to Kelly Fitz for explaining the reason for this difference.
3. Sinusoidal Analysis and Resynthesis 62

spectra to random values between −π and π, applying the inverse DFT, and then overlap

adding.

Optionally | El (k )| can be computed directly in the frequency domain by subtracting the

frequency domain peaks detected in frame l from | Xl (k )|. | El (k )| may also be approximated

as a spectral envelope. The spectral envelope can be viewed as a filter that shapes a flat

noise spectrum.

Several variants of the deterministic plus stochastic model have been proposed. Pampin

(2004) developed the ATS system which separates the energy in El (k ) into 25 bands; each

band corresponding to a step on the Bark frequency scale. The noise energy in each band

may then be optionally redistributed among the partials.

3.6.1 Bandwidth Enhanced Sinusoids

This approach is similar to the bandwidth enhanced additive sound model described in

Fitz (1999) and implemented in Loris. Bandwidth enhanced synthesis models the signal

as a sum of bandwidth enhanced oscillators — sinusoidal oscillators that are amplitude

modulated by a lowpass filtered noise signal χ(n).


 
2πF (n)
y(n) = [ A(n) + β(n)χ(n)] cos n (3.27)
fs

Equation 3.27 defines a bandwidth enhanced oscillator with time varying amplitude A(n)

and time varying frequency F (n). The amplitude of the lowpass filtered noise is controlled

by β(n). Multiplication of a sinusoid by noise in the time domain results in the convolution

of the noise spectrum with the sinusoid spectrum. The multiplication shifts the noise

spectrum so that it is centered at frequency F (n).

In Loris, each breakpoint has an additional noise parameter κ that varies between 0

and 1. If Â(n) is the local average partial energy, the the bandwidth enhanced oscillator is

defined by
h√ √ i 
2πF (n)

y(n) = Â(n) 1−κ+ 2κχ(n) cos n (3.28)
fs
3. Sinusoidal Analysis and Resynthesis 63

When κ = 0, y(n) is a pure sinusoid. As κ increases, the amplitude of the narrowband

noise also increases. This model has the advantage that noise is tightly coupled to the

sinusoidal partials. As the sound is edited and manipulated — for example as partials are

transposed — the noise will follow. It is sometimes convenient that the noise amplitude is

controlled by a single parameter κ. However since the noise parameter is coupled to overall

partial amplitude, several square root operations must be invoked, which is somewhat

inefficient for realtime resynthesis.

SPEAR currently does not implement noise analysis, but it will resynthesize noise of

partials imported from Loris SDIF files (see section 4.6 for more information about the

use of SDIF). Loris defines the lowpass filtered noise signal as white noise filtered by a

3rd order Chebychev IIR filter with cutoff of 500 Hz, and SPEAR attempts to match this

behavior. Although the Loris noise representation is compact (based on a single parameter

κ), it lacks generality and is not an ideal choice for cross program data exchange. A more

flexible system would specify noise amplitude and bandwidth independently of sinusoidal

amplitude. Such a data model would better support different noise analysis strategies and

would allow for experimentation with different noise bandwidths. Expanding the number

of noise parameters would require either a new SDIF matrix type or an extension of an

existing type. Expanding the noise handling capabilities of SPEAR is an important area

for future work.


64

4. Editing and Transformation

4.1 Graphical User Interface

One of the motivations behind the development of SPEAR was the desire for a graphical

interface that would allow the user to make a quick visual assessment of analysis quality.

In spite of the best efforts in the choice and implementation of the analysis algorithms,

sonic artifacts are common. Even with a reduced set of available analysis parameters,

some trial and error is usually necessary to determine the optimal settings for a particular

application.

With systems that are command-line oriented (Loris, ATS, SNDAN), the user is left

to either resynthesize and listen carefully for artifacts, or use an additional plotting or

visualization tool. SPEAR offers an intuitive interface that allows the user to quickly zoom

in on analysis artifacts. Problematic areas of the sound can be isolated and auditioned,

and the data can be modified as desired. For example, spurious sinusoidal tracks can be

deleted, attenuated, or joined with other neighboring tracks.

The analysis data is displayed with time on the abscissa and frequency on the ordinate.

As in a grayscale spectrogram, varying levels of shading indicate the relative amplitudes

of partials. Figure 4.1 illustrates the overall user interface. Following the model of time

domain waveform editors, the amount of detail shown for the breakpoint functions varies

according to a user controlled zoom factor. At high zoom levels, every breakpoint is shown

and clickable handles allow individual manipulation in time and frequency (figure 4.2).

At lower zoom levels, the decrease in detail avoids visual clutter and significantly speeds

redraw.
4. Editing and Transformation 65

Figure 4.1: Zoomed-out display showing lasso selection and playback control sliders.

Figure 4.2: Zoomed-in display showing amplitude adjustment dialog.


4. Editing and Transformation 66

point/partial selection scroll

selection time region select frequency region

rectangular selection lasso selection

frequency shift transpose selection

time/frequency shift time shift selection

time stretch/compress playback scrub

draw partial

Figure 4.3: The SPEAR tool palette.

A tool palette (figure 4.3) offers different modes for graphical selection and manipula-

tion. Interaction follows a selection and direct manipulation paradigm (Shneiderman 1983).

For example, clicking on a partial selects it and shift-clicking adds partials to the current

selection. Choosing the transpose tool then allows the selected partials to be transposed by

dragging up or down. Additional selection modes allow the user to sweep out areas in

time and frequency or to draw arbitrary time-frequency regions with a “lasso.” Following

the model of graphics editors, arbitrary selections can be made up of a union or difference

of individual contiguous selections.

Editing tools and commands that operate on the current selection include transposition,

frequency shifting, time shifting, time stretching, and amplitude adjustment. Unlimited

undo/redo is supported for all editing operations, and most editing can be performed

while the sound is being synthesized. For further real-time control, sliders allow adjustment

of the overall transposition level, amplitude, and playback speed (figure 4.4). New partials

can be added by drawing with the pencil tool. New breakpoints are added with amplitude

proportional to the dragging speed.

Figure 4.4: The SPEAR playback control sliders.


4. Editing and Transformation 67

4.2 Selection Model

During the development of SPEAR, two possible selection models and implementations

were considered. The first may be termed the “region selection“ method. In this model, a

selection is represented as a polygon (concave or convex) in the time-frequency plane. (It

would also be possible to represent the selection in the time-amplitude plane if desired.)

Selections would be defined by sweeping out regions graphically or by setting numeric

parameters. More complex selections could be represented as the union of several regions.

The second selection model is “point selection“ method. In this case, the selection is

represented as a collection of references to the specific breakpoints of interest.

Given the desire to allow both extremely detailed editing and complete flexibility in

the temporal positioning of breakpoints, it was decided that the point selection model was

best. To reduce implementation and interface complexity, this is the only selection model

that has been implemented to date.

There are a few drawbacks to this model. In some cases it may be desirable to define a

transformation (such as an amplitude envelope) that occurs across a certain time span. In

this case, the region selection model is ideal. When the user requests such an operation

on a point selection (for example, an amplitude fade), the bounding region of the point

selection must first be computed. It may also be desirable to define a time-frequency

region (such as a formant) that remains fixed when partials are shifted or transposed. In

this case, the region selection model is required.

Point selections are implemented by associating a selection list with each breakpoint

array. The selection is stored efficiently as a range set data structure (Phillips 1999). Rather

than maintaining the index of each breakpoint, the range set represents contiguous runs

of breakpoints by the start and ending points of the run. More formally, if n breakpoints

starting at index i are selected, the range set represents this selection as the pair (i, i + n),

rather than the list [i, i + 1, i + 2, . . . , i + n − 1]. Discontiguous selections are represented as
 
an ordered list of pairs: (i0 , i0 + n0 ), (i1 , i1 + n1 ), . . . , (i j , i j + n j ) . Given that contiguous

selections are the most common case, the range set is very efficient in both time and
4. Editing and Transformation 68

space requirements. The range set also supports operations to efficiently add and coalesce

ranges, as well as delete and split ranges.

4.2.1 Rule-based Selection

Selections can also be made programatically. For example, the user can choose to select all

partials with a duration less than some specified value. Or, the user can choose to select

partials with an average amplitude less than some desired level. Currently the number of

choices for rule-based selection is quite limited, however one can imagine any number of

selection criteria: minimum or maximum frequency, average frequency, frequency range,

frequency periodicity (vibrato rate), phase stability, amplitude range, maximum amplitude,

etc. Comparison (=, >, <) and logical operations (NOT, AND, OR, XOR) could be used to

develop any number of rule-based selection criteria. The extent of possibilities suggests

that a scripting language (rather than a GUI) would be best way to implement flexible

rule-based selection. It is hoped that future versions of SPEAR will include this capability.

4.3 Editing Features

The SPEAR document model supports cut, copy, and paste of any selected data. When

pasting data from one document to another, care must be taken to make sure data types

match. Breakpoints are required to have time, frequency, and amplitude components. They

may also have phase and noise components, but these are optional (for a discussion of

the noise component, see section 3.6.1, page 62). Analyses performed with SPEAR will

have at least time, frequency, amplitude, and phase. Since analysis data may be imported

from other software and via different file formats (see section 4.6), the document model

must be prepared to support different breakpoint types. When pasting into an existing

document, the pasted data is modified to match the destination document’s breakpoint

type. For example, when pasting data that has only time, frequency, and amplitude into a

document that also has phase, the newly pasted data will have phases set to zero.
4. Editing and Transformation 69

In future versions of SPEAR it may prove beneficial to allow greater flexibility of point

types. Additional data types could specify specify stereo panning, 3D spatial location,

noise bandwidth, etc. Partials are currently implemented as multi-dimensional arrays

with contiguous allocation. A flexible point model would be better implemented with

collections of one-dimensional arrays, each array corresponding to one data type.

contiguous multi-dimensional array

collection
of
arrays

Figure 4.5: Different breakpoint storage models.

SPEAR supports unlimited undo/redo for all editing operations. Early in the design

process it was decided that the undo implementation should be simple and robust, yet

flexible enough to support arbitrary edits and transformations. Each undo/redo operation

is modeled as an ordered collection of sub-operations consisting of any of the following:

• selection

• edit partial

• insert partial

• remove partial

These sub-operations are bundled together into a single undo or redo operation. An

undo stack stores the state before the operation (this is the data required to restore the

document to its previous state), while the redo stack stores state after the operation. To

avoid excessive dynamic memory use, the data for edit, insert, and remove operations is

stored on disk. Selections are stored in locally addressable memory. Generally this does

not present a storage problem since the range map data structure is quite compact.
4. Editing and Transformation 70

A selection operation consists of the partial indices and breakpoint selection ranges.

An edit operation is defined as any operation on a partial that does not require the

insertion or removal of any other partials (and therefore does not change any partial index

numbers). Insert or remove operations are those that will alter the partial indices in the

data model. Operations that can split partials into multiple segments (such as cut or delete)

are modeled as a sequence of edit, insert, and remove operations. It is crucial that the

atomic sub-operations are ordered properly so that the precise order of partial indices is

always restored.

4.4 Time Expansion and Contraction

SPEAR supports two different modes for time expansion and contraction. The first is

the “independent“ mode. In this mode, each partial is expanded or contracted relative

to the starting time of the partial (or selection), which remains fixed in time. If tk (n)

represents the time of nth breakpoint for partial k, then the independent expansion for

breakpoint n given by the following tk (0) + α[tk (n) − tk (0)] where α is the expansion factor.

In “proportional“ expansion/contraction mode, the new time is relative to two fixed time

boundaries Tmin and Tmax . Breakpoints that fall before Tmin are left unchanged. Breakpoints

falling between Tmin and Tmax are scaled. Breakpoints the occur after Tmax are shifted

earlier (for contraction) or later (for expansion). The following equation summarizes:




 tk (n) for tk (n) ≤ Tmin

t0k (n) = Tmin + α[tk (n) − Tmin ] for tk (n) > Tmin ∧ tk (n) ≤ Tmax



 tk (n) + (α − 1) [ Tmax − Tmin ]

for tk (n) > Tmax

Proportional expansion/contraction scales the time base uniformly and accomplishes the

same thing as traditional time warping. Independent expansion/contraction can produce

interesting effects that “smear“ and overlap the evolution of each individual partial (in the

case of expansion), or emphasize short sinusoidal segments (in the case of contraction).

Figures 4.6–4.8 on page 71 illustrate the different time expansion modes.


4. Editing and Transformation 71

Figure 4.6: Original partials prior to time expansion.

Figure 4.7: Partials time expanded in independent mode. Note that the start time of each
partial is the same as in the unexpanded version.

Figure 4.8: Partials time expanded in proportional mode. Note that the start time of each
partial is scaled relative to time zero.
4. Editing and Transformation 72

4.5 Frequency Modifications

Currently the SPEAR GUI only supports a few basic frequency-based transformations.

Frequency shifting and transposition can be performed either by dragging the selection

vertically with the appropriate tool or by entering a specific frequency shift (in hertz) or

transposition (in floating point semitones). The interface is shown in figure 4.9

Figure 4.9: Dialog boxes for transposition and frequency shifting.

Frequency “flipping“ inverts all frequencies in the range Fmin to Fmax around the

axis 12 ( Fmin + Fmax ). For frequency f k (n) of partial k and breakpoint n, the new flipped

frequency f k0 (n) is given by

f k0 (n) = Fmax − ( f k (n) − Fmin )

Frequency flipping can be effective when applied to specific frequency bands. For example,

one might wish to try flipping the 3rd through 11th partials of an inharmonic timbre.

Figure 4.10 illustrates the effect. Frequency flipping tends to be less interesting when

applied to an entire instrumental sound, as the strong lower partials invert into high

frequency regions (often above 10 kHz). Strong energy in this area is often perceived

simply as a loud noise band.

4.6 Data Exchange

A central feature of SPEAR is support for data exchange with existing analysis synthesis

packages. For example, one might wish to use SPEAR’s visualization to compare the

analysis results from different software. The Sound Description Interchange Format (SDIF)

(Wright et al. 1999a) is supported both for import and export.


4. Editing and Transformation 73

(a) Selected partials of a bell sound prior to frequency flipping.

(b) Selected partials of a bell sound after frequency flipping.

Figure 4.10: Frequency flipping.

SDIF was developed out of a need for a standard file format that could represent

sounds in ways other than as a sequence of time domain samples. SDIF is particularly

well-suited to storing spectral models. An SDIF file consists of a sequence of frames that

are tagged with a 64-bit floating point time value (measured in seconds). Each frame is

also tagged with a 32-bit integer stream ID which allows multiple logical channels per

file. Each frame consists of one or more matrices that contain the data. Various standard

matrix types, which are tagged with a 4-byte code, have been defined for different types of

analysis data and are shown in table 4.1.

For SDIF import, SPEAR supports 1TRC (sinusoidal tracks), 1HRM (harmonic sinu-

soidal tracks), RBEP (reassigned bandwidth enhanced partials — a matrix type used by
4. Editing and Transformation 74

Matrix Type Description


1FQ0 Fundamental frequency estimates
1STF Short time Fourier transform data. The time tag
in a 1STF frame is the time of the center of the
window, not the beginning.
1PIC Picked spectral peaks with amplitude, frequency,
phase, and confidence values
1TRC Sinusoidal tracks with track index, amplitude,
frequency, and phase values
1HRM Sinusoidal tracks with track harmonic number,
amplitude, frequency, and phase values
1RES Exponentially decaying sinusoids/resonances
with amplitude, frequency, decay rate, and ini-
tial phase values
1TDS Time domain samples
RBEP Reassigned, bandwidth enhanced sinusoidal
tracks with track index, amplitude, frequency,
phase, time-offset, and noise values
1ENV Sampled spectral envelope
1NVT Name-value table storing text or numerical
header information (value) for arbitrary keys
(name)
EASM Matrix storing all breakpoints of single partial.
Supported by SDIF-Edit (Bresson and Agon 2004).

Table 4.1: Some of the standard SDIF matrix types.

Loris), and 1STF (short-time Fourier transform frames). Data can be exported either as

1TRC frames or RBEP frames. In the case of 1TRC frames, resampling of the breakpoint

functions must be performed to conform to the fixed frames of the 1TRC file type. For

RBEP frames, each point has a time offset value. By collating all breakpoints into frame-

sized chunks and properly setting the time offset, any distribution of breakpoints can be

losslessly represented with an RBEP SDIF file.

Additional formats are supported for importing — the .mq and .an formats from

SNDAN (Beauchamp 1993) and the .ats format from the ATS package (Pampin 2004). The

data exchange options are summarized in table 4.2.

Two custom text file formats have been implemented for both import and export.

par-text-frame-format represents resampled frames, where each line of the file con-
4. Editing and Transformation 75

Format Import Export


SDIF 1TRC × ×
SDIF RBEP × ×
SDIF 1HRM ×
SDIF 1STF ×
SNDAN .mq ×
SNDAN .pv ×
ATS .ats ×
Text frames × ×
Text partials × ×

Table 4.2: Supported file formats.

tains the data of a single time frame. Index numbers are used to link sinusoidal tracks.

par-text-partials-format represents exact breakpoint functions, where each line of the

file represents a complete partial. Each line is separated by a newline character. Both for-

mats begin with a preamble. The first line of the preamble is either par-text-frame-format

or par-text-partials-format depending on the type of file. The second line specifies the

data types for each breakpoint. Subsequent lines give data about the number of partials

and/or frames. The final line of the preamble is either frame-data or partials-data de-

pending on the type. Figures 4.11–4.12 show the frame based format and figures 4.13–4.14

show the partials based format.

Although the SPEAR graphical interface is intuitive and fairly feature complete, it

may not offer enough flexibility for more advanced uses. For example, batch analysis and

transformation of many sound files would be quite tedious. The ability to import and

export data gives motivated users the option to implement batch transformations with

other applications (see section 5.3 for details). The motivation for creating a GUI based

analysis/synthesis tool arose from the dearth of options. Existing options such as SNDAN,

ATS, Loris, and IRCAM’s Additive were all primarily command-line based applications.

The ideal environment would offer strong graphical and scripting capabilities, and it is

hoped that future versions of SPEAR will incorporate a scripting engine to facilitate a

wider variety of use cases.


par-text-frame-format
point-type index frequency amplitude
partials-count <J>
frame-count <K>
frame-data
<frame-time0> <N> <index0> <freq0> <amp0> ... <indexN> <freqN> <ampN>
<frame-time1> <N> <index0> <freq0> <amp0> ... <indexN> <freqN> <ampN>
...
<frame-timeK> <N> <index0> <freq0> <amp0> ... <indexN> <freqN> <ampN>
4. Editing and Transformation

Figure 4.11: The par-text-frame-format specification. Following the frame-data line, each line contains the breakpoints for one
frame. N indicates the number of peaks in each frame. The index values connect peaks from frame to frame. Each line is separated
by a newline character.

par-text-frame-format
point-type index frequency amplitude
partials-count 5
frame-count 7
frame-data
0.000000 5 4 79.209549 0.000271 3 199.714951 0.000551 2 350.347504 0.002547 1 540.650085 0.000262 0 708.789856 0.000477
0.010000 5 4 77.832207 0.000453 3 199.756546 0.000949 2 351.608307 0.013954 1 529.902954 0.000625 0 704.903259 0.004945
0.020000 5 4 75.407600 0.000525 3 201.211563 0.001151 2 352.697296 0.034572 1 533.938293 0.000891 0 701.399414 0.020578
0.030000 5 4 74.918587 0.000409 3 199.946182 0.001182 2 352.384399 0.051492 1 552.429321 0.001080 0 702.354309 0.042789
0.040000 5 4 75.172478 0.000174 3 198.636566 0.001109 2 352.013580 0.059633 1 564.025574 0.000642 0 704.860168 0.063039
0.050000 3 3 199.029831 0.001001 2 353.066437 0.061233 0 706.239319 0.076010
0.060000 3 3 199.642029 0.000907 2 355.612793 0.061363 0 706.990723 0.080715

Figure 4.12: Sample data as par-text-frame-format. There are a total of 5 partials and 7 frames.
76
par-text-partials-format
point-type time frequency amplitude
partials-count <J>
partials-data
<index0> <P> <start-time> <end-time>
<time0> <freq0> <amp0> ... <timeP> <freqP> <ampP>
<index1> <P> <start-time> <end-time>
<time0> <freq0> <amp0> ... <timeP> <freqP> <ampP>
...
<indexJ> <P> <start-time> <end-time>
<time0> <freq0> <amp0> ... <timeP> <freqP> <ampP>
4. Editing and Transformation

Figure 4.13: The par-text-partials-format specification. Following the partials-data line, each pair of lines contains the data
for one partial. The first line of each pair gives overall data for each partial: index number, length (P), start time, and end time. The
second line gives the breakpoints. Each line is separated by a newline character.

par-text-partials-format
point-type time frequency amplitude
partials-count 5
partials-data
0 5 0.000000 0.046440
0.000000 540.650085 0.000262 0.011610 528.172668 0.000683 0.023220 536.151062 0.000970 0.034830 564.025574 0.001158 ...
1 7 0.000000 0.069660
0.000000 350.347504 0.002547 0.011610 351.811279 0.015791 0.023220 353.037323 0.041780 0.034830 351.919281 0.058412 ...
2 7 0.000000 0.069660
0.000000 199.714951 0.000551 0.011610 199.763245 0.001013 0.023220 201.767395 0.001204 0.034830 198.648788 0.001166 ...
3 5 0.000000 0.046440
0.000000 79.209549 0.000271 0.011610 77.610458 0.000483 0.023220 74.562180 0.000542 0.034830 75.172478 0.000314 ...
4 6 0.011610 0.069660
0.011610 704.277527 0.005664 0.023220 700.294800 0.026302 0.034830 703.821472 0.054533 0.046440 706.153931 0.073634 ...

Figure 4.14: Sample data as par-text-partials-format. There are a total of 5 partials.


77
78

5. Compositional Applications

The introductory chapter outlined a number of challenges composers face when working

with the computer: the time/cost problem, the synthesis problem, the control problem,

and the compositional problem. Thus far we have seen that SPEAR most directly addresses

the time/cost problem (by providing fast synthesis) and the control problem (by providing

an agile and highly intuitive user interface). We now turn in earnest to the composition

problem. Musical tools, such as SPEAR, are never neutral agents in the compositional

process. Whether deliberate or not, musical intentions are always latent in the design of

computer music software systems. SPEAR was conceived with particular attention to the

needs of spectral composition.

5.1 Spectral Composition

Spectral composition might be seen as subset of the more general notion of timbral

composition. Many of the notable practitioners of spectral composition are quick to eschew

the idea of a specific “spectral style.” Nevertheless, developments over the past thirty

years do point to consistencies which allow us to formulate a coherent conception of

spectral music. The wariness of labels stems in part from a mistrust of compositional

orthodoxies, serialism in particular, from which many spectral composers were trying

to escape. A central tenet of spectral composition in fact centers around the rejection of

discrete categories in favor of continuums. This distinguishes the spectral approach from

other notions of timbral composition which are more oriented toward categorical and

hierarchical notions: Fred Lerdahl’s Timbral Hierarchies (1987), Pierre Schaefer’s Solfège de

l’objet sonore, or James Dashow’s Dyad System (1999).


5. Compositional Applications 79

Spectral composition can be understood from several perspectives. From an aesthetic

point of view, the notion of continuity and process is central. Tristan Murail speaks in

particular about the organization of continuous frequency space:

“Frequency space is continuous and acoustical reality only has to define its
own temperaments. If we push this reasoning to an extreme, the combination
of pure frequencies could be used to explain all past categories of musical
discourse and all future ones. Harmony, melody, counterpoint, orchestration,
etc., become outdated and are included in larger concepts.” (Murail 1984)

Gérard Grisey holds a complementary view that locates notions of continuity and process

in the temporal domain.

“For me, spectral music has a temporal origin. It was necessary at a particular
moment in our history to give form to the exploration of an extremely dilated
time and to allow the finest degree of control for the transition from one sound
to the next. . . . From its beginnings, this music has been characterized by
the hypnotic power of slowness and by a virtual obsession with continuity,
thresholds, transience and dynamic forms.” (Grisey 2000)

Joanthan Harvey’s view of spectralism stresses the potentially spiritual aspect of the

approach: “History seems grand, for once; spectralism is a moment of fundamental shift

after which thinking about music can never be quite the same again. . . . spectralism

in its simplest form as color-thinking, is a spiritual breakthrough.” Later he comments

specifically:

“In several works, to take a simple example, violins provide upper harmonics
to a louder, lower fundamental and at a given moment they cease to fuse, begin
to vibrate, begin to move with independent intervals and then again return to
their previous state. The images of union and individuation are powerful ones
which have both psychological and mystical implications. ‘The Many and the
One.’” (Harvey 1999)

It is notable that all three composers make a particular point of spectralism’s place

in history. Composers of the postwar generation seemed particularly keen to define

spectralism as a coherent alternative to what they perceived as the cognitively opaque

results of serialism. Spectral composition nevertheless gravitates toward a formalist

approach that privileges individual compositional control. Technology has tended to


5. Compositional Applications 80

function as a means of gaining precision and power. It remains an interesting open area

for exploration to see how spectral approaches might be applied in improvisational or

indeterminate compositional situations.

From a technical point of view, spectral music engages a number of specific concerns.

Pitch is organized in a continuous fashion and pitch structures are often reckoned directly

in cycles per second. In instrumental writing, approximations to the nearest quarter-

tone (or in some cases eighth tone) are used. This is in marked contrast to a precise

just intonation pitch specification.1 The harmonic series serves as an important point of

departure, and often functions as a stable pole (if not the stable pole) in the harmonic

discourse. Harmony often proceeds by a differential approach which unfolds processually.

The notion of “instrumental additive synthesis” often features prominently in spectral

composition. A combination of instrumental sounds with the appropriate frequencies and

intensities can fuse into a single timbral entity. In contrast to some approaches to timbral

composition, instrumental color tends to be exploited not for its differentiating abilities

but for its potential to participate in spectral fusion. Thus individual instrumental timbres

are subsumed into larger musical structures.2 In some spectral works, harmonic models

are derived from characteristic spectra of instrumental sounds. The goal of instrumental

additive synthesis is never to recreate, but rather to reveal latent musical potential is

pre-exisiting sonic material. The archetypal gesture of instrumental additive synthesis is

the arpeggiated spectra which gradually transforms from a series of individual tones to a

fused complex.

As with pitch, the approach to rhythm derives from a temporal sensibility that is

continuous rather than discrete. Rhythmic processes often involve controlled long range

accelerations and decelerations. Additive and subtractive rhythms or rhythmic interpola-

tions (as in the music of Philippe Hurel) are another common strategy. Overlaid rhythmic
1 The approximation approach tends to be found in the music of the French spectralists and their followers.

American music that might be identified as spectral or proto-spectral has tended to follow a just intonation
approach (as in the compositions of James Tenney or Ben Johnston).
2 This technique is a hallmark of spectral music. While instrumental timbre played an important role in

some proto-spectral music (that of Varèse in particular) timbre tended to be more about making elements
more distinct (Varèse’s planes of sound) rather than fusing them together.
5. Compositional Applications 81

pulsations with various rational relationships are also encountered (particularly in the

music of Grisey). In many cases rhythms are approximated, either via quantization or

proportional notation.

A complete summary of all technical aspects found in spectral music is beyond the

scope of this document (see Fineberg (2000) for more details). For our discussion, the

following list of some of the central concerns of spectral music will suffice:

• continuous frequency space

• continuous temporal space

• timbre-harmony continuum

• the use of approximations

• global approach to form

• smooth transitions contrasted with ruptures

• process preferred over development

• appeal to perceptual and psychoacoustic aspects of sound

The last item points to a particularly anti-formalist aspect of spectral music. For all

its technical machinations, spectral music is very much “for the listener.” However, this

tendency to appeal to cognitive “universals” raises interesting concerns. For example,

to what degree might the expressive or communicative ability of the spectral approach

be limited by this focus? In particular, does the pre-occupation with the purely acoustic

preclude the possibility of addressing culture-specific aspects of the musical enterprise?

These open questions point to interesting possible directions for future compositional

work.

Technology has always played an important role in the development of spectral music.

Models derived from electronic processes such as frequency shifting, ring modulation,

frequency modulation, or tape delay can be found in any number of spectral pieces.

The desire for mastery of the sonic continuum led naturally to the use of the digital

computer.3 Digital sound synthesis is complementary to instrumental synthesis and


3 Ironically, one gains reproducible control of the continuum in the discrete world of the computer.
5. Compositional Applications 82

spectral works that combine electronic and instrumental sounds can achieve remarkable

coherence. Computer-assisted composition environments, which will be discussed in

section 5.3, are indispensable for realizing sonic processes. SPEAR is a particularly useful

tool for spectral composition. The high performance GUI makes it possible to visualize

and manipulate complex sounds. Its detailed analyses can be used in the derivation of

harmonic and/or rhythmic models, or the data can be harnessed directly to drive various

forms of digital synthesis.

5.2 Sonic Transformations

A sinusoidal analysis may be viewed as a reservoir of sonic data with almost limitless

musical potential. Possible temporal transformations include replication, time delay, time

dilatation, and time reversal. Amplitude envelopes may be applied (for example, to fade

partials in and out, or to emphasize the attack portion of a sound). Partial frequencies may

be altered by any desired transposition or frequency shifting envelope. The analysis data

can be used to drive other compositional or synthesis processes (for example, granular

synthesis). The following subsections detail a number of possible sonic transformations.

5.2.1 Timbre Hybridization

Properties of different spectral analyses can be combined to create new hybrids. This

process of timbre hybridization is sometimes referred to as cross synthesis. There are many

possible cross synthesis implementations, some of which are found in software such as

AudioSculpt, SoundHack, Shapee, and in commercial plugins such as Prosoniq Morph.

Many of these techniques focus on different ways of combining STFT frames or applying

properties of one STFT frame to another. Sinusoidal modeling offers the possibility of very

fine-grained control over the cross synthesis process.

Dynamic morphing between two collections of partials A and B can be achieved by

matching common partials and interpolating their frequencies and amplitudes. Such a

procedure has been implemented as in Common LISP as follows:


5. Compositional Applications 83

1. Temporally align the two sounds by means of timebase envelopes that map output

times to time points in the original sound.

2. Define an interpolation envelope that maps output time to relative fraction of A or

B that should be present in the output — a value of 0 indicates the sound should

consist of all A, 1 indicates all B, intermediate values should be a blend of A and B.

3. Compute a succession of blended spectral frames to be synthesized. This proceeds

as follows:

• Advance an output time variable t by a predetermined hop.

• For each value of t, determine the active partials in A and B and compute two

frames At and Bt with the interpolated frequencies and amplitudes of the active

partials at time t.

• For each value of t, determine the current blend factor x from the interpolation

envelope.

• Determine a set of matching frequency/amplitude points in At and Bt . Matching

is controlled by a frequency tolerance. Interpolate the amplitude and frequency

of each of the matching points according to the blend factor x. Unmatched

points can be discarded or have their amplitude adjusted according to x (thereby

creating a cross fade of the unmatched partials).

4. Synthesize the blended frame and window with a tapered window function. Overlap

add into the output.

This morphing algorithm is somewhat crude, but is reasonably effective for certain kinds

of sounds. It could be improved by computing a new set of blended partials, rather than

overlap adding phase-incoherent frames in a granular fashion. As it is, the algorithm only

works well if sounds contain similar frequency content. Harmonic sounds need to share

the same fundamental frequency. A more sophisticated algorithm, such as that used in

Loris, would tag the harmonics of A and B relative to a fundamental frequency track.

Like harmonics can then be interpolated and smoothly glide to new frequencies. For the

specifics of the Loris morphing algorithms see Fitz et al. (2003).


5. Compositional Applications 84

A more sophisticated algorithm should also be able to interpolate between more than

two sources. It would be musically useful to create long sequences of seamless morphs

between a series of sounds as with IRCAM’s Diphone Studio. Despite its limitations, this

basic algorithm shows promise and has been used to create some effective morphs between

harmonic and inharmonic timbres.

Another approach to cross synthesis is to keep the frequency content fixed and ma-

nipulate amplitudes to affect a change in the spectral envelope (see section 2.3 for a

brief discussion of spectral envelopes). A hybrid can be created by imposing the spectral

envelope of one sound on another. Although SPEAR does not currently include spectral

envelope capabilities, a number of Common LISP functions have been developed for

manipulating spectral envelopes and applying them to SPEAR analyses (see table 5.3). The

functions are designed to work with spectral envelope data generated by SuperVP, a pow-

erful command line analysis/synthesis tool developed at IRCAM. Successful experiments

have been conducted with the application of vocal spectral envelopes to tam-tam sounds

(and visa-versa).

5.2.2 Spectral Tuning

Spectral tuning adjusts the frequency content of a sound to match the frequencies of a

predetermined spectrum or harmonic field. The target frequencies might be derived from

any number of sources: octave repeating scales or chords, synthetic spectra, frequencies

derived from other spectral analyses, distortion and warping of existing spectra, etc.

Spectral tuning using the phase vocoder has been explored extensively by a number of

composers and researchers including Eric Lyon and Christopher Penrose with FFTease

(Lyon 2004), Paul Koonce with PVC, and Trevor Wishart with Composers Desktop Project

(CDP) (Wishart 2000).

Spectral tuning with a sinusoidal model allows very precise control of the evolution of

each partial. In a typical application, the tuning is static: the local amplitude and pitch

fluctuations of each partial are maintained and the partial is transposed so that its average
5. Compositional Applications 85

frequency matches the target. Local pitch fluctuations can be attenuated or emphasized

as desired. Dynamic tuning transposes a partial according to a variable function. If the

rate of frequency change is not too rapid, the retuning can still maintain local frequency

perturbations which are crucial to the character of most natural sounds. A simple example

of dynamic tuning would be the application of vibrato or glissando.

Parameters can control the strength of retuning. For example if a partial is separated by

a distance of 3 semitones from its target, a retuning strength of 70% would transpose the

target by 2.153 semitones rather than the full 3 semitones. Or a parameter can specify that

partials are retuned only if they are in close enough proximity to the target. Particularly

interesting results have been achieved with progressively retuning inharmonic timbres so

that a succession of sounds eventually converges on a harmonic spectrum. Spectral tuning

can also be used for cross synthesis where the target frequencies are derived from analysis

of a different sound.

5.3 Software Composition Environments

Although a great deal of useful sonic manipulation and transformation can be accom-

plished directly with the SPEAR GUI, much of the software’s compositional utility is

revealed when it is used in conjunction with other tools. Computer-assisted composition

and performance environments such as OpenMusic, Common Music, Max/MSP, RTCmix,

and SuperCollider (to name a few), can be used to import, manipulate, export, and/or

synthesize SPEAR analysis data.

SPEAR’s support for text based file formats (as well as standard SDIF formats) makes

cross program data exchange relatively straightforward. The text formats are particularly

useful because they are human readable and easy to use with interactive interpreted

programming languages such as Java, Python, Scheme, and Common LISP. Import and

export of the SDIF formats is more difficult and time consuming to implement for casual

programmers. The author’s own experience has shown that even though SDIF is robust and

well-defined, it is not necessarily widely understood among computer music practitioners.


5. Compositional Applications 86

Even with supporting SDIF libraries such as those provided by CNMAT or IRCAM, it may

prove more expedient (particularly in one-off applications) to use the text data formats.

5.3.1 OpenMusic

OpenMusic is a computer-assisted composition environment developed at IRCAM using

the Common LISP language. It is a visual programming environment based on patching of

pre-defined or user-defined objects. The patch paradigm provides full access to Common

LISP programming, including iteration and recursion. Nodes in the object graph are

evaluated for their results and possible side-effects. A user request for evaluation will

trigger evaluation of all parent nodes (figure 5.1). OpenMusic includes graphical editors for

MIDI data, breakpoint functions, and common practice notation (figure 5.2). OpenMusic

was developed as a successor to PatchWork a similar LISP based environment for visual

programming.

Figure 5.1: OpenMusic patch that evaluates the arithmetic expression 10 + (2 × 10) .

The environment may be extended with user defined libraries. New OpenMusic func-

tion are defined as CLOS (Common LISP Object System) methods using the om:defmethod!

macro. SPEAR import and export has been integrated with OMTristan, a custom OM

library created by Tristan Murail. OMTristan, which has evolved over decades, includes a

wide variety of compositional algorithms for the generation of chords and spectra, tempo-

ral computations, MIDI communication, and general data manipulation. The spdata object

in OMTristan is specifically designed for efficient storage and manipulation of frame based

spectral data. The methods spear-read and spear-write convert between the SPEAR

par-text-frame-format file type and OM spdata objects.


5. Compositional Applications 87

Figure 5.2: OpenMusic notation editor displaying a chord sequence.

Figure 5.3 shows an OM patch that reads SPEAR data, transforms the data, displays

the transformed data in common practice notation, and writes out the transformed data

to a new text file. The patch functions as follows: The spear-read object (located at the

top) takes the name of a text file as input and outputs an spdata object which is stored

as an embedded object (this avoids having to re-read the data from disk on subsequent

evaluations). The spdata output is split into three streams: the frequencies, the amplitudes,

and the partial index numbers. Each stream is a LISP list of two elements. The first element

is a list of times for each frame and the second element is a list of lists where each sublist is

the frame data (either frequencies, amplitudes, or indices). The time base is stretched (the

“dilation” parameter) and the frequencies distorted in the dist-frq sub-patch. The data is

reassembled into frames and converted to a chord sequence in the visu-chseq sub-patch.

The times, indices, frequencies, and amplitudes are also routed to the spear-write object

which outputs a new text file with the transformed data. The data can then be opened in

SPEAR for resynthesis or further transformation.

5.3.2 Max/MSP

Max/MSP is a widely used graphical patching environment for realtime media processing.

SPEAR SDIF data can be imported into Max/MSP using the CNMAT SDIF objects (Wright

et al. 1999b). The SDIF-buffer stores the SDIF data in a named buffer which can be

accessed using the SDIF-tuples object. SDIF-tuples outputs matrix data at the specified
5. Compositional Applications 88

Figure 5.3: OpenMusic patch that reads and writes SPEAR data.
5. Compositional Applications 89

time (with optional interpolation) as a Max list. CNMAT’s sinusoids~ object implements

an oscillator bank (with optional bandwidth enhancement as described in section 3.6.1)

that is driven by a list of frequency amplitude pairs. The threefates object can be used to

manage the sinusoidal track indices output from the SDIF buffer and format the data into

frequency amplitude pairs suitable for the sinusoids~ object.

0., 10. 10000 7.1, 2.1 5000 loadbang loadbang

read-stream-number moanin.add.sdif 0 read-stream-number kyorei.add.sdif 0


line 0. 10.
SDIF-buffer ss-toot sdif-buffer ss-boot

2.1
shakuhachi sax
tff

tuples time $1 columns 1 2 3 tuples time $1 columns 1 2 3 0.

SDIF-tuples ss-toot SDIF-tuples ss-boot

threefates threefates

zl slice 100 zl slice 100 0.

tbl prepend sinusoids frequency-scale $1

f 1. sin-transform

list-interpolate

sinusoids~
List-interpolate with sines
While this fast and dirty interpolation yields some
results, it would be more musical if we paid
attention to the actual meaning of the SDIF data.
Trying to interpolate between timbre, contour, and
amplitude envelope all it once is a crazy idea, if
you think about it.
•!start audio
startwindow stop

dac~

Figure 5.4: Max/MSP patch from the CNMAT Spectral Synthesis Tutorials that interpolates
between two sinusoidal models to achieve a dynamically controllable cross synthesis.

The limitation of 256 elements with the standard Max list objects does put some

constraints on the complexity of the spectral data. Michael Zbyszynski’s “Spectral Synthesis

Tutorials” suggest some solutions (Zbyszynski et al. 2007). Possibilities include the use of

custom C or Java objects, the use of Javascript, the use of Jitter matrix processing objects,

or the use of IRCAM’s FTM extensions.


5. Compositional Applications 90

The use of sinusoidal models in the Max/MSP/Jitter environment offers numerous

options for interactive transformation and resynthesis. The ability to manipulate spectral

data directly in Max using Java and Javascript eases the burden of managing large spectral

data sets. Future work could lead to the development of “partial based” storage models

rather than the frame based approach implicit in the SDIF 1TRC format. As we shall

see in the next section, the partials based format (collections of breakpoint functions) is

particularly well suited for certain types of musical operations.

5.3.3 Common Music

Common Music (CM) is an algorithmic composition environment developed by Heinrich

Taube. Like OpenMusic, CM is implemented in Common LISP. It is built around the notion

of generalized input/output streams which can include MIDI files, MIDI ports, Open

Sound Control streams, CSound scores, CLM notelists, and CMN scores. Compositional

facilities include the scheduled processes, a full featured pattern iteration system, and a

flexible graphical interface called Plotter. Plotter facilitates display and editing of breakpoint

functions, histograms, scatter plots, and piano roll style MIDI data (Taube 2005). CM can

be tightly integrated with CLM (Common Lisp Music), a Music-N style acoustic compiler.

Current versions of CM include a spectral composition toolkit than can read SPEAR

data from par-text-frame-format and par-text-partials-format files. The data is stored

as nested LISP lists. For manipulating large sounds, it can be more efficient to store spectral

data in LISP vectors (contiguous arrays). A LISP toolkit has been developed for reading,

writing, and manipulating par-text-partials-format data. A complete sound is stored as

a LISP list of sp-partial data structures. Each sp-partial represents a single breakpoint

envelope with time, frequency, and amplitude arrays. Functions exist to transpose, time

warp, and amplitude scale partials. The adjustments can be controlled dynamically with

envelopes. The function for transforming single partials are summarized in table 5.1, page

91. Many of the transformation functions take an optional copy parameter which, when
5. Compositional Applications 91

(copy-partial partial)

Return a newly allocated copy of partial.

(offset-partial partial offset [copy])

Shift the time of partial by offset seconds.

(transpose-partial partial interval-ratio [copy])

Transpose the frequencies of partial by interval-ratio.

(transpose-partial-env partial interval-ratio strength-env [copy])

Transpose the frequencies of partial by a ratio ranging from 1 to interval-ratio. strength-env


is an envelope controlling the percentage of interval-ratio applied.

(time-scale-partial partial timescale [copy])

Apply a fixed time scaling ratio timescale to partial.

(amplitude-scale-partial partial ampscale [copy])

Apply a fixed amplitude scaling ratio ampscale to partial.

(reverse-partial partial [copy])

Reverse the temporal evolution of partial.

(amp-env-partial partial env [copy])

Apply an amplitude scaling envelope env to partial.

(sp-partial-avg-freq partial)

Compute the average frequency of partial.

(sp-partial-avg-amp partial [points])

Compute the average amplitude of partial. If points is non-nil, the average is computed based on
the first points breakpoints. If points is nil (the default) all breakpoints are used to compute the
average.

(sp-partial-interp partial time)

Interpolates the frequency and amplitude of partial at time seconds. The frequency and amplitude
are returned as multiple values (in that order). Uses a binary search for efficiency.

Table 5.1: Transformation functions applied to individual partials.

non-nil, applies the transformation to a new copy of partial, leaving the original unchanged.

The default value of copy is nil, meaning the transformation is applied destructively.

A retune-partials function can transpose a collection of partials to a new set of

target frequencies. The retune-partials function assumes partials with relatively constant

frequency (the retuning is based on the average frequency of the original partial). Functions
5. Compositional Applications 92

(retune-partials partials target {keyword value}*)

Transpose each of the partials of partials so that its average frequency matches a frequency
contained in the list target. Because the target frequency set may cover a vastly different frequency
range from the original partials, the average partial frequencies are first rescaled to lie within the
minimum and maximum frequencies of target. Each rescaled frequency is then matched to the
closest frequency in target. The matching is controlled by the following keyword arguments:
:low-limit Only tune partials above this frequncy. Default is 0 Hz.
:high-limit Only tune partials below this frequcny. Default is 3000 Hz.
:delay-curve Envelope to apply a frequency dependent delay to each partial. De-
fault is nil.
:amp-curve Envelope to apply frequency dependent amplitude scaling to each
partial. Default is nil.
:flatten Scale the ambitus of each partial. Values < 1.0 reduce the amount of
frequency variation of each partial. When 0.0, all frequencies are set
to the average frequency.
:strength Adjust the amount of frequency remapping. When 1.0, the partials
are retuned to exactly match the target frequencies. When 0.0, the
partials remain at their original frequencies.
:strength-curve Adjust the amount of frequency remapping over time. Partials can be
made to glide between original and target frequencies.
:index-scramble Scramble the targets such that the original partials will remap to
less proximate target frequencies. When 0, there is no scrambling
(the default). When 1, remap to the neighbor of the closest target
frequency. When n, remap to the nth neighbor (either above or
below).

(sp-partials-apply-vibrato partials {keyword value}*)

Apply slow frequency modulation (vibrato) to partials (a list of partials). This function requires
CLM. The following keyword arguments are supported:
:vibfreq The frequency of the vibrato in hertz. Default is 6.5.
:vibinterval The width of the vibrato in semitones. Default is 0.5.
:start-time Starting time for the onset of vibrato (in seconds). Default is 0.
:end-time Ending time for the vibrato (in seconds) or nil if vibrato continues to
the end. Default is nil.
:copy If non-nil, return new copies of the partials leaving the originals
unchanged. Default is true.

Table 5.2: Frequency adjustment functions applied to collections of partials.

for frequency transformation are summarized in table 5.2, page 92. Spectral envelope

functions are show in table 5.3, page 93.

5.4 Compositional Examples

A straightforward application of SPEAR is the derivation of harmonic models for instru-

mental composition. Pitches can be transcribed manually from the GUI or by importing
5. Compositional Applications 93

(sp-partials-flatten-spectral-env partials spectral-env [amp-scaler])

Flatten the spectral envelope of the sound represented by partials by dividing the partial ampli-
tudes by the spectral envelope spectral-env. The spectral envelope can be generated by reading
an SVP text formatted spectral envelope analysis. Apply an optional amplitude scaler amp-scaler
(default value 0.025) following the division.

(sp-partials-apply-spectral-env partials old-spectral-env new-spectral-env {keyword


value}*)

Apply a new spectral envelope new-spectral-env to the sound represented by partials.


old-spectral-envelope is the current spectral envelope of the sound. The following keyword
arguments are supported:
:transpose Transposition ratio.
:time-warp Time remapping envelope.
:freq-warp Frequency remapping envelope.
:freq-func Generalized frequency transformation. freq-func is a function of
three arguments: frequency, partial-index, and time. The function
should return the new desired frequency value.
:filter-func Generalized amplitude transformation. filter-func is a function
of four arguments: amplitude, frequency, partial-index, and time.
The function should return the new desired amplitude value.
:copy If non-nil, return new copies of the partials leaving the originals
unchanged. Default is true.

Table 5.3: Spectral envelope functions applied to collections of partials.

into other environments. Rather than relying on individual spectral snapshots, the evolu-

tion of a collection of partials can be incorporated into the composition model.

In the author’s work Subterrain (2008) for B[ clarinet, string trio, and electronics,

clarinet multiphonics were analyzed. The SPEAR GUI was indispensable for extracting

only the most stable and/or musically interesting partials from these complex sounds.

The data was imported into Common Music and amplitudes were averaged across entire

partials, which gave a very good sense of the overall spectral color of the multiphonic.

Further manipulation of the multiphonic spectra, including frequency shifting and ring

modulation, were used to create harmonies and pitch reservoirs.

Spectral retuning using SPEAR and OpenMusic was used to create a number of sounds

in Tristan Murail’s Pour adoucir le cours du temps (2005) for 18 instruments and synthesized

sound. In this piece, the partials of tam-tam sounds were retuned to the frequencies of

artificially generated spectra. A cow-bell analysis also undergoes similar retuning. Of the

work, Murail writes,


5. Compositional Applications 94

“The ‘noises’ (breath, grainy sounds, metallic resonances) of the piece have
been domesticated, tuned to instrumental harmonies through the use of a
specific technique for altering the internal components of the sounds. This is
how the sound of a gong can become a harmony, or the virtual cow-bells can
be constantly varied and altered by their musical context. The synthesized
sounds are triggered from an onstage MIDI keyboard — they are mixed and
spatialized in real-time by the program Holophon developed by the GMEM.”
(Murail 2005)

Spectral tuning was used to generate several of the sounds used in the author’s work

Tear of the Clouds (2007) (see appendix A for the complete score). The low vocal sounds

(at m. 14, reh. A and m. 203) were created by retuning and resynthesizing samples of a

choral ensemble. Although these particular source samples exhibited a high standard of

performance and excellent recording quality, they were not ideally suited for resynthesis.

Because they consisted of a sum of many ensemble voices (all in unison pitch), and because

they were recorded in a reverberant environment, partial tracking resulted in a large

number of short low amplitude partials in addition to the fundamental and harmonics. For

retuning purposes, a cleaner representation was desired. IRCAM’s Additive program was

used to perform fundamental frequency tracking and to analyze a fixed set of harmonic

partials (up to 22 kHz). The SDIF 1TRC file from Additive was imported into SPEAR and

then exported as a text file. This was then imported into LISP where the retuning was

performed.

At m. 14, the retuning target consists of selected partials of a compressed harmonic

spectrum built on G1 (49 Hz). The tuning was only applied to the original partials ranging

from 49 Hz to 1850 Hz. Because the original sound had a fundamental of 82 Hz, retuning

and compressing the entire spectrum would have resulted in a considerable loss of high

frequency presence. Since most of the harmonic “color” of a sound is found below 2000

Hz, the harmonics above 1850 Hz were retained in their original tuning (another option

would be to replicate partials to fill in the missing upper harmonics). The retuned data was

then imported into SPEAR and synthesized. Further post processing, including granular

time stretching, was then applied. The granulation produced a sound of the required
5. Compositional Applications 95

duration and helped re-impart some of the reverberant quality of the original sound that

was lost in the analysis.

Further applications of spectral tuning in Tear of the Clouds include the retuning of

tubular bell, piano, and tam-tam sounds. The penultimate electronic sound of the piece

is built from a retuning of a piano sound. The target spectrum consists of a distorted

harmonic series built on D]1 (39 Hz), in which the even partials are stretched and the

odd partials are unchanged. For this particular sound, the retuning is dynamic. The

attack maintains the original tuning and the partials glide to the target spectrum over a

period of 2.75 seconds. Rather than sliding to the nearest target frequency, the retuning

was configured so that the partials make wider frequency excursions (either up or down),

crossing each other as they proceed.

Figure 5.5: Dynamic retuning with crossing partials.

5.5 Performance Models

Although not extensively explored in the current work, some words are in order about the

potential applications in performance and improvisation. The SPEAR interface itself could

be viewed as a crude performance interface where one can scrub through the sound using

the mouse (Garton 2007). One can easily imagine a richer instrument that would give
5. Compositional Applications 96

control of elements such as playback speed, transposition, frequency shifting, harmonic

parity (balance of even and odd partials), etc. These potentials would ideally be realized

in a flexible environment such as Max/MSP, rather than in the more confining space of

the SPEAR user interface. A variety of gestural and tactile inputs could be used to drive

resynthesis parameters.
97

6. Conclusions

The preceding chapters have given a thorough account of the technical aspects of the

design and development of SPEAR while also touching on some possible compositional

applications. I do not wish to claim that SPEAR represents a revolutionary new computer

music tool. It has been preceded by more than fifteen years of research and development

in the field of spectral analysis and modeling. However, I hope it is clear that SPEAR is an

important evolutionary convergence of a number of mature technologies.

Although SPEAR has experienced reasonable popularity among a wide variety of

technologically oriented music practitioners, the software is still very much a work-in-

progress. Although the interface works well, it is still limited in many ways. A multi-

demensional display should be developed that allows one to view and edit all aspects

of a partial’s evolution: amplitude, noise bandwidth, phase, stereo (or multi-channel)

panning, etc. The user should be able to draw breakpoint envelopes and curves to control

the transformation of any parameter. Rule based selection (see section 4.2.1) should be

implemented. Moreover, an interface that allows one to specify persistent selections (like

audio regions in a waveform editor) would be incredibly useful. A user interface for

sound morphing and hybridization would also be welcome. Since it is impossible to

envision all of the possible desired editing and transformation possibilities, SPEAR needs

a built-in scripting language and command-line tools that allow users to develop their

own processes.

SPEAR must also keep pace with developments in spectral analysis and synthesis. To

build more compact and malleable spectral models, SPEAR needs robust fundamental

frequency tracking. The option should be available to perform analyses with fixed numbers
6. Conclusions 98

of harmonics (as with IRCAM’s Additive and Diphone programs). Models with fixed

numbers of partials are much easier to manipulate in real-time environments such as

Max/MSP or SuperCollider.

Some form of noise modeling, as discussed in section 3.6, should be implemented. The

transient sharpening method described in section 3.5 is successful to a point, but to handle

wide-band polyphonic audio and percussive attacks, a separate transient detection and

representation method is needed. The multi-resolution sinusoidal modeling techniques

described in Levine (1998) should also be implemented. Although multi-resolution model-

ing significantly increases the implementation complexity of partial tracking (particularly

at the band boundaries), it has the potential to considerably reduce pre- and post-echo

artifacts. The combination of time-reassignment, multi-resolution analysis, noise modeling,

and improved partial tracking would represent a formidable analysis engine.

In order to tie together these more sophisticated analysis techniques, a more advanced

user interface is required. Each component of the analysis would be assigned to its own

layer:

• fundamental frequency layer

• sinusoids layer

• noise layer

• transient layer

• spectral envelope layer

Each of these layers could be (optionally) coupled to the others. For example, when

layers are coupled together, adjustments to the fundamental frequency track would

correspondingly adjust the frequencies in the sinusoids layer. As the partials are transposed,

the amplitudes could be adjusted to match the spectral envelope layer. Peak detection and

tracking could be applied to the spectral envelope data so that formant trajectories and

widths could be adjusted in a manner analogous to individual partials. Given sufficient

time and effort, all of these ideas are feasible.


6. Conclusions 99

Returning to some of the issues raised in the opening chapter, we must consider what

a tool like SPEAR tells us about timbre and musical organization. For all its abilities,

SPEAR perhaps raises more questions than it answers. Although obvious, it’s worth

stating the overall conclusion: additive synthesis is not a sufficient model for timbre.

Although additive models are malleable, they tend to be fragile — minor adjustments to a

few partials can destroy timbral fusion and coherence. Higher level models are needed.

Spectral envelopes are a good first step. We should also look for models of the micro-level

evolution of partials. The amplitude and frequency fluctuations can be correlated between

partials and with parameters of the overall evolution of the sound. Finally, we must

recognize that the time varying aspects of spectra are often far more important than

the instantaneous relationships. We need timbral models than can capture micro-level

rhythm, grainy textures, and impulses. The TAPESTREA software, which uses a sinusoids

+ stochastic background + transients model, suggests a particularly promising approach to

high level timbre modeling (Misra et al. 2006).

From a compositional perspective there is clearly more to learn about timbre and

tools such as SPEAR can help with these compositional approaches. But it is not enough

for compositional work to merely reflect or reinforce habitual modes of understanding

timbre. Experimental composition can reveal new ways to listen to timbre, previously

unheard subtleties, surprising relationships, and unanticipated modes of organization.

With the acknowledgment that speculations on the future are often more about setting

an agenda than trying to make a neutral prediction, we might ask what role computer

music might play in this process and in turn what will be the nature of tools to come? Or

more precisely, what do we want from our musical tools of the future? Successful software

tends to embody at least some of the following attributes:

1. ease of use and configuration

2. high performance

3. large feature set

4. reliability
6. Conclusions 100

5. backward compatibility

6. initial novelty

7. openness and extensibility

At present, commercial software tends to excel at items 1–3, while failing miserably in

terms of openness and user extensibility. Extensibility has typically been the strength

of software with an experimental and open source pedigree. Commercial software is

frustrating because it tends to be expensive and creatively limiting. Free software is

frustrating because it tends to be difficult to configure and maintain. Of what benefit is

extensibility if one is left with no time to exploit it? Future software efforts must bridge

this gap. We must avoid the “sealed black boxes” of the commercial world and the arcane

configuration, instability, and continual tabula rasa upheavals of the open source world.

Negative proscriptions aside, what potentials are there for music software of the

future? An environment that melds a powerful acoustic compiler and general purpose

programming language with a flexible and high-performance GUI seems ideal. Such an

environment should include flexible time-based displays that can handle common practice

notation, breakpoint functions, audio data, and spectral models. It must not be a neutral

environment — it must make assumptions. It must assume a user who wishes to work

with common practice diatonic music and quantized meter and rhythm; it must assume

a user who wishes to work with millions of sonic granules on a proportional time scale;

it must assume a user who wishes to precisely notate irrational rhythms and microtonal

inflections; it must assume a user who wishes to capture and manipulate performance data

in real time. Technologically, this is all within the realm of possibility today. It remains to

be seen when, if, and/or how this may come about. Tools of this sort are not envisioned to

make composition facile. They should exist to engage and inspire the human imagination

and creative spirit.


101

References

Abe, Mototsugu, and Julius O. Smith. “Design Criteria for Simple Sinusoidal Parameter
Estimation based on Quadratic Interpolation of FFT Magnitude Peaks.” In Proceedings of
the 117th Audio Engineering Society Convention. AES, 2004.

Amatriain, Xavier, Maarten de Boer, Enrique Robledo, and David Garcia. “CLAM: An
OO Framework for Developing Audio and Music Applications.” In Proceedings of
17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and
Applications. Seattle, WA, 2002. http://mtg.upf.edu/publicacions.php.

Auger, F., and P. Flandrin. “Improving the Readability of Time-Frequency and Time-Scale
Representations by the Reassignment Method.” IEEE Transactions on Signal Processing 43:
(1995) 1068–1089.

Beauchamp, James W. “Unix Workstation Software for Analysis, Graphics, Modification,


and Synthesis of Musical Sounds.” In Proceedings of the 94th Audio Engineering Society
Convention. AES, 1993. Preprint no. 3479.

Bonada, Jordi, and Xavier Serra. “Synthesis of the Singing Voice by Performance Sampling
and Spectral Models.” IEEE Signal Processing Magazine 24, 2: (2007) 67–79.

Bresson, Jean, and Carlos Agon. “SDIF Sound Description Data Representation and
Manipulation in Computer Assisted Composition.” In Proceedings of the International
Computer Music Conference. Miami, FL: ICMC, 2004, 520–527.

Cadoz, Claude, Annie Luciani, and Jean Loup Florens. “CORDIS-ANIMA: A Modeling and
Simulation System for Sound and Image Synthesis: The General Formalism.” Computer
Music Journal 17, 1: (1993) 19–29.

Cormen, Thomas H., Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms.
Cambridge, MA: MIT Press, 1990.

Dashow, James. “The Dyad System (Parts Two and Three).” Perspectives of New Music 37,
2: (1999) 189–230. http://www.jamesdashow.net/download/DyadSystemPt2-3.pdf.

Depalle, Philippe, Guillermo García, and Xavier Rodet. “Tracking of Partials for Additive
Sound Synthesis Using Hidden Markov Models.” In IEEE International Conference on
Acoustics, Speech and Signal Processing. Minneapolis, MN, 1993, 225–228.
References 102

Fineberg, Joshua. “Guide to the basic concepts and techniques of spectral music.” Contem-
porary Music Review 19, 2: (2000) 81–113.

Fitz, Kelly. The Reassigned Bandwidth-Enhanced Model of Additive Synthesis. Ph.D. thesis,
University of Illinois Urbana-Champaign, Urbana, IL, 1999.

Fitz, Kelly, Lippold Haken, and Brian Holloway. “Lemur—A Tool for Timbre Manipulation.”
In Proceedings of the International Computer Music Conference. Banff, Canada, 1995, 158–161.

Fitz, Kelly, Lippold Haken, Susanne Lefvert, Corbin Champion, and Mike O’Donnell.
“Cell-utes and Flutter-Tongued Cats: Sound Morphing Using Loris and the Reassigned
Bandwidth-Enhanced Model.” The Computer Music Journal 27, 4: (2003) 44–65.

Fitz, Kelly, and Lippopld Haken. “On the Use of Time-Frequency Reassignment in Additive
Sound Modeling.” Journal of the Audio Engineering Society 50, 11: (2002) 879–892.

Garton, Brad. “Multi-language Max/MSP.”, 2007, Accessed August 27, 2008. http:
//www.cycling74.com/story/2007/3/26/145938/572.

Grey, John M. “Multidimensional perceptual scaling of musical timbres.” Journal of the


Acoustical Society of America 61, 5: (1977) 1270–1277.

Grisey, Gérard. “Did You Say Spectral?” Contemporary Music Review 19, 3: (2000) 1–39.

Harris, Fredric J. “On the Use of Windows for Harmonic Analysis with the Discrete Fourier
Transform.” Proceedings of the IEEE 66, 1: (1978) 51–83.

Harvey, Jonathan. In Quest of Spirit: Thoughts on Music. Berkeley, CA: University of


California Press, 1999.

Lagrange, Mathieu, Sylvain Marchand, Martin Raspaud, and Jean-Bernard Rault. “En-
hanced Partial Tracking Using Linear Prediction.” In Proceedings of the 6th International
Conference on Digital Audio Effects. London, UK: DAFx-03, 2003, 402–405.

Lagrange, Mathieu, Sylvain Marchand, and Jean-Bernard Rault. “Partial Tracking based
on Future Trajectories Exploration.” In Proceedings of the 116th Audio Engineering Society
Convention. AES, 2004.

. “Enhancing the Tracking of Partials for the Modeling of Polyphonic Sounds.” IEEE
Transactions on Audio, Speech, and Signal Processing 15, 5: (2007) 1625–1634.

Laroche, Jean. “Synthesis of Sinusoids via Non-Overlapping Inverse Fourier Transform.”


IEEE Transactions on Speech and Audio Processing 8, 4: (2000) 471–477.
References 103

Laroche, Jean, and Mark Dolson. “Phase-vocoder: About this phasiness business.” In
IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 1997,
19–22.

. “New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic


Effects.” In IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics.
IEEE, 1999, 91–94.

Lerdahl, Fred. “Timbral Hierarchies.” Contemporary Music Review 1, 2: (1987) 135–160.

Levine, Scott N. Audio Representations for Data Compression and Compressed Domain Processing.
Ph.D. thesis, Stanford University, Stanford, CA, 1998.

Lindemann, Eric. “Music Synthesis with Reconstructive Phrase Modeling.” IEEE Signal
Processing Magazine 24, 2: (2007) 80–91.

Loy, Gareth. “Composing with Computers—a Survery of Some Compositional Formalisms


and Music Programming Languages.” In Current Directions in Computer Music Research,
edited by Max V. Matthews, and John R. Pierce, Cambridge, MA: The MIT Press, 1989,
chapter 21, 291–396.

Lyon, Eric. “Spectral Tuning.” In Proceedings of the International Computer Music Conference.
Miami, FL: ICMC, 2004, 375–377.

Maher, Robert C. An Approach for the Separation of Voices in Composite Musical Signals. Ph.D.
thesis, University of Illinois, Urbana, IL, 1989.

Matthews, Max V. “The Digital Computer as a Musical Instrument.” Science 142, 3592:
(1963) 553–557.

McAulay, Robert J., and Thomas F. Quatieri. “Speech Analysis/Synthesis Based on A


Sinusoidal Representation.” IEEE Transactions on Acoustics, Speech, and Signal Processing
34, 4: (1986) 744–754.

Misra, Ananya, Perry R. Cook, and Ge Wang. “Musical Tapestry: Re-composing Natural
Sounds.” In Proceedings of the International Computer Music Conference. ICMC, 2006.

Murail, Tristan. “Spectra and Pixies.” Contemporary Music Review 1, 1: (1984) 157–170.

. Pour adoucir le cours du temps. Editions Henry Lemoine, 2005.

Nuttall, Albert H. “Some Windows with Very Good Sidelobe Behavior.” IEEE Transactions
on Acoustics, Speech, and Signal Processing 29, 1: (1981) 84–91.
References 104

Pampin, Juan. “ATS: A System for Sound Analysis Transformation and Synthesis Based
on a Sinusoidal plus Crtitical-Band Noise Model and Psychoacoustics.” In Proceedings of
the International Computer Music Conference. Miami, FL: ICMC, 2004, 402–405.

Phillips, Andrew T. “A Containter for a Set of Ranges.” Dr. Dobb’s Journal 17, 6: (1999)
75–80. http://www.ddj.com/cpp/184403660?pgno=1.

Plante, F., G. Meyer, and W. A. Ainsworth. “Improvement of Speech Spectrogram Accuracy


by the Method of Reassignment.” IEEE Transactions on Speech and Audio Processing 6, 3:
(1998) 282–286.

Puckette, Miller. “Phase-locked vocoder.” In IEEE ASSP Workshop on Applications of Signal


Processing to Audio and Acoustics. IEEE, 1995, 222–225.

. Theory and Techniques of Electronic Music. World Scientific Press, 2007. http:
//crca.ucsd.edu/~msp/techniques/v0.11/book.pdf. Online edition draft, December
30, 2006.

Roads, Curtis. Microsound. Cambridge, MA: The MIT Press, 2001.

Robel, Axel. “A New Approach to Transient Preservation in the Phase Vocoder.” In


Proceedings of the 6th International Conference on Digital Audio Effects. London, UK: DAFx-
03, 2003.

Robel, Axel, and Xavier Rodet. “Efficient Spectral Envelope Estimation and its Application
to Pitch Shifting and Envelope Preservation.” In Proceedings of the 8th International
Conference on Digital Audio Effects. DAFx-05, 2005.

Rodet, Xavier, and Philippe Depalle. “Spectral Envelopes and Inverse FFT Synthesis.” In
Proceedings of the 93rd Audio Engineering Society Convention. AES, 1992. Preprint no. 3393.

Rodet, Xavier, and Adrien Lefèvre. “The Diphone program: New features, new synthesis
methods, and experience of musical use.” In Proceedings of the International Computer
Music Conference. Thessaloniki, Grece: ICMC, 1997, 418–421.

Sandowsky, George. “My 2nd Computer was a UNIVAC I: Some Reflections on a Career
in Progress.” Connect: Information Technology at NYU http://www.nyu.edu/its/pubs/
connect/archives/index01.html.

Schoenberg, Arnold. Theory of Harmony (Harmonielehre). Translated by Roy E. Carter.


Berkeley, CA: University of California Press, 1983. First published 1922.

Schwarz, Diemo. Spectral Envelopes in Sound Analysis and Synthesis. Master’s thesis, Institut
für Informatik, Stuttgart / IRCAM, 1998.
References 105

. “Current Research in Concatenative Sound Synthesis.” In Proceedings of the


International Computer Music Conference. Barcelona, Spain: ICMC, 2005.

Serra, Xavier. A System for Sound Analysis/Transformation/Synthesis based on a Deterministic


plus Stochastic Decomposition. Ph.D. thesis, Stanford University, Stanford, CA, 1989.

Shneiderman, Ben. “Direct Manipulation: A Step Beyond Programming Languages.” IEEE


Computer 16, 8: (1983) 57–69.

Smith, Julius O. “Viewpoints on the History of Digital Synthesis.”, 1992, Accessed July 5
2007. http://ccrma.stanford.edu/~jos/kna/kna.pdf.

Smith, Julius O. “Virtual Acoustic Musical Instruments: Review and Update.” Journal of
New Music Research 33, 3: (2004) 283–304.

Smith, Julius O., and Xavier Serra. “PARSHL: An Analysis/Synthesis Program for Non-
Harmonic Sounds Based on a Sinusoidal Representation.” In Proceedings of the In-
ternational Computer Music Conference. Champaign-Urbana, IL: ICMC, 1987, 290–297.
http://ccrma.stanford.edu/~jos/parshl/parshl.pdf.

Stilson, Timothy. Efficiently-Variable Non-Oversampled Algorithms in Virtual-Analog Music


Synthesis. Ph.D. thesis, Stanford University, Stanford, CA, 2006. http://ccrma.stanford.
edu/~stilti/papers/TimStilsonPhDThesis2006.pdf.

Taube, Heinrich. “λGTK: A Portable Graphics Layer for Common Music.” In Proceedings
of the International Computer Music Conference. Barcelona, Spain: ICMC, 2005.

Terhardt, Ernst, Gerhard Stoll, and Manfred Seewann. “Pitch of complex signals according
to virtual-pitch theory: Tests, examples, and predictions.” Journal of the Acoustical Society
of America 71, 3: (1982) 671–678.

Välimäki, Vesa, and Antti Huovilainen. “Oscillator and Filter Algorithms for Virtual
Analog Synthesis.” Computer Music Journal 20, 2: (2006) 19–31.

Varèse, Edgard, and Chou Wen-chung. “The Liberation of Sound.” Perspectives of New
Music 5, 1: (1966) 11–19.

Wishart, Trevor. “Computer Sound Transformation: A personal perspective from the U.K.”,
2000, Accessed August 26, 2008. http://www.composersdesktop.com/trnsform.htm.

Wright, Matthew, Amar Chaudhary, Adrian Freed, Sami Khoury, and David Wessel. “Au-
dio Applications of the Sound Description Interchange Format Standard.” In Proceedings
of the 107th Audio Engineering Society Convention. AES, 1999a. Preprint no. 5032.
References 106

Wright, Matthew, Richard Dudas, Sami Khoury, Raymond Wang, and David Zicarelli.
“Supporting the Sound Description Interchange Format in the Max/MSP Environment.”
In Proceedings of the International Computer Music Conference. ICMC, 1999b.

Zbyszynski, Michael, Matthew Wright, and Edmund Campion. “Design and Implemen-
tation of CNMAT’s Pedagogical Software.” In Proceedings of the International Computer
Music Conference. ICMC, 2007.

Zeitlin, Vadim. “The wxWindows Cross-Platform Framework.” Dr. Dobb’s Journal 24, 3:
(2001) 106–112.
107

Appendix A. Tear of the Clouds score

Tear of the Clouds is a composition for 13 instruments and electronic sounds. It was first

performed May 6, 2008 by the Manhattan Sinfonietta, Jeffrey Milarsky, conductor and

music director. In performance, the ensemble includes a MIDI keyboard that functions in a

dual role: as a traditional keyboard that controls a variety of sampled and quasi-synthetic

pitch sounds, and as a trigger for pre-synthesized sonic complexes. The tuning of the

keyboard is dynamic and adjusts to the prevailing non-tempered harmony. Pre-synthesized

sonic events were realized using a combination of SPEAR, Common Music, and CLM.

Realtime synthesis and sample playback was done using SuperCollider.

A.1 Program Note

As the highest lake in the Adirondacks, Lake Tear of the Clouds is the Hudson River’s

source. The piece was inspired in large part by my experiences and observations living

along this waterway — the vast wide expanse at Tappan Zee, the churning gray whitecaps

of a stormy day, the vistas from the New Jersey Palisades, and an imaginary slow motion

dive from the top of the cliffs into the murky depths. Some of the sonic materials used in

both the ensemble and electronics are derived from recordings of water sounds made at

various points along the river. Other sound spectra are derived from sources such as bells,

tam-tams, and voices. In many instances these inner structure of these sounds are retuned

via a process of analysis and resynthesis.


A. Tear of the Clouds score 108

Tear of the Clouds


Duration — 10'54"

Instrumentation

Flute doubling Piccolo


Oboe
B b Clarinet doubling B b Bass Clarinet
Bassoon

Horn
Trumpet
Trombone (with F trigger)

Percussion
crotales (upper octave, written C5–C6)
vibraphone (with motor)
marimba (4 1/3 octave, A3–C7) sandpaper
congas blocks sus. cym. temple blocks
temple blocks (set of 5)
sandpaper blocks
(one block mounted, the other free and
playable with one hand) bass dr. bongos tam-tam Chinese
Chinese cymbal cym.
medium suspended cymbal
2 bongos
2 conga drums, high and low (or toms)
large tam-tam
bass drum

Computer / electronic sounds (see explanation below)


Keyboard
88-key MIDI keyboard with sustain pedal, connected to computer

Violin
Viola
Cello
Contrbass

All instruments sound the written pitch with the following standard exceptions:

Piccolo — 1 octave higher


Crotales — 2 octaves higher
Contrabass — 1 octave lower

Accidentals Strings

B quarter-tone flat exaggerated bow pressure creating a low,


Bb three-quarter-tones flat noisy, “crushed” sound

µ quarter-tone sharp resume normal bow pressure

 three-quarter-tones sharp Note: strings should play with ordinary vibrato unless otherwise indicated

Whenever possible, woodwinds should execute


quarter tones using alternate fingerings.
Trumpet and horn should use alternate
fingerings based on the 7th and 11th partials.

Electronics

There are two types of electronic sounds indicated in the score, those than are triggered by the MIDI keyboard and those
that are triggered by the computer.

The MIDI keyboard functions much like a traditional keyboard playing a variety of sampled and quasi-synthetic pitched
sounds. A timbre description is indicated in quotation marks (e.g. “bowed pluck”). The computer operator takes care of
switching timbres at the appropriate time. The actual pitches produced, which are often microtonally inflected, are
indicated in the staves labeled “Synthesis.”

The computer operator also triggers various sonic events — chords, sustained tones, noise complexes, etc. These are also
generally indicated in the “Synthesis” staves.

Rev 3/16/08
A. Tear of the Clouds score 109
C SCORE
Tear of the Clouds
Michael KLINGBEIL

q = 66 Bw w œ
>
& ∑ Ó Œ ‰ mœ ˙. ˙. Ó
J
ø
Flute

3 4 Í 3 F 4 π F 3 2
bw w œ
∑ Ó
&4 4 µ˙> ˙ 4œ œ œ œ ˙ œ 4 4
ø 4
Oboe

Í F π π F
>
∑ ∑ Œ ‰ œJ œ œ ˙ œ œ‰ Ó ∑ ∑
take Bass Clarinet

& J
ø
B b Clarinet

Í F

Bassoon
? ∑ ∑ ∑ ∑ ∑ ∑ ∑

Horn & ∑ ∑ ∑ ∑ ∑ ∑ ∑
3 4 3 4 3 2
B>˙ . œ‰
con sord.

&4 ∑ ∑ ∑ ∑ J ∑ ∑
4 4 4
ø 4 4
C Trumpet

f
œ. ˙ œ
con sord.

Trombone
? ∑ ∑ Ó ⋲J ∑ ∑ ∑
π F
vibraphone
˙ ˙
˙
arco

Percussion & ˙ Œ Ó Œ ∑ Ó ∑ ∑
F f
° (lock pedal down)
3 c1 . 4 3 4 mµ >œœ c1
3 2
& ⋲B b œœœ .. µ µ Bœœœ
15 c6 c7

m B œœ .. mœ B œœ
& 4 ⋲ µ œœ .. mœ  œœ Bœœ
J 4 µœœ 4 Œ b œœ 4 4 4
p p
Synthesis

f
? Œ
œ µœ

Computer & ∑ ∑ ∑ ∑ ∑ ∑ ∑

b m œœœœ
“bells”

∑ ∑ ∑ ∑ Œ Ó ∑ ∑
“big pluck”

&
15

œ
& œ mœ . Ó ∑ ∑ ∑ ∑ ∑ ∑
Keyboard
f F
? ∑ ∑ ∑ ∑ ∑ ∑ ∑

bw w œ
o
q = 66
o o B>œ œ . ˙ œ
& ˙. w œ Œ Ó
ø ø
Violin 5

3p 4 F 3 f p 4 π 3 2
> Bw w œ
B ∑ ∑ & 4 Œ ÂœJ ˙ ˙ œ Ó
4 4 4 4
ø 4
Viola

3
Í p
≤ ≥ ≤ w w œ
B n>œ µ>œ . ˙ w ˙. ˙ œ
Ó
ø
Cello

f p F p
>o
sul G (sounding
ȯ . o o œo o
?
gliss. harm.

∑ Ó &˙ œ œ r‰ . Œ Ó ∑ ∑
as written)

œ
Contrabass

Í F f

© Copyright 2008 Michael Klingbeil


3/16/08
A. Tear of the Clouds score 110

2 Tear of the Clouds

q = 84 œœ
mw w mw w mœ œ œ µ œ œ Bœ
œ µœ µ œ . œ ‰ ‰ Âœ ˙
⋲ œ µœ
8

& ∑
Fl.
J
2 4 p 2 6 5 ƒ p p 3
œ œ
8
w w w w œ œ Âœ µ œ œ µ œ  œ mœ B œ œ . ⋲ ‰ nœ ˙
&4 ∑ œ
4 4 4 J 4
Ob.

p F ƒ p
5 6

Cl. & ∑ ∑ ∑ ∑ ∑

B˙ . ˙ œ œ ‰. n˘œ mœ œ œ œ
? B Bb œ J
8

Bsn. ∑ Œ R ‰ Ó ‰
π P p p f p F
3

8 con sord.

& ∑ ∑ ∑ ∑ Ó ‰ 3

œ œ œ œ
Hn.

2 4 2 5 3
p F
8

∑ ∑ ∑ ∑ Ó ‰ bœ œ œ œ
(con sord.)

Tpt. &4 4 4 4 4
p3 F
?
8

Tbn. ∑ ∑ ∑ ∑ ∑

medium Chinese cym.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ lowj conga
sus.cym.
~~~~~~
⋲ œj . œj ‰ Ó
low bongo
œ̊
8

∑ ÷ ‰ œ ˙ ‰ œ̆ ‰
yarn

& J
w œ œ J
fl
Perc

π π P F p
4 µœœ
c2
2 2 5 3
B œœœ
8

& B Bœœ
15

eeeeeeeeeeeeeeeeeeeeeeeee µ B œœœ
4 Bµ œœœ
&4 m
Synth 4 Bb œœ 4 4
?

Comp. & ∑ ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑ œœ ∑
15

œ
œ œ mœ
& ∑ ∑ ∑ ∑ œœ œ Ó.
p
Kbd.

f
? ∑ ∑ ∑ ∑ ∑

q = 84
Âw µw Âw µw Âœ µœ µœ
œ µœ µœ µœ
⋲ œ Bœ m œ Bœ m œ œ œ œj n Oœ Ȯ
8

& ∑
Vn.
œ J
2 4 p 2 F 6 5 f 7 p 3
µw µw µ œ Âœ µœ œ œ œ ‰ . BBOœ Oœ
pizz. arco

&4 ∑ Âw Âw j ‰ ‰ Âœ œ œ œ
Vla.
4 4 œ 4 R 4
p F f P3 f p
mO O
B & ⋲ œJ . Bœ œ Bœ œ B œ œ Bœ œ Bœ œ Bœ œ Bœ œ . Bœ . ‰ ? m œœ œœ œœ œœ ‰ m œJ œ
3 7

∑ ∑ ‰
ø
Vc.

F p P3 f p
.
& ∑ ∑ Œ œ œ œ æ ?˙ œ
ø mœ
Cb.

3/16/08
A. Tear of the Clouds score 111

Tear of the Clouds 3

A jet
q = 60
j
whistle

˙ bœ .
13 flz.

& ⋲ Œ Ó ∑ Ó Œ ‰ µœ ˙ œ œ ˙.
æ æJ mœ
ø
Fl.

3 F 4 ƒ 2 4 3 F 4
˙ œ œ œ œ
13

∑ ∑ ∑ ∑
&4 4Œ
4 4 4
ø 4
Ob.

F
œ œ µœ
Bass Clarinet

? mœ nœ m œ œj ‰ Ó
13

& ∑ ∑ ∑ ∑ Œ
ø

Cl.

fl
ƒ
6

B ˙ œ ? µœ mœ nœ œ
m œ b œ œj ‰ Ó
13

J ‰ ∑ ∑ ∑ ∑
ø
Bsn.

fl
ƒ
6

13 senza sord.

& j ‰ ∑ ∑ Œ
m˙ . ˙.
ø 3œ F ø
œ œ œ
3˙ ø
œ
Hn.

4 2 4 4
13 senza sord.

&4˙ œ ‰ ∑ ∑ ∑
4Œ œ ˙
J
ø ø̇
Tpt.
4 4 4 4
F
?
13 senza sord.

Tbn. ∑ j ‰ Œ ∑ ∑ ∑
Sf >˙ œ œ ˙
p p
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
j
13

÷ ∑ ‰ Œ Ó ∑ ∑
bass drum

Perc
œ . ˙.
fl π̇ F
f
13
3 4 2 4 3 4
Bµ µ œœœ
c3 / c0

&
15

&4 Bœœœœ
4 4 4 4 4
ƒ
Synth

? mÂm œœœ
voice

œ
13

Comp. & ∑ ∑ ∑ ∑ ∑ ∑

∑ ∑ ∑ ∑ ∑ ∑
“big pluck”

&
15

& ∑ ∑ ∑ ∑ ∑ ∑
ƒ
Kbd.

? ∑ Œ ∑ ∑ ∑ ∑
˙.
A °
13
Oœ Oœ Oœ Oœ q = 60

Vn. & J J J ‰ Œ Ó ∑ ∑ ∑ ∑
3 f 4 2 4 3 4
O O m Oœ Oœ
⋲ µ œj . œ
&4œ œ J J J ‰ Œ Ó ∑
4Ó 4œ œ œ œ ˙.
ø
Vla.
4 4 4
f F
Oœ Oœ j
? B Bœœ œœ ‰ Œ Ó ∑ ∑ Ó ‰ Âœ ˙ .
J J J
ø
Vc.

? ≥
Cb. œ œ œ œ ˙ ˙ w ˙. ˙.
f ƒ F f

3/16/08
A. Tear of the Clouds score 112

4 Tear of the Clouds

>
B˙ ˙ w œ bœ ˙ . ˙ œ nœ œ µœ
J J
19

& œ Œ
ø ø
Fl.

4 F p f
3 3

4 2
19
œ µ˙ œ œ ˙ œ ˙. œ œ mœ . œ œ. µœ ˙ œ œ. œ m>œ
&4 J ‰ 4 ‰
ø ø 4
ø ø
Ob.

F F f
3

? œ ˙ œ. mœ ˙ . œ >œ
19

Œ Â˙ . œ
œ . ˙
ø ø
Bs. Cl.

Ḟ f

?
19

bw ˙ w œ bœ ˙ . ˙. œ œ
Bsn.

> ƒ p
π
j j j
19

& ‰ Œ 3
j j
3
œ
Bb œ ˙ œ œ ˙ œ œ.
ø ø ø
Bœ œ œ
3

œ ˙. œ œ Bb œ œ œ. >
Hn.

4 F 2 4 F f
19

œ ˙ œ œ ˙ œ. œ ˙ Bœ . >œ
&4 œ ‰ Bœ ˙
4˙ 4 œ ‰
ø J
ø ø ø
J J
Tpt.

F F f
? r‰.
19

‰ j
m>˙
ø
bw ˙ ˙. œ µœ ˙ œ ˙
Tbn.

> ƒ
f
19

÷ Œ Ó ∑ ∑ ∑ ∑
tam-tam
Perc
œ
f
19
4 c1
2 4 c2 c3 c4 c5 c6

&
15

&4 4 4
Synth

? ‰ j Œ Œ ‰. r Ó j Œ j
bœ Bœ µœ mœ
œ œ
19

Comp. & ∑ ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑ ∑
f
Kbd.

? Œ Ó ∑ Œ Ó ‰ j Œ Œ ‰. r Ó j Œ j
bœ œ mœ mœ
> > > œ œ
b˙ ˙ œ m˙ . œ œ ˙ œ. mœ ˙ œ. Bœ œ >œ
19
˙ ˙ œ ˙. œ
& Ó
ø
Vn.

4 2F 4 π P f

&4 œ œ
mœ Âœ
˙
˙
œ œ œ œ µ Âœœ ˙˙ œœ .. B µ œœ ˙˙ œ.
œ.
µœ œ .
Bœ œ .
µœ

œ.
œ.
>
µœ
Vla.
4œ œ 4 œ œ J J
π F π F f

? Âw
w ˙ w ‰ j
µœ ˙ . ˙ œ œ œ
Vc.

f f ƒ f P
? j
w ˙ Bw ˙. œ. œ ˙ œ. œ
> > >
Cb.

ƒ >

3/16/08
A. Tear of the Clouds score 113

Tear of the Clouds 5

œ mœ œ œ >˙ œ µœ œ
q = 76
µœ œ œ
B>œ ˙ œ µœ œ œ œ œ. J bœ m œ Âœ Âœ j
24

& œ œ.
ø >
Fl.

3 p f 3 4 f 3

œ µœ œ œ œ. mœ œ µœ œ
⋲ b œ b œ  œ bœ n œ µ œ œ Œ m œj µ œ m œ
7

.
24

œ œ bœ œ ˙ 4 ‰ R Ó
3

&
>
ø 8 4 œ >
Ob.

F f 4:3

? œ œ œ œ µ>œ œ œ œ ‰ Œ >
Œ µ œJ œ œ œ µ œ œ bœ
24

J Œ ∑ ∑ Œ
ø
Bs. Cl.

ƒ
3
3

? µ˙ œ. m˚œ
24

œ ˙ œ ∑ Œ Ó
œ bœ . >
ø
Bsn.

f f ƒ
Bb˚œ œ .
24

Œ Œ
3

& œ Bbœ œ . œ J œ œ
œ Bb œ œ œ 3 œ 3 œ. 4 ̇f >
Hn.

> p ḟ
b˚˙ . >œ
& œ bœ œ œ̊ œ œ
24

Œ Œ
œ >œ ˙ 4 mœ œ œ œ 8 œ.
ø J
Tpt.
4
F f
Â˚œ
3

µ˙ œ œ. œ
? ˚
. Œ Bœ œ œ
24

⋲ Âœ . œ‰ Ó Œ
œ œ
ø J
Tbn.

f p f f 3

> >x xxxx >x xxxx⋲ . ~~~~~~~~~


sus. cym.

x xxxx ® ‰ Œ
temple blocks

‰.
24

÷ Œ ∑ ∑ ‰ Œ Œ Œ Œ xxx x x ‰
6 3

œ
>
Perc

f ƒ
7
f p
24
3 3 4
r
c7 c8 c9 c10 c11 c12 c13 c14 c15

&
15

œ Bœœ
. B œœ œ
&Œ ‰ µ œ 4 b œœ Œ œ
Synth µ  œœœ 4 8
bœœ µœ
?‰ j ⋲ j r µœ Bœ œ Œ mœ
j ‰.
œ
µ œ Âœ œ œ
R
24
œ 3

Comp. & ∑ ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑ ∑
15

‰. b œœ œœ ggg b œœœœ Ó.
&Œ œ œ Œ ∑ ∑ ggg œ ∑
R ggg
ggg ƒ
Kbd.

ggg bœœ mœ œ
?‰ j gg œ œ mœ œ œ
m œ œ Âœ œ œ
Œ ∑ ∑ Œ Œ Œ j ‰. R
* œ
° *° *° 3

œ m˙ œ. œ œ
Bœ œ m µ œœ œ̊ ˙
q = 76
˙ œ. B>˙ œ. œ œ œ µœ µœ œ . œ œ
œ œ
24

& Œ
ø œ J
Vn.

ƒ π 3 3F 4 ƒ 3

˚
5 r
5:3 œ

mœ œ
B œ µœ œ mœ Âœ ˙ . ⋲ ⋲ BBœœ .. œœ œœ ..
& 4 Ó. mœ œ j œ mœ µœ
mœ œ œ
4 œ Œ 3

ø
Vla.
8 µœ œ
ƒ π f
>˙ . œ
? µœ œ. œ œ œ. Œ Œ Bœ œ Bœ œ
j‰ &
œ µœ œ œ œ J œ
Vc.

f p f ƒ
3
3

o œo ? Œ
(sounding
as written)

? Ó Œ &œ Œ
sul G
j j
3

œ œ œ œ œ œ œ œ œ œ
ø œ œ
Cb.

Ï ƒ

3/16/08
A. Tear of the Clouds score 114

6 Tear of the Clouds

B q = 69
b˙ ˙ œ
œ œ. bœ nœ mœ œ
29
Âœ œ m œ
Fl. & Œ ∑
Ï
3
2 3 4
œ œ m œ b˙ ˙ œ
œ. œ µœ œ œ
29

& œ Œ ∑
Ob.
4 4 4
Ï
˙ œ
7

? µœ nœ œœœ
œ œ B˘œ m˘œ œ̆ m˘œ ˘ mœ œ mœ µ œ bœ n œ bœ b œ n œ µ œ œ œ bœ bœ n œ µœ œ
29 7


7 7

œ. œ bœ bœ bœ n œ b œ b œ n œ
œœœ œ bœ
3

œ bœ œ
Bs. Cl.

> >
9
Ï 5 π
5

? mœ œ mœ
29

∑ Œ bœ j ‰
fl fl fl fl flœ >˙ ˙ œ
ø
Bsn.

f Ï
r
& œ ‰.
29

‰ Bb ˙ ˙ œ Œ ∑
3

œ
>
Hn.

ƒ 2 3 4
bœ . n˙ ˙ œ
& ⋲ œJ . œ
29

bœ Œ ∑
Tpt.
4 4 4
ƒ

3

˙
? Œ
29

‰ ∑
bœ œ ˙.
Tbn.

f p >
3

f p
~~~~~~~~~~~~ j >œChinese cym.
Ͼ Ͼ Ͼ
29
hi conga

÷ œ Œ ∑ Ó
3 low conga
œ œœ
J >
Perc

f S π

œ
c18 c1

µ mµ œœœœ
29 c16 c17
2 3 4
&
15

mœ Œ
& Œ ‰
3
œ t Œ bœ
œ 4 4 4
j p
Synth

? ⋲ µœ . t Œ œ ‰. r
bœ ‰. R Bœ œ œ
29

Comp. & ∑ ∑ ∑ ∑

mœ œ Œ
7

& Ó œœ ∑ ∑ ∑
15 “bowed pluck”

mœ œ
bœ mœ œ
& ⋲ j m œ œ bœ ∑ ∑ ∑
œ.
p
Kbd.
5
f
? ∑ ∑ t Œ ‰. œ ‰. r
R œ œ œ

* B q = 69 °

œ bœ µ˙ µ˙ œ.
œ. œ mœ µœ mœ æ J
29

& ⋲ m ⋲ Œ ∑
Vn.
J 6
ƒ 2 3 4
bœ . œ µ˙ µ˙ œ.
pizz.
˝
Vla. & ⋲ J æ 4 J B⋲ œ
4 ∑
4
Ï ƒ

˙ œ. ?⋲ ˝
pizz.

& ⋲ µ œj . j
æ̇ ∑
3

œ Âœ J mœ
Vc.

ƒ ƒ
> > > > >
? ∑ ∑ œ œ mœ µœ mœ µœ œ
3

> Â>œ >œ œ ˙


Cb.

f 5 p

3/16/08
A. Tear of the Clouds score 115

Tear of the Clouds 7

33

Fl. & ∑ ∑ ∑
4
33

Ob. &4 ∑ ∑ ∑

œ œ œ µœ œ mœ bœ nœ
7

? œ œ œ mœ
7

b œ œ b œ b œ b œ bœ b œ b œ œ µ œ œ b œ bœ n œ µ œ b œ b œ . b œ b œ bœ n œ b œ bœ n œ µ œ œ œ b œ bœ œ œ
33 6 5 3

bœ œ bœ b œ b œ nœ bœ bœ nœ
7:6 3
Bs. Cl.

6

œ.
5

? œ œ.. œ œ œ œ
33 5

Ó
3

bœ µœ bœ µœ œ
5

J mœ . mœ mœ œ bœ .
5
Bsn.
mœ œ
π
33

Hn. & ∑ ∑ ∑
4
33

Tpt. &4 ∑ ∑ ∑

?
33

Œ ∑ ∑
˙.
ø
Tbn.

33
B.D. beater
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tam-tam l.v.

÷ ∑ Œ .
ø̇ w
Perc

p
33
4
&
15

t
4
mœ œ . .
Synth

µœ œ bœ .  œ œ µ œ bœ œ
t µœ œ µ œ µ œ mœ œ µ œ b œ œ µ œ µ œ b œ µœ µ œ m œ Œ
œ Bœ œ µœ œ j
3

Âœ µœ µœ Bœ
3 3
7 6 6 bœ œ
33

Comp. & ∑ ∑ ∑

& ∑ ∑ ∑
15

& ∑ ∑ ∑
Kbd.

œ bœ . mœ œ . . nœ œ mœ
t œ
mœ bœ œ m œ œ mœ œ œ m œ b œ œ m œ m œ b œ mœ œ m œ Œ
œ œ œ mœ œ m œ œ b œ œj
3

œ mœ 3 3


7 6 6

33

Vn. & ∑ ∑ ∑
4
arco

B Ó Œ ⋲ j Œ ∑
ø ø
4 µœ . œ œ ˙
Vla.

F
Ÿ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sul pont.

? j
arco

‰ j j ‰ Œ ‰ œ (µœ ) ˙
bœ ˙ . ˙.
ø œ
ø
Vc.

F π
? Œ ‰ mœ œ œ œ œ œ Œ Ó
J
ø ø ø
Cb.
˙.
F

3/16/08
A. Tear of the Clouds score 116

8 Tear of the Clouds

36

Fl. & ∑ ∑ ∑

36

Ob. & ∑ ∑ ∑

œ Bœ œ œ
3

? µœ œ µœ œ µœ œ b œ œ œ b œ bœ b œ b œ n œ µ œ œ œ œ
36 5 5

µœ
7

bœ œ b œ bœ b œ n œ b œ n œ bœ
3

bœ bœ
3


Bs. Cl.

p
7 poco cresc…

œµœ œ µœ œœ œ œ œ mœ œ
5 7 3

? œœ œœ œ mœ µœ
36 5

œ b œ m œ mœ bœ n œ m œ nœ bœ mœ œ b œ mœ n œ m œ b œ b œ m œ œ m œ b œ n œ œ n œ m œ b œ n œ œ bœ bœ œ
6
Bsn.
mœ mœ œ m œ œ bœ œ m œ
p 6:5
7
5

36

Hn. & ∑ ∑ ∑

36

Tpt. & ∑ ∑ ∑

?
36

Tbn. ∑ ∑ ∑

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
˙. œ. œ̆ œ œ b œ œ bœ
medium marimba

?mallets œ œ ˙
36

Perc ÷ Ó Œ ‰ J bœ
F p F 6

ø
36

&
15

t bw w
µ œ œ œ Bœ .
Synth

œ
t ⋲ œ mœ bœ µœ ? ‰.. rK œ œ œ Bœ œ µ œ œ œ Bœ
3 5

‰ œ µœ µœ bœ µœ µ œ µ œ œ bœ µ œ µ œ
7

Bœ µœ mœ nœ œ bœ
µ µœ œ . mœ µœ œ œ œ
5
5
3 5
5
36

Comp. & ∑ ∑ ∑

& ∑ ∑ ∑
15

& ∑ ∑ ∑
Kbd.

œ mœ œ œ œ .
t ⋲ œ mœ bœ mœ ? ‰.. rK œ œ œ œ œ mœ œ œ mœ m œ œ bœ œ œ œ
3 5

‰ œ mœ bœ mœ mœ œ nœ œ bœ mœ
7

œ m mœ œ . mœ œ œ œ œ
5


5
3 5
5

36

Vn. & ∑ ∑ ∑

Vla. B ∑ ∑ ∑

~~~~~~~~ Ÿ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ÿ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


ord. sul pont.

œ µœ µœ m ˙ .(œ )
?
œ
5
Bœ œ . Œ µ ˙ (œ ) œ bœ œ œ µœ
œ mœ
µœ . bœ bœ
Vc.

P p 3
F
œ. œ µœ µœ
? Ó ⋲ J œ bœ µ œ
µœ œ œ mœ mœ nœ j œ bœ m œ µœ
5

œ œ mœ œ .
7
5

Bœ œ œ. Bœ . œ œ.
Cb.

P P
3

3/16/08
A. Tear of the Clouds score 117

Tear of the Clouds 9

39

Fl. & ∑ ∑ ∑

39

Ob. & ∑ ∑ ∑

bœ œ œ œ
? œ œ œ œ œ Bœ œ œ œ œ bœ œ bœ œ bœ . œ. bœ
œ bœ bœ µœ œ œ µœ nœ œ bœ œ
39

Bs. Cl. œ œ bœ
3
P 6 f p subito
œ
? œ œ nœ mœ œ nœ œ œ œ œœœ m œ b œ m œ m œ m œ bœ œ  œ
39 5 6

µœ œ n œ œ bœ m œ n œ m œ œ bœ m œ mœ n œ m œ œ b œ m œ m œ
6

œ œ bœ mœ nœ œ bœ
6

œ
5

œ œ. bœ mœ œ œ œ mœ
Bsn.

f p subito
6
3 cresc… 6
6

39

Hn. & ∑ ∑ ∑

39

Tpt. & ∑ ∑ ∑

?
39

Tbn. ∑ ∑ ∑

œ bœ #œ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
? œ #œ œ bœ œ œ œ œ œ.
œ bœ ‰. r
39

‰ J ‰ J bœ Œ œœ ˙˙ ..
ø
Perc

5 3
p π f p
39 c2 / 12

&
15

t w w
w
π
Synth

? ‰. œ œ
R
œ µœ . œ. œ œ t ? mœ
5

bœ µœ bœ
µœ œ .
39

Comp. & ∑ ∑ ∑

& ∑ ∑ ∑
15

& ∑ ∑ ∑
Kbd.

? ‰. œ œ
R
œ mœ . œ. œ œ t ? mœ
5

bœ œ bœ
mœ œ ˙.
*° *°
39

Vn. & ∑ ∑ ∑

n œ µ œ µ œ µ œ bœ œ Ÿ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3

œ bœ ⋲
sul pont.

Bœ œ
5

B ‰. r œ œ Bœ . œ Bœ œ
µœ œ œ Âœ œ Bœ µ w (b œ )
ø
Vla.
3

Í 5
5 f
~~~~~~~~~~~~~~~~~~
ord.

? ˙
sul pont.

Ó ∑ Œ Â˙ .
ø ø
Vc.

œ
? bœ µœ Âœ œ œ
⋲ µ œj .
pizz.

J ‰ Œ ∑ Œ Œ
œ
Cb.

p F

3/16/08
A. Tear of the Clouds score 118

10 Tear of the Clouds

42
Âœ ˙. œ œ ˙ ˙
& ∑ ‰ J
ø
Fl.

3 p 2 3
b˙ . œ ˙ œ œ
42

& ∑ Œ
ø 4 4 4
Ob.

P
bœ bœ Bœ œ . œ
bœ bœbœ
? bœ bœ bœ bœ nœ bœ .
6

œ œ œ b œ œ b œ œ œ b œ bœ bœ œ œ mœ œ œ bœ bœ œ mœ œ bœ bœ b œ
42 3

bœ œ.
5 5 5

Bs. Cl. œ bœ œ bœ .
3 3

œ µœ .
3

œ
7

œ bœ œ mœ œ Âœ bœ b œ œ bœ bœ
? bœ bœ nœ œ mœ
42

mœ mœ mœ bœ œ œ œ Œ bœ œ
3

œ œ
Bsn.

P
3
5

42

& ∑ Ó

ø
Bb ˙ 3 œ œ œ œ 2 ˙
Hn.

3
42
p
& ∑ ∑ ∑ ∑
Tpt.
4 4 4

?
42 con sord.

∑ ∑ Â˙ . ˙
ø
Tbn.

p
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
? ‰. r
42

∑ ∑ ÷
tam-tam

Perc
œ> ˙ ˙
p π P
42
3 2 3
&
15

t
Synth w w 4 ˙. 4 ˙ 4
? Bœ bœ bœ Bœ
b œ Bœ t µ œ œ œ Bbœ
5 6

œ bœ ˙ n œ bœ m œ
œœ˙
F
5
3

42

Comp. & ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑
Kbd.

? œ bœ bœ œ t m œ œ œ bœ
5 6

œ bœ ˙ n œ bœ m œ bœ œ Ó
œœ˙
F *
5
3

42
w œ ˙ œ œ
& ∑ Bw œ ˙ œ
ø ø
Vn.

3 p 2 π 3
µ˙ ˙ œ œ ˙. ˙
ord.

B Ó µ˙ ˙ œ œ 4 ˙. &4
ø ø
Vla.
4
p π

? ∑ ∑ ∑
˙.
ø
Vc.
œ
F
r mœ . bœ .
? Œ œ bœ ‰ µœ Œ ‰. bœ Œ
R Œ œ µœ ⋲ J Ó ∑
Cb.
J 3 J
5
f

3/16/08
A. Tear of the Clouds score 119

Tear of the Clouds 11

œ µœ
œ.
46

& Ó ∑ ∑ Œ ‰ œ œ
ø J
Fl.

3 4 p f
˙
46

Œ
&4 m˙ . ˙. 4Œ m œ-  œ ˙
ø >
Ob.

Í F P
œ ˙
? œ bœ œœœ take B b Clarinet
46

œœœ œ bœ Œ ∑ ∑
ø
Bs. Cl.
œ
ƒ
F
6

mœ . œ œ mœ œ œ.
? œ Âœ œ B mœ œ œ œ J ?
46

Œ œ mœ mœ œ œ œ œ œ mœ mœ  œ ⋲ Ó
5 5

mœ . bœ œ.
Bsn.

p 3 6 π
? ‰. r
46

& Œ ∑ ∑ Ó Œ œ
3 ˙
ø
Hn.

4 Í
46

& 4 m˙ j ‰ ∑ ∑
ø œ œ ˙
øœ
Tpt.
4
p
? ‰. µœ
46

˙ Œ Ó ‰ j Œ R
ø ø
bœ ˙ œ ˙
ø
Tbn.

p Í
46 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
÷ ˙. ∑ ∑ ∑
ø
Perc

46
3 c3 / 20 4
&
15

t
4 4
p
Synth

t ⋲ j Ó. ‰. r
µœ . bœ
46

Comp. & ∑ ∑ ∑ ∑

∑ ∑ ∑ ∑
“organ pluck”

&
15

& ∑ ∑ ∑ ∑
p
Kbd.

t ⋲ j
mœ . ˙ ˙. ˙. ˙. œ. bœ

µ˙ ˙.
œ
µ µ Ȯ .. Ȯ
46

Vn. & µ BȮ


3 p 4

&4
œ B m m Ȯ Ȯ Oœ µ µ Ȯ .. Ȯ
bb Ȯ
Vla.
4
p

? ÂÂ Ȯ .. Ȯ
ord.

Œ
B˙ ˙.
Vc.

p
µœ . o
sul D (sounding o o
? ⋲ J r
sul A

Ó ∑ Œ ‰.
as written)

Cb. & ˙. ˙ œ
F p

3/16/08
A. Tear of the Clouds score 120

12 Tear of the Clouds

& æw æ̇. æ̇ ⋲ œr ‰
50

Ó Ó
tongue ram

Fl.

p 3 4 3 >O 4
ƒ

50

Ob. &w 4œ µ˙ 4 ˙ œ œ 4 ˙ 4
f F f
œ
œbœ œ bœÂœ
B b Clarinet
6

? ‰.
50

∑ ∑ Ó œœ
6:5

& œ µœ µœ œ µœ œ œ
µœ mœ œ
Bs. Cl.

p 5
p

œ œ œ œ œ µœ
? ⋲ m œ  œ µœ m œ  œ µ œ œ  œ µ œ œ œ œ œ µ œ mœ œ œ . œ œ œ mœ œ
6

‰ œ µ œ œ œ µ œ œ œ Âœ R ‰ . ‰ mœ œ œ µœ
6

R ‰.
50 3

mœ Œ
Bsn.
œœ œ
p p p
5 3
6 6 6

? ˙. mw ⋲ œ. ˙
50

Œ
Hn.
œ ˙ J
3 4 Í 3 Í 4
50

∑ ∑ ∑
& 4 ⋲ œj .
ø
Tpt.
4 4 ˙ 4
Í
? ˙.
50

bœ ˙. j
Tbn. ∑ ⋲ Âœ . ˙
Í
50

Perc ÷ ∑ ∑ ∑ ∑

50 c4 / 33
3 4 c5 / 40
3 c6 / 44
4
&
15

t
Synth 4 4 4 4
t µœ ⋲ œJ .

50

Comp. & ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑
Kbd.

t
w ˙. mw œ œ. ˙

j >
 Ȯ Oœ Ȯ µ µ Oœ Ȯ .. Oœ ‰ ⋲ BBOœ .. Oœ Oœ
50

Vn. & Oœ J
F p 3 4 F π 3 f 4
>
B Oœ Ȯ Oœ Ȯ Oœ j
Oœ ‰ m Oœ .. Oœ Oœ
Vla.
4 4 µBȮ .. 4⋲ J 4
F p F π f
>.
? Oœ ‰ bb Ȯ Oœ Ȯ .. Âw ⋲ bb Oœ . Ȯ
Vc.
J J
F p Í f p
o j
? ⋲ m œJ . œ Œ
pizz.

∑ ∑ ‰.
3

Cb. &w R œ
ƒ F

3/16/08
A. Tear of the Clouds score 121

Tear of the Clouds 13

œ ˙
µœ mœ J Âœ œ . œ
54 3

Fl. & Œ m˙ . J nœ œ J œ œ
4 p 2F 3 5 f 4 2
. œ œ ˙ œ Bœ
œ Âœ µœ œ œ. œ
7

&4 œ mœ J µ œ œ Bœ
54

Ob.
4 8 4 mœ nœ œ m˙ . 4
ƒ
5 5

bœ œ b œ  œ b œ œ . n œ bœ Ÿ~~~~~~~~~~~~~~ 3
r ‰.
54

∑ ∑ Œ ‰
7

& œBb œ j
bœ bœ Âœ
µœ œ œ œ
Cl.

œ b œ(nœ ) œ
f p f
5

œ µœ œ mœ œ œ œ œ
œ µœ œ œ œ
6

? œ œ œ Bœ µ œµ œ œ J ˙ œ
54 3

Ó ‰ ‰ J J
Bsn.
œ.
F f p F
3 3
3

? œ
54

Œ Ó ∑ ∑ ∑
ø
Hn.

4 2 5 4 2
54

∑ ∑ ∑
&4 4 8 4 Œ bœ . ˙ 4
>œ >
Tpt.

f
?
54

˙ Ó ∑ ∑ ∑
ø
Tbn.

54

Perc ÷ ∑ ∑ ∑ ∑

54
4 c7 / 48 2 5 4 c8 / 52
2
&
15

t b µ œœ
& 4 µ Bœœœœ Œ Ó
Synth 4 4 8 bœ 4
t ?œ mœ
Œ
54

Comp. & ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑
Kbd.

t œ ?˙ . ˙ œ œ. mw

Oœ  Ȯ .. Oœ Oœ m O Oœ ..
>
Oœ n Oœ ⋲ m wO
54

& mœ
ø
Vn.

4P 2 f 5 4 ƒ 2
>
Oœ Oœ b Oœ Oœ .. Oœ µ µ Oœ
BBȮ .. Oœ sul pont.

B ⋲ ‰.
4 ΠΠr
4
ø 4 8 mœ œ 4
Vla.

P f Í

? Oœ O. O O µ Oœ Oœ .. Oœ
sul D

‰ Œ µ˙ .
ø J
Vc.

F F
O. O O K ˙o .
arco (sounding
arco sul D sul G as written)

? ⋲ œj . œ
pizz.

Cb.
J ‰ ∑ &
p p f F

3/16/08
A. Tear of the Clouds score 122

14 Tear of the Clouds

b>œ µ>œ
q = 80
58
˙
& ∑ µœ œ œ œ œ œ
œ œ œ
> >
Fl.

2 5 3 P 3 2 3
œ œ mœ œ µœ œ bœ µœ mœ mœ m œ mœ œ œ ˙ œ œ µœ
mœ œ œ µ œj j J
58

Ob. &4œ 3 8 8 œ œ. 4 4 8
f 5
7
ƒ
œ œ bœ œ œ
mœ µœ µœ mœ µœ Œ.
58

Cl. & œ b œ bœ 6
∑ ∑ ∑
6
p
?
58

∑ ∑ ∑ ∑
Bsn.
˙
p
?
58

Hn. ∑ ∑ ∑ ∑ ∑
2 5 3 3 2 3
œ Bœ
œ mœ œ
58

j œ œ
Tpt. &4
˙ 8 mœ œ. 8 œ 4 œ 4 8
F P f
3

?
58

Tbn. ∑ ∑ ∑ ∑ ∑

58

Perc ÷ ∑ ∑ ∑ ∑ ∑

2 5 3 mœœœ
c9 / 60
3 j 2
c10 / 68
3
bœœ
58

& m Bµ œœœ ‰
µ µB œœœœ
15

m µ µ œœœœ Œ. m œœ ‰ B mœœ
Synth
&4 8  œœ 8 4 Ó 4 8
? Bœ . ‰ bœj

58

Comp. & ∑ ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑ ∑
Kbd.

? ˙ œ œ. œ. ˙ œ bœ ˙

Âœ Bœ .
µœ œ mœ
q = 80

œ µœ µœ mœ
58

& ‰ œ œ. ∑ ∑
J
Vn.


3

2 5 f 5
3 3 2 3
œ œ.
œ µœ Âœ Œ .
ord.

B bœ ∑ ∑
3

Vla.
4 œ. œ œ œ 8 &8 4 4 8
cresc… f fÍ
gliss. harm.

? Œ ? mO . ∑ ∑
sul D

Vc.
˙ & O.
f f p
ȯ o
gliss. harm.

? ‰ bœj
pizz.

Œ ∑ Ó ∑
sul G

&
œ.
Cb.

f f

3/16/08
A. Tear of the Clouds score 123

Tear of the Clouds 15

œ̆ œ. œ
œ bœ Bœ œ µ>œ >œ b œ µ œ
œ. µœ mœ œ œ nœ
63

Fl. & ‰ Œ ∑
f f
3
3 2 4 7 3 2
3
3

œ. œ mœ mœ œ µœ œ ˙ Ÿ~~~~~~~~~~~~~~
˙ (œ)
63

Ó ∑
Ob. &8 4 4 8 4Π4
p
3

µ œ  œ œ œ bœ œ Ÿ~~~~~~~~~~~~~
‰ œœ œ
63

& ∑ Œ œ bœ œ mœ ∑ Œ b ˙ ( b œ)
Cl.
œ Bœ ˙
ƒ p f
P
7
3

˙ ˙
?
63

Bsn. ∑ ∑ ∑ ∑
f p
? ˙ ˙
63

Hn. ∑ ∑ & ∑ Œ Bb˙


3 2 4 f p 7 3 f 2
œ. œ œœ Bb œ . œ n˘œ ˚ œ̆

63

mœ . b˙
&8 j œ bœ œ œ 4 œ.
5

4 . 4 œ œ 8 4
>œ œ œ
Tpt.

F 3 ƒ 5

? ⋲ µœ . j
63

mœ . œ Ó
3
Tbn.
J mœ µ˙ ˙ œ. ˙ œ
f P f Chinese cym.

>œ j >œ >œ >œ ˙~~~~~~~~~~~~~~~


Chinese cym.
>
with yarn

Ͼ Ͼ
63 medium bongos congas
stick end of mallet

÷ ∑ ‰ Ó ‰ ‰ Ó Œ
æ
œ
yarn
Perc
æ œ
J
p
Í f p F
3
3

œ /. 72
œœœ ...
3 bc11 2 j 4 7 œ 3 2
63 c12 / 76 c13 / 80 c14 / 84 c15 / 88

& Bµ œœ .. m Bœœœ µ µ Bœœœ


15

œœ m œ
3

Bµ œœ .. bœ m µb bœœœœ n n µ œœœ
µ bœ
mœ n œœ µ œœ 4 Bµ œœœœ œœ bmbµ œœœœœ
&8 4  œœ 8 4Œ 4
Synth
mœ j mœ œ
? œ œt
œ. œ
Bœ œ œ mœ mœ
63

∑ ∑ ∑ ∑ ∑
3

Comp. &

& ∑ ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑ ∑
f
Kbd.

? t œ
3
j
œ. œ œ mw œ. ˙ ˙.
œ. Ÿ~~~~~~~~~~~~~~
œ bœ mœ nœ µœ œ ˙ µœ . œ
Bœ ˙ m ˙ (Bœ )
63

Vn. & Ó Œ Œ
f ƒ 7 fÍ Í
3

3 2 4 3 2
Bœ . mœ bœ œ nœ œ œ ˙ ˙æ̇
3

Ó ∑
Vla. &8 4 4 8 4Π4
f ƒ Í
µ µOœ .. mœ nœ œ œ œ µœ ˙
? Ó ∑ Œ
µ˙
Vc.

f f Í
3

j ˙ ˙
?
arco
bœ ∑ ∑
3

Cb.
µœ . œ bœ
p F f p

3/16/08
A. Tear of the Clouds score 124

16 Tear of the Clouds

Ÿ~~~~~~~~~ >œ . ˙
K œ mœ œ mœ
m œ (œ) >œ >œ
> > > >œ
q = 76
r B>œ ⋲ mœ œ µœ µœ . µœ . mœ œ
& ⋲ . µœ
68

Fl. ∑
2 p ƒ 4 3ƒ 4 f5
5
9
~~~~~~~~~~~~~~~~~~~~~~~ m˘œ œ µœ ˙
&4˙ . œ Âœ µ œ
68

Ob.
4 R‰ Œ Œ ⋲ mœ
4 Œ
4 ∑
8
ƒ ƒ p
œ µœ
7

b œ bœ Bœ Bœ ˙.
6:5

œœ Bœ m œ
& ⋲œ
68

µœ
3
œ ∑
Âœ œ mœ
Cl.

œœ œ œ. œ mœ f
>
6


6

œ œ. m œm œ œ
6

? œ œ œ mœ
68 5

∑ mœ ˙. Œ Ó
mœ œ œ bœ nœ œ œ
Bsn.

f 6

j 3 œ B>œ ˙.
68

& j j œ œ œ Œ Bb˙ œ Bœ
3
3

2œ œ
Hn.

œ 4 œ œ ƒ 3 Ï 4 p 9
>œ œ ˙. œ œ >œ . ˙
& 4 œ. m>œ
68


Tpt.
4 4 4 8
F ƒ p Ï
? œ œ Bœ œ œ Bœ œ bœ
68

∑ J œ ˙ Œ µœ µœ . ˙
J
ø
Tbn.

P f f p
3 3

68 ~~~~~~~~~~~~~~~~~~~~~~~
˙ P
÷ ∑ ∑ Œ Ó
tam-tam

Perc
œ
f F
c19 / 104 c20 / 108

œ j 9
bœœœ µ œœ
68
2 4 c16œœ/ 92 c17 / 96 c18 / 100
3 4 c21 / 112 c22 / 116

& Bœ
µ m œœœ mœ
15


B n Bœœœ œ
3

Bbµ œœœ n œœ
Âm µ œŒœœ bB n œœœœ
B
3

Bœ ‰ B b µ œœœ ‰ b Ân œœœœœ b œœ Œ Œ ‰ m Âœœ


Synth
&4 4 œ 4 4 µœœ 8
µ œœ µœ œ
t ? ‰ œ ‰ Bn œœ mœ œ ˙. œ bœ
œ
œ 3
3
68

Comp. & ∑ ∑ ∑ ∑

∑ ∑ ∑ ∑
“big pluck”

&
15

& ∑ ∑ ∑ m œœ Œ Œ ‰ m œœ
Kbd. J
F
t Π? j
3
mœ œ ˙. ˙. œ bœ
j œ œ
3

œ œ œ œ
° *°
√ b>œ
Bb
œ.

C

~~~~~~~~~~~~~~~~~~~~~~~~ Ÿ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ q = 76
˙ b w (µ œ ) bœ . œ. œ
œ œ
68

Vn. & Ó œ bœ
2 ƒ 4 Í ƒ 3 F 4 f 5
9
>œ œ w≤
(with tpt.)

Bœ œ m œ ∑
Vla. &4
æ æ æ 4
Í ƒ
4 4 w
> π
8
ƒ f
œ̋ Ÿ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
µ w (œ)
? œ mœ B œ œ.
pizz.

‰ œ ‰ Ó
3

µ ˙æ
Vc. Bœ
ƒ 3 Ï F f π

?
pizz.
œ ˝j
∑ ‰ ‰ ‰ ∑ ∑
3

Cb. œ bœ œ
f 3
Ï

3/16/08
A. Tear of the Clouds score 125

Tear of the Clouds 17

>œ b œ >œ .
m>œ n œ œ
 œ œ µœ œ . m œ
C
bœ µœ œ ˙.
‰ œr œ µœ µœ œ . œ. œ bœœ œ µœ j œ.œ
72

Œ ‰
7

œ œ
3

&
>œ œ
Fl.
5

P p
5
9 4 ƒ F
7 3
5

72

&8 ∑ ∑ ∑ ∑
Ob.
4
72

Cl. & ∑ ∑ ∑ ∑

?
72

Bsn. ∑ ∑ ∑ ∑

bœ œ . j
72

Hn. & ˙ œ œ Œ Bb ˙ ˙. œ. ⋲ ∑
9 4 π p ƒ
Bb>œ Bb œ bœ
5

mœ ˙ mœ bœ mœ
72

&8 ∑ Ó Œ ‰ Œ
Tpt.
4 œ m˙ .
f
µœ ˙ fÍ
? ˙ œ œ œ. œ ˙.
72

Tbn. Œ Œ Bœ œ Œ
π f P f π
œ #œ œ
7

œœ
bœ nœ œ Œ
medium vibraphone

œbœ bœ ‰ J
72

÷ ∑ Œ Ó ∑ œ bœ Œ
mallets (motor off)

&

Perc

f 7
F
° *° *°
7215
9 4
&
/ 120 / 124 / 128

j
µ œœ Bœ
b m µ œœœ bœœ ‰ m m œœœœ
Synth
&8 4 Œ µœ
Ó
µ  µ œœœ
Œ Œ
œ
? ˙ œ Bµµ œœœ m œœ ‰ µ œœ
j
œ. œ. Bœ
mœ Bœ
72

Comp. & ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑
Kbd.

b˙ .
? ˙ œ t ˙. ∑ ? œ
œ. œ. mœ n˙ . ˙. œ.
*° *° *°
œ
œ œµ œ . œœ
⋲ >œ
C
œ œ œ µ œ µ œ µ œ Bœ m˙ . œœ œ œ
R œ mœ bœ
72 6

& bœ . Bœ µ œ Œ œ
Bœ ˙ ˙.
Vn.

f 4 f
5

p p
7 5
9 5

œ œ œ µœ >O O Ȯ bœ œ
&8 Œ ‰ J J Bœ m œ b œ B Bœ µ œ Bœ œ œ œ Œ ⋲ µœ bœ
œ
4 Â˙ . œ œ
Vla.

f 3
p f 6

Bœ œ œ. œ ? µ œ µ œ Bœ œ
7:5

B œ
5

Œ ‰ J µœ œ Bœ ∑
œ œ Âœ œ j Bœ mœ µœ Âœ µœ µœ mœ Bœ œ nœ
3

œ mœ
Vc.

f 5
ƒ
j Bœ˝
pizz.
? ∑ Ó ‰ Âœ œ œ j ⋲ Œ Ó
Cb.
µ˙ œ.
f ƒ

3/16/08
A. Tear of the Clouds score 126

18 Tear of the Clouds

œ Bœ nœ œ œ mœ œ œ. œ œ bœ µœ œ . œ
œ bœ œ
œ
µœ nœ œ nœ œ Âœ bœ œ µœ m œ bœ œ mœ
J Bœ n œ œ œ bœ
76

& Œ ‰ œœ
mœ œ Bœ .
>
Fl.

f 5p f
6
5 6 3 3 4
œ b œ µ œ m œ µœ .
76

Ob. & Ó Œ œ 4 œ ˙ 4 œ Âœ . œ Âœ . œ œ 4
p f
5
5

œ œ bœ Bœ Bœ
œ œ µ˙ œ bœ œ œ œb œ J
76 5

& ‰ mœ œ bœ
5

mœ œ . ˙ œ mœ œ Âœ mœ œ œ . œm œ
Cl.

>
f
7

?
76

Bsn. ∑ ∑ ∑

76

Hn. & ∑ ∑ ∑
3 5 4
76

& ∑ ∑ ∑
Tpt.
4 4 4

?
76

Tbn. ∑ ∑ ∑

j œ #œ œ nœ
#œ j œ #œ œ œ J œ œ
œ n œ nœ
76 3

& Ó bœ ‰ bœ ‰
5

œ œ.
œ œœb œ œ
œ #œ bœœ .. œ bœ œ .

Perc
œ
f p f 5

*° pedal phrases…

7615
3 mœœ / 144
5 4
& œœ j
/ 132 / 136 / 140

µœ µ µB œœœ
3

œ
& Ó m µ m œœœœœ Bb œ
BB œœœœ œ
b bn œœœ
µ œ Ó
4 B œœ 4 4

Synth

? Ó Bœ m n œœ Âœ
œ J
3

76

Comp. & ∑ ∑ ∑

& ∑ œ mœ ∑
15

b œ bœ
& ∑ œ Ó ∑
Kbd.
œœ

œ f
7

? ˙ œ mœ J ∑ ∑
3

* °
µ˙ .
⋲ mœ  œ œ œ œ œ j mœ œ mœ bœ œ œ µ œ n œ µœ µ œ
76 7 3 5:3

& j œ œ œ œ œ
3 3
Vn.
œ µœ œ 3
5:3
œ
3 5 4
œ œ Âœ . bœ
B œ Bœ 4 œ & œ œ µœ
Vla.
m˙ mœ 4 œ µœ µœ œ 4
5:6

? ˙ Œ ‰. r
5

œ œ. µœ œ µœ µœ œ
b˙ œ bœ œ œ R
œ
Vc.

f F
3
5

Cb.
? ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 127

Tear of the Clouds 19

œ œœ œ
µœ mœ µœ œ mœ µœ mœ œ
µœ . œ œ mœ
œb œ bœ . µ œ œ bœ µœ µœ µœ µœ
bœ œ bœ b œ b œ b œ œ œ m œ
œ . œ bœ mœ
mœ œ mœ nœ œ nœ bœ
œ mœ œ ‰.
79

Fl. & 5
œ 7:5
5
4 5
7:5 3 6 6 6

œ mœ œ bœ Âœ mœ mœ œ œ mœ bœ œ
7:6

œ œ µ œ m œ µ œ bœ mœ n œ m œJ
œ. mœ œ œ mœ mœ œ n œ mœ œ mœ
3

j
R ⋲ œ œ Bœj . œ œm œ œ mœ nœ
79

Ob. &4 ⋲ œ mœ .
5

mœ œ œ bœ mœ œ nœ mœ nœ œ. bœ Bœ Bœ mœ œ œ bœ œ mœ . œ œ mœ nœ ⋲ œ bœ . m œ . n œ .
œ œ œ œ bœ mœ œ bœ œ œ œ œ .mœ .œ .œ .œ .
79

Cl. & mœ 6:5 6:5


7 7

r m œ bœ .
6:5

mœ . nœ œ œ œ
? B bœ œ nœ œ œ œ bœ œ œ j
79

∑ Œ œ. œ bœ bœ œ mœ œ.
mœ œ
Bsn.

f 7:6
7:6

79

Hn. & ∑ ∑ ∑
4
79

Tpt. &4 ∑ ∑ ∑

?
79

Tbn. ∑ ∑ ∑

œ #œ œ bœ bœ œ n œ œ œ #œ œ œ #œ
œ bœ œ nœ œ bœ b œ œ œ b œ n œ n œ
œ œ
nœ #œ bœ œ # œ bœ
79
œ ⋲ œ œ œ
Perc & bœ 5
œ bœ œ 5
nœ #œ 5

79
4
&
15

&4
Synth

œ
?

79

Comp. & ∑ ∑ ∑

& ∑ ∑ ∑
15

& ∑ ∑ ∑
Kbd.

? ∑ ∑ ∑
*
bœ œ œ. .
œ œ œ mœ œ œ œ œ nœ µœ
œ
J
œ mœ œ œ œ œ m œ œ œ œ œ nœ µœ œ bœ œ œ
œ œœ r
79

& œ mœ œ œ œ œ
5

Vn. 6:5 œ œ. œ mœ œ 5
5

4 5 5
5

‰ bœ bœ œ œ µœ œ œ œ bœ œ mœ œ nœ nœ œ
&4 µœ œ J R ⋲ ‰ j j œ Âœ œ
œ J µœ œ œ œ . œ nœ
5:3

R
œ µœ œ . œ µœ
Vla.
j r
4œ :5œ
3
j r
5:3
4 œ :5 œ
5:3 5:3 5

œ nœ œ B mœ œ . bœ œ bœ nœ œ œ mœ nœ
bœ nœ µœ œ œ mœ n œ
5:3 9:8

? œ µœ µœ
5:3

œ œ µœ µœ œ œ œ mœ œ œ µœ œ œ
Vc.
3
œ œ µœ œ 9:8
œ .
9:8

Cb.
? ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 128

20 Tear of the Clouds

œ œ mœ œ m œ ⋲ œ œ bœ nœ b œ n œ bœ . œ. œ œ mœ œ m œ œ œ b œ n œ bœ œ bœ . œ œœ
œ mœ œ mœ bœ nœ bœ . œ.
mœ œ mœ nœ œ nœ mœ œ mœ nœ œ nœ œ mœ . œ œ
œ
82

Fl. & 6
mœ 6
6 3
7 7

6 6
3 5:3 5:3
4 7 7

œ œ œ mœ œ mœ œ œ œ. œ mœ œ œ œ. œ œ œ
82
œ œ mœ mœ mœ mœ nœ mœ œ œ mœ mœ mœ nœ mœ œ œ mœ mœ nœ œ mœ mœ œ œ
Ob. & 4 5 5 4 5 5
5 5 5

œ. œœ œ œ . nœ œ.
6:5 6:5

œ œ nœ bœ bœ œ bœ
82
œbœ œ œ œ œ b œ m œ nœ
6

œ œbœ bœ œ œ œ œœ
bœ b œ œ nœ œ œ ⋲
œ œ œ
bœ b œ
Cl. & 7:5
œ mœ œ . œ mœ œ œ 6 œ
6 6
7:5 3 6
6 6

bœ œ bœ nœ œ œJ bœ b œ n œ œJ bœ
6

œ.
B œ b œ œ mœ J œ b œ œ b œ bœ œ œ n œ bœ n œ
6:5

bœ bœ œ mœ œ nœ mœ œ œ b œ œ b œ bœ R mœ œ œ œ
82

Bsn. œ
7:6 6:5
6:5 6:5 5
6:5 5

82

Hn. & ∑ ∑ ∑
3 4
82

& ∑ ∑ ∑
Tpt.
4 4

?
82

Tbn. ∑ ∑ ∑

œ.
bœ nœ œ œ bœ bœ n œ œ bœ n œ œ bœ œ
bœ nœ œ bœ bœ n œ œ bœ œ œ
œœ
#œ #œ . bœ #œ #œ #œ
82

Perc & R bœ œ œ œ bœ œ œ bœ œ œ bœ
6 5 5
5
6:5 6:5 6:5 5 5

82 / 148
3 4
&
15

& 4 4
Synth

? Bœ

82

Comp. & ∑ ∑ ∑

& ∑ ∑ ∑
15

& ∑ ∑ ∑
Kbd.

? Bœ Œ Ó ∑ ∑

mœ nœ œ mœ nœ µœ mœ µœ œ mœ œ nœ bœ µœ µœ œ mœ
bœ œ µ œ Âœ . n œ œ
bœ œ œ n œ m œ œ œ . bœ œ µœ
œ mœ µœ nœ bœ œ œ
œ µœ œ µœ œœ J œ œ
82

œ µ œ µ
Vn. & œ 5
6 6
5
5 5
3 7:5
7:5 4 6 6

bœ œ m œ œ mœ œ mœ œ mœ nœ œ mœ . mœ nœ œ œ µœ bœ nœ mœ œ
& œ µœ œ œ
nœ œ µœ œ µœ œ œ bœ µœ œ µœ œ œ bœ œ µœ R
Vla.
4 µœ 4 7:5
7:5 7:5 7:5
5

œ. bœ mœ n œ œ bœ mœ nœ . œ . bœ œ nœ . bœ œ œ n œ m œ nœ
B mœ œ bœ nœ œ nœ
œ µœ R J œ bœ n œ œ œ
œ µœ bœ nœ œ œ µœ mœ bœ n œ
Vc.
6:5 Âœ œ µ œ œ œ 5 5
5
5 5 5 5
6:5 5
6:5

Cb.
? ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 129

Tear of the Clouds 21

œ œ mœ œ mœ bœ nœ bœ œ œ. œ œ n œ œ m œ œ b œ n œ bœ . œ. œ œ mœ œ mœ bœ nœ bœ œ. œ œ mœ œ mœ nœ bœ œ
œ œ œ œ
85

Fl. & 7 7

3
œ œ œ. œ œ. œ œ.
85
mœ mœ œ mœ mœ œ œ mœ mœ mœ mœ œ œ mœ mœ mœ œ œ mœ mœ œ . mœ œ œ m œ mœ œ mœ
Ob. & 5 4 7 7
5 6 7 7

œ œ. œ œ œ bœ œ œ. nœ bœ bœ œ œ œ œ nœ bœ œ œ nœ bœ
œ œ œ bœ b œ nœ œ
œ œ œ œ œ œ œ
œ œ
85

Cl. & 6 5:3


6 3 5:3 5:3
7 7

œ bœ nœ bœ nœ bœ bœ bœ
B bœ œ bœ bœ œ œ b œ n œ bœ b œ œ b œ n œ bœ bœ œ bœ œ bœ nœ bœ bœ œ bœ
85

œ bœ bœ nœ bœ bœ
3

Bsn.

5 7:5 7:5 7:5 7:5

-̇ œ- œ- -̇
85

Hn. & ∑
3
F
3 3

85

& ∑ ∑
Tpt.
4

?
85

Tbn. ∑ ∑

85
bœ œ œ
#œ bœ œ #œ œ bœ œ bœ
œ
R #œ œ bœ
œ
#œ bœ #>œ
Perc & œ œ bœ œ œ œ bœ œ bœ œ œ bœ
5
3
5 5 5 7:5 6

85
3
&
15

& 4
Synth

85

Comp. & ∑ ∑

& ∑ ∑
15

& ∑ ∑
Kbd.

? ∑ ∑

œ. nœ µœ nœ œ mœ nœ µœ œ mœ nœ µœ mœ nœ bœ µœ œ bœ µœ µœ
bœ œ bœ œ bœ œ œ µœ œ œ µœ œ œ µœ bœ œ œ µœ
œ œ
85

œ µ œ µ
Vn. &
7
6 6 7
3 7 7

mœ œ œ µœ mœ œ mœ œ œ µœ œ œ œ µœ mœ œ œ œ µœ œ œ µ œ œ µ œ œ œ µ œ œ µ œ œ µ œ œ µ œ œ
Vla. & µœ œ µœ œ µœ œ 4 µœ
3
7:5 7:5 6 6 6 6

œ bœ œ nœ bœ œ nœ bœ œ bœ œ bœ œ
B bœ nœ œ œ bœ nœ œ œ b œ n œ bœ œ œ bœ bœ œ n œ b œ b œ œ œ n œ bœ b œ œ œ n œ
Vc.
3 6 6 6 6 3 5:3 5:3

Cb.
? ∑ ∑

3/16/08
A. Tear of the Clouds score 130

22 Tear of the Clouds

œ œ bœ œ œ œ œ
87 œ mœ œ mœ œ mœ œ mœ œ œ mœ œ mœ œ œ mœ œ mœ œ œ œ mœ œ œ œ mœ œ œ mœ œ œ mœ œ mœ œ mœ œ mœ œ mœ ®
mœ mœ mœ mœ mœ mœ mœ mœ
Fl. &
cresc… 4
œ
mœ mœ mœ œ œ mœ mœ mœ œ œ mœ mœ mœ œ œ mœ mœ œ œ mœ
m œ œ m œ m œ œ m œ m œ m œ mœ m œ m œ mœ m œ m œ m œ m œ m œ m œ m œ m œ
& œ œ œ œ œ œ œ œ
87

Ob.
4 7
7 7 7 7 7 7
cresc…

œ œ œ œ œ bœ n œ œ œ œ b œ œ œ œ bœ œ œ œ bœ bœ bœ bœ bœ bœ bœ bœ bœ bœ
œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ œ ⋲
87

Cl. &
cresc…
7 7 7 7

B œ bœ œ bœ œ bœ œ b œ œ œ bœ b œ œ n œ b œ bœ œ n œ b œ b œ œ b œ bœ œ b œ bœ œ bœ b œ œ b œ b œ œ b œ b œ
nœ bœ bœ bœ nœ bœ
87

Bsn. œ bœ bœ nœ bœ bœ
cresc… 6 6 7

- œ- œ- œ- œ- -̇ œ-
87

Hn. & œ
F cresc…
4 5
f
87

& ∑ ∑
Tpt.
4

?
87

Tbn. ∑ ∑

b œ #>œ bœ > bœ > œ bœ b>œ œ b œ b>œ œ b œ b>œ œ b œ > œ b œ > œ b œ >œ bœ œ bœ œ bœ


bœ œ œ # œ bœ œ œ # œ
87

Perc & œ œ bœ œ œ œ œ œ œ
3 8:5 8:5 8:5 8:5 8:5
cresc…
87
4
&
15

& 4
Synth

87

Comp. & ∑ ∑

& ∑ ∑
15

& ∑ ∑
Kbd.

? ∑ ∑

µœ µœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µœ mœ µ œ mœ
bœ œ
œ µ œ œ œ µœ œ µ œ œ µœ œ µœ œ µœ œ µœ µœ µœ µœ µœ µœ µœ µœ µœ µœ
87

Vn. &
4
œ œ œ œ œ œ œ œ œ œ œ
cresc…

µœ œ µœ µœ ⋲ µœ µœ µœ µœ µœ µœ µœ µœ µœ µœ µœ µœ
Vla. & µœ œ µœ œ µœ œ µœ µœ µœ 4 µœ µœ µœ µœ µœ µœ µœ µœ µœ µœ ®
6
cresc… 7 œ œ œ œ œ

B b œ b œ œ n œ b œ b œ œ n œ b œ bœ œ b œ b œ œ b œ b œ œ bœ œ œ b œ œ œ œ
œ
œ
œ
œ
œ
œ
œ
œ
œ
œ
œ
œ
œ
œ
œ œ
œ ?
bœ bœ bœ bœ bœ bœ bœ bœ bœ
æ
Vc.
7
cresc…

Cb.
? ∑ ∑

3/16/08
A. Tear of the Clouds score 131

Tear of the Clouds 23

q = 96 ˘ ˘ ˘ ˘ ˘ m˙ .
w mw ˙. m˙ . œ̆ œ̆ ˘ œ̆ m œ œ̆ œ̆ ˘ œ̆ m œ œ̆ œ̆ ˘ œ̆ m œ œ̆ œ̆ ˘ œ̆ m œ œ̆ œ̆ ˘ œ̆ m œ
œ œ œ œ œ
89

Fl. &
ƒ f 3 Ï
5 5 5 5 5
3 5 5
œ̊ ˙.
89
mw mw m˙ . m˙ . m˚œ n˚œ µ˚œ m˚œ
Ob. & 4 4 4 4
ƒ f Ï
bw bw b˙ . b˙ . ˘ ˘ ˘ ˘ ˘ ˘ ˘ ˘ ˘ m˙ .
89
B˘œ œ̆ œ̆ œ̆ m œ Bœ œ̆ œ̆ œ̆ m œ B œ œ̆ œ̆ œ̆ m œ Bœ œ̆ œ̆ œ̆ m œ Bœ œ̆ œ̆ œ̆ m œ
Cl. &
ƒ f 5 5 5 5 5 Ï

B w œ bœ œ . n œ œ m œ œ b œ. œ µ œ m œ œ bœ. œ m œ œ b œ. œ m œ œ b œ. œ. œ.
> . . > . . > . .
œ > . .
œ > ˙.
89

Bsn. Ó mœ
ƒ Ï
f
5 5 5 5
5

˚ Bb ˙ .
& ‰ Bb œJ ˙ . ˙.
89

bœ œ bœ mœ
  œ
5 
Hn.

fÍ 3 ƒ  3 Ï 5
f
Bb ˙ .
& w ˙. mœ mœ nœ œ mœ mœ nœ œ mœ mœ nœ œ mœ mœ n œ œ m œ mœ nœ œ
89

Tpt.
4 4 œ œ œ œ œ 4 4
fÍ ƒ f Ï
w ˙. B˙ .
5 5 5 5 5

?
89
B˚œ œ̊ n˚œ b˚œ µ˚œ
Tbn.

fÍ ƒ f Ï
b wæ ~~~~~~~~~~~~~~~~~~~~~~~
sus. cym.

˙.
89

& b www ∑ ∑ ÷
æ
Perc

p
ƒ
89
3 5 3 5
&
15 / 152 / 156 / 160 / 164

µœ
Ó µ µ µ mœœœœœ µÂ œœœ n µ m µBœœœœœœ
& 4 4 œ œ œœ 4 4
Synth

? œ
noise wave

µœ œ µœ µœ

89

Comp. & ∑ ∑ ∑ ∑

∑ ∑ ∑ ∑
“big pluck”

&
15

& ∑ ∑ ∑ ∑
f
Kbd.

? œ >˙ .
∑ ∑ mœ œ mœ mœ

q = 96
. ˙.
µ m ww µ m ˙˙ . µ˘œ b˘œ µ˘œ œ̆ µ˘œ b˘œ µ˘œ œ̆ µ˘œ b˘œ µ˘œ œ̆ µ˘œ b˘œ µ˘œ œ̆ µ˘œ b˘œ µ˘œ œ̆ ˙.
æ æ œ̆ œ̆ œ̆ œ̆ œ̆ æ
89

Vn. &
ƒ 3 5 f 5 5 5 5 5
3 Ï 5
w ˙. œ̊
µw µ˙ . œ œ̊ œ̊ B ˙æ.
Vla. & æ 4 æ B
4 œ̊ Œ & 4 ˙. 4
ƒ f Ï

?
Bw
æ B˙æ. & Œ bœ œ œ ˙æ.
œ m œœ  b˙ .
Vc.

f  mœ Ï
j j j
 j
 j
œ œ œ œ œ m˙ .
?
æ̇. bœ- b œ- b œ- b œ- b œ-
Cb.
w
f ƒ f Ï

3/16/08
A. Tear of the Clouds score 132

24 Tear of the Clouds

w
take Piccolo
93

Fl. & Œ ∑ ∑ ∑ ∑
5 2 4 3 5 4
˙. œ µœ œ mœ >œ ˙. œ >œ . œ. œ
œ œ mœ ⋲ mœ œ mœ
93

Ob. &4 4 >œ 4 4 ˙ œ 8 4


f Ï ƒ
˙ œ œ Bb œ œ œ m>œ ˙. œ m>œ . œ. œ
6 3


œ bœ œ ⋲ œœ
93

& œ bœ œ
b˙ œ
Cl.

f >
Ï >œ ƒ
3

˙. œ
6

˙. œ œ œ ⋲ œ œ œ. œ. œ
B ? œ œ mœ
93

œ mœ ˙ œ
> >
Bsn.

f 5
Ï
5 ƒ
> > ˙. >
w œ ‰ ? œ & Bb œ mœ œ œ ⋲ ? œ & Bbœ . œ. œ ?
93

& J ˙
fl fl fl
Hn.

5 2 ƒ 4 3Ï ƒ 5 4
˙. œ œ Bb>œ >œ >œ >œ >œ œ Bb>œ ˙. œ
R ⋲ bœ . .
93

œ̆ œ̆
Tpt. &4 4 bœ 4 4 b˙ œ > 8œ œ
4
> f ƒ
3 3

Ï
w œ >œ ˙. œ >œ . œ. œ
? J
93


œ œ
>
Tbn.

ƒ Ï̇ ƒ
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >œ ‰ >œ >œ >œ >œ æ̇ œ.
hi bongo
93 Chinese cym.

÷ ‰ Ó Œ ⋲ ∑
low conga
w œ
J œ J
> >
Perc

f Ï f f F
6

93
5 2 4 3 5 4
&
15 / 168

&4 4 4 4 8 4
Synth

? œ ˙. Œ œ. œ. œ
bœ b˙ œ
93

Comp. & ∑ ∑ ∑ ∑ ∑

∑ ∑ ∑ ∑ ∑
“deep synth”

&
15

& ∑ ∑ ∑ ∑ ∑
Kbd.

? w Œ œ ˙. Œ œ. œ. œ
bœ b˙ œ
˙. œ. œ œ bœ œ
˙. œ. >œ ˙ œ œ œ mœ >œ . œ. œ
bœ B œ œ m˚œ m˚˙
æ æ Bœ œ œ
93

Vn. &
f 2Ï f ƒ
6
5 4 3 5 4
3

B>œ ˙ œ œ B>œ . œ. œ
˙æ̇. œ . B µœ œ œ œ œ œ⋲ œ œ œ
5

&4 . œ 4 œ œ ˙ œ œ. œ. œ
Vla.
œ. 4 4 œ 8 4
f  ̇
Ï f
5
ƒ

& æw bœ nœ ˙ œ œ ⋲ œ µœ œ
5

b˙ œ nœ . œ. œ
µœ œ
5

w œ nœ œ  b>œ ˙ œ œ  b>œ . œ. œ
Vc.

f Ï f ƒ
? w µœ œ œ œ m>œ ˙. œ m>œ . œ. œ
bœ œ œ œ b˙ œ
 
Cb.

f Ï f ƒ

3/16/08
A. Tear of the Clouds score 133

Tear of the Clouds 25

98

Fl. & ∑ ∑ ∑ ∑
4 5 9 4
œ̊5 j œ. œ œ Bœ
œ œ œ mœ J
98
œ œ
Ob. &4
>
œw œ ˙. 8 8 mœ- m œ. >œ >œ œ µ œ 4
F ƒ Ï P F ƒ
m˚œ œj
6

mœ . œ œ œ Bœ
˙. bœ œ m œ J
98

& œ bœ
> œ bw œ- m œ. b>œ n>œ œ µ œ
Cl.

F ƒ Ï P F ƒ
œ œ mœ mœ
6

œ j œ. œ >
>œ œ œ m œ m œ œ
? . J
98

œ mœ œ mœ m
> w  œ ˙. œ-
Bsn.

F ƒ Ï P F ƒ
6

? æw Bb˚œ ?œ ˙. µœ . œ
98
flz.

& & j
flz.
Hn.

4 f F ƒ
æ
5Ï æ 9 mœp-
bœ .
œ. >
œ œ œ. œ 4
F

æ . œ. œ œ œ. œ
98


8 bæœ œ
flz.

&4 J
flz.

bw  bœ ˙. æ 8 mœ- œ. > 4
Tpt.

f F ƒ ƒ p F
œ̊ œ. œ œ. œ
? æw æ æ bœ. µ>œ . œ œ J
98 flz.
flz.
Tbn.
œ ˙. œ-
f F ƒ Ï p F

æ̇. x. j
temple blocks

Œ.
98

÷ Œ œ Œ x
œ- ‰ Ó
low bongo
Perc
fl
œ ˙ æ æ
f ƒ f f (senza dim.) p
98
4 5 9 c1 / 12 4
&
15

&4 8 8 4
Synth

? œ œ. œ Œ.
bw bœ ˙. ˙.
98
˙.
Comp. & ∑ ∑ ∑ ∑

& ∑ ∑ ∑ ∑
15

& ∑ ∑ ∑ ∑
Kbd.

? œ œ. œ Œ.
bw bœ ˙. ˙.
˙.
œ̊ œj œ. œ œ µœ œ
œ- œ µ œ œ m œ œ
bœ œ
mw mœ ˙. æ æ œ m œ- J
98

Vn. & µ œ-
9P F ƒ
6
4 5 4
B˚œ Bœ . œ Âœ
œ œj œ. œ µ œ Âœ œ œ
B mœ œ & µ œ µ œ- µ œ- œ œ m œ
Vla.
4 w œ ˙. 8 æ æ 8 b œ-
P 6 ƒ
J 4

œæ. œæ œ œ mœ œ j B
6

& µœ œ bw œ j bœ ˙. bœ µ œ- µ œ- œ  œ µ œ nœ
b œ µœ bœ . œ œ-
Vc.

 P F ƒ

? µ˚œ j µœ œ µ˙
Cb. œ œ bw œ bœ ˙. J Œ Œ.
p f p

3/16/08
A. Tear of the Clouds score 134

26 Tear of the Clouds

q = 108

œ œ mœ œ mœ œ œ r
6

j
Piccolo

œ ‰. mœ œ œ mœ œ
102

Œ ‰ œ œ œœ ‰ Œ ∑
6

Picc. & œ mœ
4 p f p 2 4
˚
6

œ Bœ œ œ œ œ œ œ ˚ ˚ œ̊ m œ ⋲
œ œ œ œ >œ œ Bœ mœ µœ œ
œ œ œ œ̊ b œ µ œ
102

&4 œ œ mœ œ œ B Œ ‰ ‰ œ œ
4‰
3

> 4
Ob.

f f ƒ
6

ƒ
6

P
6 3
5

Bœ œ > œ ˚ ˚ Â˚œ µ˚œ œ̊


5 6

œ œ œ n œ µ œ œJ
6 6
n
6

œ œ B
Bœ b œ b œ œ ‰ ‰ œ œ œ µ œ œJ
102

& bœ œ œ nœ œ ‰ bœ ⋲
œ mœ
3

Bœ b
œ mœ œ œ bœ œ
Cl.

>
P f ƒ f
3

ƒ
bœ œ œ œ œ̊ m˚œ ⋲
? m œ m>œ œ œ bœ œ bœ nœ nœ nœ œ̊ m˚œ œ̊
102

bœ nœ mœ Œ Ó ‰
œ
Bsn.

P f ƒ
5

ƒ
5
6

j j
102

& j ‰ ‰ Bœ œ œ Bœ ‰ Œ Ó ∑
4 œp fl
Hn.

p F S 2 4
102
œ j ˘ ˚ ˚ m˚œ œ̊ Bb˚œ ⋲
‰ ‰ bœ ‰ Œ Ó
Tpt. &4 J bœ œ œ
J 4‰ mœ bœ 4
p p F S ƒ
œ
? J œ̆
102

‰ µ˙ . ‰ Œ Ó ∑
Tbn.
J
p p S
hi bongo
œ̆
102

Perc ÷ ∑ J ‰ Œ Ó ∑
S
102
4 2 4
&
15 c3 / 20 / 28

j b œ.
B œœ m n œœ B œœ ..
3


3

&4 Ó
µ˙
m Bµ ˙˙˙˙˙ Bµb bœœœœ ‰ µ µ m n œœœœ Œ BBµ µ œœœœœœ Œ ‰ µ œœœ ...
Synth n œ 4  œ. 4
bœ mœ
? Ó m˙
Âœœ ‰ œ Œ µ œœœ Œ ‰ Âm œœ ..
3 J
3
102

Comp. & ∑ ∑ ∑


“big pluck”

&
15

j bœœœ
œ m m œœœœ
3


3

& ∑ bœœœ ‰ m œœ Œ n œœœœ


m Œ ‰ J Œ
b œœ
b m œ
J
Kbd.

F œœ
? ∑ ∑ ‰ mœ Œ
J

œ Bb œ œ µœ µœ q = 108
´ µ œ´
mœ œ µœ œ Âœ Bœ œ µ œ >œ Âœ Bœ µ œ œ œ ˝ ´ œ́ bœ
œ œ µœ ‰
æ
pizz. arco

Âœ œ́
102

‰ ‰ r Œ ‰ ⋲
æ æ æ
& µœ œ
3
Vn.
œ œ
f
6

4 F
2 ƒ 4
6
5

œ@ œ@ @ @
˝j
6

Bœ m œ ˝ bœ œ
œ mœ Bœ ‰ ‰
6

‰ œr œ œ
pizz.

µœ œ
pizz. arco

B mœ œ µœ ‰
4‰ œ Œ
æ æ æ
&4 µœ
3

µœ œ mœ œ
3

mœ œ 4
Vla.

f 3 5
> 3

µ œ bœ Bœ œ mœ µœ Bœ mœ œ œ ˝j
B œ mæœ æ œ µœ mœ œ
@ @ @œ œ@ œ œ bœ œ ? œ œ
pizz.

Vc.
Âœ ‰ œ ‰ Âœ Œ
f
5 3
6 6 3

˙ œ œ ˝ ˝ ˝j
? µ œ Âœ œ ‰ bœr Œ
pizz.

Œ œœ ‰ ‰ ‰ µœ Œ
3 3

Cb.

f 5

3/16/08
A. Tear of the Clouds score 135

Tear of the Clouds 27

mœ µ œ bœ œJ œ œ mœ œ
œ œ œ mœ œ œ mœ œ mœ
⋲ œ m œ œ m œ ⋲ b œ n œ œ n œ ⋲ œ bœ œ œ ⋲ . œ œ œ œ bœ œ œ m œ n œ œ bœ n œ œ n œ m œ
105

Picc. & ‰ ∑
f ƒ 4
6
4 6 7 2 5 5

bœ n œ n œ m œ
5 7

µœ œ mœ œ
⋲ mœ mœ œ mœ ⋲ œ mœ œ

⋲ œ bœ œ J µœ œ m œ œ œ
mœ µœ n œ b œ œ œ m œ
œ µœ œ b œ b œ n œ
µœ œ mœ Âœ Âœ m œ œ m œ
105


Ob. &4 4 Π4 4
Ï
7

f f
5 6
6 3 5
7

Bœ œ B œ œR µœ œ bœ nœ œ µœ œ
⋲ bœ bœ ⋲ nœ bœ œ œ œ µœ mœ nœ
µœ mœ R ‰. œ
105

Œ ⋲ j œ
5

µœ œ m
3

& œ µœ œ œ œ
µœ œ œ
m n
µœ
Cl.

œ Ï
7

f f
6
5 3

S
? œ mœ œ œ mœ
j >œ m œ œ b œ œ œ m
m ⋲ mœ œ n
105

⋲ œ Œ ‰ nœ Œ œ œ œ œ œ œ
 œ œ. > > >
Bsn.

S f ƒ Ï
7
5
5

^j ^
‰ œj
105

Hn. & Œ ⋲ mœ . Œ ∑ Bb w œ
4 S S 2 5 p ƒ 4
105
Bbœ^
Ó Œ ‰ J ∑
Tpt. &4 4 4 Œ . œ 4
S ṗ ƒ
^ µ œ^
? ⋲ µ œj .
105

Œ Œ ‰ J ∑
w œ
Tbn.

S S p f
œ
nœ œ #œ œ œ bœ #œ nœ œ bœ ⋲ ⋲ nœ œ œ
105

÷ ∑ ‰
vibraphone

j
5

& #œ œ
3

bœ œ #œ bœ #œ
Perc

p
j
/ 36 (senza pedale)

µ bœœœ µ œœœ
/ 34
4 2 5 4
⋲ µ Bœœœ...
105

& Bœœœ Ó
15 / 30 / 44

œ
B œ ..
mmœœœ ⋲ m b Bœœœ ... Œ

µµ œœœ n Âœœœ µ œœ
& 4 Bµ œœœ 4 mœœ
J µ œ Bbœ
Synth 4 4
? m œœ

105

Comp. & ∑ ∑ ∑

j
& œ ⋲ œœœ... Œ m œœœ œœ ∑ ∑
15

m œm œœ œ.
⋲ b œœ .. Œ œ ∑ ∑
& œ mœ
Kbd. J mœ m œœ
ƒ
? ∑ ∑ ∑

µ œ µ œ µ œ Bœ œ µ œ
œ œ œœ
Âœ œ œ œ m œ œ œ œ n œ œ µ œ µ œ ⋲ Bœ œ œ m œ n œ m œ µ œ
⋲ œ œ œ œ
µ
µœ œ œ
m
mœ œ µœ
105

& Œ ⋲ µœ mœ
7


7

œ œ œ mœ œ
Vn.
7
6

ƒ
5 6
4 2 5 5 5 6
4
œ œ mœ œ œ
œ mœ œ œ µœ œ œ œ bœ
r

œ mœ œ ⋲
5

⋲ Âœ µ œ
arco 8:5 œ 6

B ⋲ œ Âœ nœ ‰ r œ œ & œ œ œ œ µ œ œ œ m œ
œ œ œ 4 µœ œ µœ
m µ m
mœ mœ œ
Vla.
4 4 7 4
5
5 ƒ
œ œ > mœ œ µœ
œ
? µ œ Âœ  œ œ œ
Âœ œ
œ œ mœ

œ œ bœ mœ œ mœ œ
arco

⋲ µœ ⋲ J ‰ œ& œ mœ µœ œ
5

œ . .
6

mœ mœ œ œ œ œ n œ œ
œ
Vc.
5 b 5 6

ƒ
5 7 5

.
? ⋲ µ œJ œ
pizz. arco

Cb. Œ Œ mœ Ó ww œœ
ƒ

3/16/08
A. Tear of the Clouds score 136

28 Tear of the Clouds

>
m˙ µ w˚ b>œ œ m œ m œ œ m œ
q = 66
mœ œ mœ mœ œ œ nœ mœ œ œ bœ
mœ mœ mœ bœ nœ R ‰.
108 6

Picc. & Œ ‰ Ó ∑
f 2ƒ f ƒ
5
4 4
3 7

œ œ œ. ˚˙ m w˚ œ ˙ w
mœ œ œ œ nœ mœ mœ mœ œ >œ
œ œ
œ mœ mœ Âœ µœ µœ œ
⋲ m œ Bœ œ m œ
108
m m
Ob. &4 7 4 4
Ï̇ Ï ƒ
6 5

˚ m w˚
6

ƒ bœ œ œ . Bb œ ˙ w
5

œœ m œ œ mœ >œ

⋲ m œ Bœ œ µ œ n œ œ œ m œ m œ œ µ œ mœ œ Bœ œ
108 5

Cl. & œ 7

ƒ 6 6
Ï Ï
5
ƒ
˚˙
œ œœB m˚w >œ œ œ ˙ w
? œ mœ œ œ mœ œ mœ
108

Ó ⋲
5

Bsn.
m œ
œ fl fl
ƒ fl fl Ï ƒ F
6
5

108
> ˚ ˙. w
& Ó ‰ œ œ ˙ bw Œ
J
Hn.

4 Í 2ƒ 4Ï p ƒ
108 >œ œ ˙ Bw˚
&4 Ó ‰ J Œ
4 4 b˙ . w
Tpt.

Í ƒ̇ Ï p
µœ œ ƒ
mœ œ œ
? ⋲ r µœ œ µœ
108 gliss. 3

Ó ∑ Ó
œ œ
Tbn.

f ƒ b˙ w
6

VII I p Ï
>˙ œ
œ #œ
6 brass

œ #œ #œ
5
mallets

œ #œ #œ
108

‰ œ ‰ Œ Ó ∑ ∑
3

& œ #œ œ nœ #œ J œ œ
Perc

f ƒ
° * °
œ
5

œ 2
7

œ Bœ œ œ œ Bœ œ  œ m œ
108
4 4
Ó
15

&
/ 46 / 48 / 50 + / 52 / 54 / 56

Bœ œ  œ œ µ Âœœ
6

&4 Ó ‰ œ œ œ mœ b œ œ œ B œ m œ m µ œœ Œ B n n œœœ
Synth 4 4
? b µœœ
5

Ó Œ
mœ œ mœ œ œ œ ˙ bw œ
108

Comp. & ∑ ∑ ∑ ∑ ∑
“bells”

œ
7

œ œœ œ œ œ œ œ mœ
5

& Ó ∑ ∑ Ó ∑
15

œ œ œ
bœ œ œ œ mœ mœ
6

& Ó ‰ œ œ œ mœ ∑ ∑ Ó ∑
f
Kbd.
5

? ‰ j Ó Ó ∑ Ó. ∑
mœ mœ
œ

œ ˙ mw œ Âœ µœ µœ œ œ ˙
q = 66
w
⋲ œ µ œ m œ Âœ œ œÂ œ
œœ µœ mœ œ æ ˙ w
sul E

œ
æ
108

Vn. &
2Ï 4 fÍ ƒ F
6 5
4
6

mœ œ mœ —
œ B Bœ m œ œ B˙ mœ œ mœ mœ œ ˙ w
æ µw
sul C

Vla. &4 ⋲ µ œ œæ œæ 4 ˙ 4 w & œ


Ï fÍ F
ƒ
5
5

—
æ
sul G

? œ œ Bœ œ œ Âœ . œ µœ ˙ w
5

& ⋲ œ Bœ m œ œ ∑ ∑ & œ œ bœ œ
œ
æ æ
Vc.

ƒ F

5

? Œ mœ œ
sul G
æ æ Ó
≤ ≥ sul pont.

Cb. mœ
æ æ æ b˙ ˙ mœ ˙ œ w
f ƒ p

3/16/08
A. Tear of the Clouds score 137

Tear of the Clouds 29

D bw w w
113

Picc. & ∑ ∑ ∑
Ï 3 2
113 œ
& Œ Ó ∑ ∑ ∑ ∑ ∑
Ob.
4 4
113 µœ
Cl. & Œ Ó ∑ ∑ ∑ ∑ ∑

B
113
œ
Bsn. Œ Ó ∑ ∑ ∑ ∑ ∑

œ
113

Hn. & Œ Ó ∑ ∑ ∑ ∑ ∑
3 2
113

& Œ Ó ∑ ∑ ∑ ∑ ∑
Tpt.
œ 4 4

?
113

Tbn. Œ Ó ∑ ∑ ∑ ∑ ∑
œ

b b œœ
crotales
113

Perc & Œ Ó ∑ ∑ ∑ ∑ ∑
ƒ
113
bµ b Bb œœœœœœ b˙ 3 2
& Œ Ó Ó
15

Synth
&
ø Ï
4 4
?

113

Comp. & ∑ ∑ ∑ ∑ ∑ ∑

H m œœœœœœ
& Œ Ó ∑ ∑ ∑ ∑ ∑
15

& ƒ ∑ ∑ ∑ ∑ ∑ ∑
Kbd.

? ∑ ∑ ∑ ∑ ∑ ∑

wO wO
D
113

& ∑ ∑ ∑ ∑
ø
Vn.

f 3 2
µ
µw O wO
& ∑ ∑ ∑ ∑
ø 4 4
Vla.

f
œ Œ Ó ∑ ∑ ∑ ∑ ∑
Vc. &

?
ord.

Cb.
œ Œ Ó ∑ ∑ ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 138

30 Tear of the Clouds

bœ . ˙ . ˙. œ . œŸ~~~~~~~~~~
( m œ)œ œ bœ
b>œ ˙Ÿ~~~~~~~~~~~~~
. (b œ ) œ̆ ⋲ œ ˙. >œ œ mœ mœ
‰. R mœ
119

Picc. & Œ œ˙
2 F 4 5
3 4 ƒ F 6
p
119

&4 ∑ ∑ ∑ ∑ ∑ ∑
Ob.
4 4 4
119

Cl. & ∑ ∑ ∑ ∑ ∑ ∑

B
119

Bsn. ∑ ∑ ∑ ∑ ∑ ∑

119

Hn. & ∑ ∑ ∑ ∑ ∑ ∑
2 4 3 4
119

&4 ∑ ∑ ∑ ∑ ∑ ∑
Tpt.
4 4 4

?
119

Tbn. ∑ ∑ ∑ ∑ ∑ ∑

b b œœ ..
soft vibraphone

b œœ
crotales mallets motor on (slow)
œ
Ó. ⋲ J Ó.
119

Perc & ∑ ∑ ∑ œ
#œ Œ ‰ Œ
p f œ 3

F
°
119
2 4 3 4
&
15

µœ
Bm œœœ BBb œœœœ µ m œœœœ
& 4 µ œœœ Œ
4 BBœœ Œ Ó
4 4 Bb œœ
Synth

p p
? m Bœœ Ó
œ
119

Comp. & ∑ ∑ ∑ ∑ ∑ ∑

∑ ∑ ∑ ∑ ∑
“bowed pluck”

&
15

œœœ b œœœ bœ
œœ œœ
& m œœœ Œ Œ Ó ∑ ∑ ∑ Œ Ó
œ
p p
Kbd.

? ∑ ∑ ∑ ∑ ∑ m œœ Ó
œ

Bœ ˙ . ˙.
arco

mœ µ œœ
pizz.

‰ J BBOœ
119 pizz. arco

& J ‰ Œ ∑ ∑ J ‰ Œ Œ
ø ø
Vn.

2 p 4 3π 4 P
j j
pizz. arco pizz. arco

& 4 Bœ ‰ Œ
4 ‰ µOœ Ȯ .. 4 Ȯ .. ∑ ∑ bœ ‰ Œ Œ m n Oœ
ø 4 J
ø
Vla.

p p P
Bœ µ µ Oœ
pizz.

‰ m Oj Ȯ . ? œ ‰ Œ
arco

& ∑ Ȯ .. ∑ ∑ J Œ
mœ .
ø ø
Vc.

p p
o
arco
pizz.

? Œ &œ
sul G

∑ ∑ ∑ ∑ ∑ Œ œ
ø
Cb.

3/16/08
A. Tear of the Clouds score 139

Tear of the Clouds 31

˙ œ.
m>œ m œ œ m œ bœ . ˙ œ >œ ˙Ÿ~~~~~~~~~~~~~~~~~
. (œ) œ ˙. œ ⋲ œ̆ J
œ œ bœ œ
œ mœ
‰.
125

bœ . bœ R
5

& œ. œ œœ œ
5:3

Picc.
œ R
fl
F f p 3
f p
7

125

& ∑ ∑ ∑ ∑ ∑
Ob.
4
125

Cl. & ∑ ∑ ∑ ∑ ∑

B
125

Bsn. ∑ ∑ ∑ ∑ ∑

125

Hn. & ∑ ∑ ∑ ∑ ∑
3
125

& ∑ ∑ ∑ ∑ ∑
Tpt.
4

?
125

Tbn. ∑ ∑ ∑ ∑ ∑

#œ . œ œ bœ
bœ œ œ œ b œ
œ. œ œ bœ œ
œ bœ
7
œ
5

⋲ #œ .
125

&Œ ‰ Œ bœ ‰ Œ
3 3

œ œ
Perc
J J œ #œ œ.
p
5
3

*° *° *°
j b µBœœœœ
125
3
& µµ µ œœœœœ Œ
15

µ m mµ œœœœœœ µ œœ b bœœœœœ
µ œœœ
&Œ ‰ b µ œœ
µ
œ Œ m µ œœ

œ π b œœ 4
p
Synth

Bœ p
?Ó ⋲ j Œ Œ Œ ‰ j Œ ⋲ Bœj . Ó
µœ . bœ
p
125

Comp. & ∑ ∑ ∑ ∑ ∑

&Œ ‰ j Ó ∑ ∑ m œœœœ Œ Ó ∑
15

m m m œœœœœ m m œœ b bœœœœœ mœ
&Œ ‰ bœ Ó œ Œ Ó ∑ Œ Ó Œ m œœ Ó
œ p
J F
Kbd.

?Ó œ bœ
⋲ j Œ Œ Œ ‰ j ∑ ∑ Œ œ. Ó
mœ . bœ

O mœ ‰ Œ µœ
125 pizz.

&w ∑ ∑ œ Ó Œ J ‰ Ó
J
Vn.

p p p 3
µœ ‰ Œ
pizz.

Vla. & wO ∑ ∑ J Ó ∑
4
p p
O œ
pizz.

Vc.
?w ∑ ∑ J ‰ Œ Ó ∑
p p
o
Cb. &w ∑ ∑ ∑ ∑
p

3/16/08
A. Tear of the Clouds score 140

32 Tear of the Clouds

>
Ÿ~~~~~~~~~~~~~~~~ œ œ œ m œ œ mœ œ . nœ bœ bœ . œ œ œ m>œ œ
œœ œ mœ œ œ nœ œ
˙ (mœ ) œ. œ bœ œ J bœ m œ œ m œ œ
130

Picc. & m˙ œ œ.. 5


3

3F
7

f 2 p F4 2
f
7 7
3

130

&4 ∑ ∑ ∑ ∑ ∑
Ob.
4 4 4
130

Cl. & ∑ ∑ ∑ ∑ ∑

B
130

Bsn. ∑ ∑ ∑ ∑ ∑

130

Hn. & ∑ ∑ ∑ ∑ ∑
3 2 4 2
130

&4 ∑ ∑ ∑ ∑ ∑
Tpt.
4 4 4

?
130

Tbn. ∑ ∑ ∑ ∑ ∑

bœ œœ ˙
#œ #œ œ #œ n˙ œ
130 7
œ œ bœ ‰ Œ j
3

& œ œ œ bœ œ œ œ
3

œ œ œ œ œ œ
Perc
œ
3
F p

3 bœ 2 4 r 2
Bµ œœœ
130 c11 / 60

& Ó
15

œ Bœ bBœœœ
œ Œ Bmm œœœœ µ µ b µ œœœœœ ‰ . µ BBœœœ
&4
p µœ 4 4 œ 4
p
Synth

? b B œœœœ π Bœ
œ Œ

µœ
130

Comp. & ∑ ∑ ∑ ∑ ∑

& ∑ Ó mœ ∑ ∑ ∑
15

œ bb b œœœœ
& ∑ œ Œ m œœœœ ∑ Œ m œœ Œ ∑
Kbd. œ

? b œœœ œ
∑ Œ Œ ∑ Œ Ó ∑

Â˙ Â˙ ˙
µœ
130

& ∑ Ó mœ ‰ ∑
J
Vn.

3 π 2 4 π p 2
µ˙ ˙ ˙
&4 ∑ j ‰ Œ Bœ ‰ ∑
J 4 4 4
Vla.

œ π π p
p
? µœ bb Ȯ
∑ œ ‰ Ó ∑ BBȮ Ȯ
J
Vc.

p π p

? j mœ
pizz.

Cb. & ∑ µœ ‰ Ó ∑ ∑ Œ
p P

3/16/08
A. Tear of the Clouds score 141

Tear of the Clouds 33

135
mœ œ œ œ
& œ m œ Bœ œ ˙. Œ ∑ ∑
ø
Picc.
w w
4 3
p
135

& ∑ ∑ ∑ ∑ ∑ ∑
Ob.
4 4
135

Cl. & ∑ ∑ ∑ ∑ ∑ ∑

B
135

Bsn. ∑ ∑ ∑ ∑ ∑ ∑

135

Hn. & ∑ ∑ ∑ ∑ ∑ ∑
4 3
135

& ∑ ∑ ∑ ∑ ∑ ∑
Tpt.
4 4

?
135

Tbn. ∑ ∑ ∑ ∑ ∑ ∑


œ
135

& ‰ œ bœ œ ˙ Œ œ Œ ‰ ‰ ∑ ∑
5
œ
bœ bœ ˙
Perc
œ 3
7

135 c12 / 66
4 c13 / 72 c15 / 84 c16 / 90
3
&
15

bµbœœœ .. Bœ µ œœœ
& ⋲ œ.
. B œœœ
bœ . 4Œ œ
Ó
4
J π
Synth

? Ó µœ

135
œ
Comp. & ∑ ∑ ∑ ∑ ∑ ∑

j
& ⋲ mœ . Œ ∑ ∑ ∑ ∑ ∑
15

b bœœ . œœ
. œœ
& ⋲ œ. Œ Œ Ó ∑ ∑ ∑ ∑
Kbd. J œ

? ∑ ∑ Ó mœ ∑ ∑ ∑
œ
µœ ˙.
sul A

œ j sul D

‰ m OO OO .. OO
135

& Œ ∑ ∑
ø ø ø
Vn.

4 p 3
µœ ˙.
& œ Œ Ó Œ µ µOœ Ȯ .. Oœ Ȯ .. Œ ∑
4
ø ø ø 4
Vla.

p
Ȯ Ȯ Ȯ Ȯ
? Oœ Oœ Ȯ .. Œ Ó Ó ∑
ø ø ø
Vc.

? ∑ ∑ Ó Œ ∑ ∑ ∑
œ
Cb.

3/16/08
A. Tear of the Clouds score 142

34 Tear of the Clouds

q = 90 q = 60
>œ ˙.
E
˙ m˙ ˙. µœ ˙. œ . bœ ˙ . œ. œ
mœ œ mœ
œ bœ œ mœ œ
141

Picc. & Ó
3 Ï
4f π
141

&4 ∑ ∑ ∑ ∑ ∑ ∑
Ob.
4
141

Cl. & ∑ ∑ ∑ ∑ ∑ ∑

B
141

Bsn. ∑ ∑ ∑ ∑ ∑ ∑

141

Hn. & ∑ ∑ ∑ ∑ ∑ ∑
3 4
141

&4 ∑ ∑ ∑ ∑ ∑ ∑
Tpt.
4

?
141

Tbn. ∑ ∑ ∑ ∑ ∑ ∑

hard mallets
œ >œ
œ# œ œ
141

Perc & Ó œ Œ Ó ∑ ∑ ∑ ∑
f

4˙ B˙ m˙
® m œ µœ µ œ µœ m œ µœ œ ˙
c1
141
3 µ˙
Ó œ bœ œ µœ bœ Ó
5

& œ µœ µœ
15
3

π mœ mœ µœ bœ
&4 Ó
3

4
f
Synth

141

Comp. & ∑ ∑ ∑ ∑ ∑ ∑

˙ b˙ ˙ ˙ m˙
"slow synth”

∑ œ bœ œ mœ bœ Ó ∑
5

& œ mœ mœ
15
3

π mœ mœ nœ bœ
& ∑ ∑ ∑ ∑ Ó ∑
3

Kbd.

? ∑ ∑ ∑ ∑ ∑ ∑
°
q = 90 E q = 60
Oœ Ȯ .. Ȯ .. Oœ m Oœ Ȯ .. Oœ µ B Oœ Oœ Ȯ Oœ m O O
mœœ
141

& ∑ ∑ ‰ J J
ø
Vn.
J
p
3
3 4
3

&4 ∑ ∑ ∑ ∑ ∑ ∑ B
Vla.
4

Vc.
? ∑ ∑ ∑ ∑ ∑ ∑

Cb.
? ∑ ∑ ∑ ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 143

Tear of the Clouds 35

˙. œ µœ . ˙ œ. œ œ œ œ bœ œ. œ œ ˙ µœ œ . . Âœ œ
J œ . nœ œ œ
147

Picc. & 5

p
5 3 3
5

œ
‰ J
147

Ob. & ∑ ∑ ∑ ∑ Ó Œ

147

Cl. & ∑ ∑ ∑ ∑ ∑

B
147

Bsn. ∑ ∑ ∑ ∑ ∑

147

Hn. & ∑ ∑ ∑ ∑ ∑

147

Tpt. & ∑ ∑ ∑ ∑ ∑

?
147

Tbn. ∑ ∑ ∑ ∑ ∑

crotale
147
œ
∑ ∑ ∑ ∑ Ó Œ ‰ J
take bow arco

Perc &
π
*
147
B˙ µ˙ ˙ µ˙ Bœ µœ œ
&Ó œ bœ œ µœ Œ
15

µœ œ
5

π Âœ nœ bœ mœ m œ µ œ bœ µ œ n œ
&
3

Synth
6

147

Comp. & ∑ ∑ ∑ ∑ ∑

b˙ bœ
“bowed synth pluck”

˙ ˙ m˙ œ œ
&Ó œ bœ œ mœ Œ
15

mœ œ
5

π œ œ bœ mœ m œ n œ
∑ ∑ ∑ ∑ bœ m œ nœ
&
3

Kbd. 6

? ∑ ∑ ∑ ∑ ∑

µœ œ
sul tasto.

Ȯ Oœ µ µOœ . Oœ Oœ r
b b Oœ .. Ȯ O O.
sul A
O O O ..
5

. Oœ  µ Oœ Ȯ
147 sul A

Vn. & J µœ µ BOœ Oœ


p P p
BO Ȯ Oœ Ȯ Oœ Ȯ .. Oœ ..
BŒ Œ µ œJ . µ µ Oœ Ȯ .. Oœ .. Oœ Ȯ Oœ .. .. b b Oœ Oœ
ø
Vla.
3 5
p 5

Vc.
? ∑ ∑ ∑ ∑ ∑

Cb.
? ∑ ∑ ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 144

36 Tear of the Clouds

w œ œ œ. ˙ œ . µœ ˙ œ œ.
œ
& J µœ œ Âœ œ nœ J
152

bœ œ µœ . mœ . mœ µœ bœ ‰ Œ
5 3

Picc.
J J
3
∏ π
˙. œ µœ . ˙ œ œ œ œ œ œ. œ œ . µœ œ
152
œ . . µœ œ œ µœ œ œ nœ . œ œ œ œ
Ob. & R J
p
5 5 3

152 Bbœ ˙ œ bœ
& ∑ ∑ ∑ ∑ ‰
ø
Cl.
3

B
152

Bsn. ∑ ∑ ∑ ∑ ∑

152

Hn. & ∑ ∑ ∑ ∑ ∑

152

Tpt. & ∑ ∑ ∑ ∑ ∑

?
152

Tbn. ∑ ∑ ∑ ∑ ∑

152
œ œ. œ
Perc & Œ Ó ∑ ∑ ∑ ⋲J Ó
π

˙
c2
˙ Ó ˙
& µ œ œ bœ µ˙
152 3

œ µœ µ˙ µ˙
5


15

œ µœ
6

( π) Âœ n œ bœ mœ mœ mœ Âœ œ
µ œ b œ µ œ n œ Bœ µ œ
& B œ µ œ Bœ œ Ó Bœ 5
Synth

µœ nœ œ
7
œ Bœ Bœ
? Ó Œ Ó b œ Bœ œ Bœ
µœ mœ
p œ
152
F
Comp. & ∑ ∑ ∑ ∑ ∑

π̇ π
˙ Ó ˙
& m œ œ bœ m˙ n˙
3

œ mœ m˙
5

œ
15

œ œ œ
6

( π) œ bœ mœ mœ œ bœ mœ nœ ( π) mœ œ mœ
& œ œ œ œ Ó ∑ œ mœ œ ∑ 5
Kbd.

œ œ œ bœ p
7

mœ nœ œ œ œ œ
? Ó mœ Œ Ó ∑ ∑
œ mœ
p F *°
œ bœ . œ Oœ Oœ Ȯ ..
m Oœ
ord.

µœ œ mœ . œ . µœ œ . . mœ œ r 5 j3
sul pont. sul tasto.

Âœ nœ
152 5

& œ œ œ . µœ œ œ Bœ j ‰ J
3

Bœ œ bœ .
ø
Vn.

P p
5


5

œ . Bœ œ j
Oœ & µ œ œ
sul tasto.

B Oœ .. œ . µœ œ œ µ œ œ . Bœ œ . . B œ œ . Bœ œ B œ œ bœ Bœ œ B œ
sul tasto.

Oœ b bOœ Ȯ
sul pont.
œ œ. œ œ.
5
Vla.
J
3 p P
3

Vc.
? ∑ ∑ ∑ ∑ ∑

Cb.
? ∑ ∑ ∑ ∑ ∑

3/16/08
A. Tear of the Clouds score 145

Tear of the Clouds 37

œ œ µœ œ œ mœ œ œ Âœ . œ ˙ ˙ œ µœ œ
µœ œ µœ œ mœ . œ J
5 5

œ . œ µœ . µœ .
157

R J œ bœ œ ‰ ‰
5

& J
3
Picc.

p ∏ ∏
3 5 3
3
3 7
Bb œ . œ œ œ bœ . œ œ. œ ˙
r 5 µœ œ . µœ œ œ nœ . œ .
& mœ œ nœ . . œ œ œ . µ œ œ Bœ œ œ œ ⋲ J œ
157 5
5

œ 4 8
Ob.

p π π P P
5

157 ˙ œ œ. œ œ . Bœ œ œ Bb œ œ œ œ. Bbœ
œ Bœ œ œ bœ œ. n œ œ œ . Bœ œ Bœ ‰.
5

& Bœ bœ œ œ r
3

J
ø
Cl.

p p œ
5
3


B
157

Bsn. ∑ ∑ ∑ ∑ ∑

157 con sord.

& ∑ ∑ Ó Œ
øœ ˙. 3 ˙.
Hn.

œ 7
157
p
& ∑ ∑ ∑ ∑ ∑
Tpt.
4 8

µœ ˙. œ ˙.
con sord.

?
157

∑ ∑ Ó Œ
ø
Tbn.

p
157
œ. œ
Perc & ∑ ∑ ∑ ‰ Ó ∑
p
5

˙
& ˙ œ
157

µ˙ 3 µ˙ 7
µ˙ Bœ Œ Œ µœ
15

µœ
3

mœ Âœ œ
Bœ œ œ œ œ Bœ œ œ œ œ µ œ Bœ
& µ œ Bœ œ Bœ Bœ œ Bœ Bœ
3

Synth bœ Bœ 6
4 8
5 7 5

? œ Bœ bœ œ œ
3

œ
mœ Œ Ó mœ ‰

157

Comp. & ∑ ∑ ∑ ∑ ∑
π̇
& ˙ m˙ n˙ œ ∑ Œ œ Œ m˙ nœ
15

œ
3

mœ œ œ
œ œ œ œ œ œ œ œ œ œ mœ nœ ( π) ∑
& mœ nœ œ œ œ œ œ
3

bœ œ œ
( π)
Kbd. 5 6

P
5 7

? œœ bœ œ œ
3

∑ ∑ œ
mœ Œ Ó ∑ mœ ‰
p

Ȯ .. Oœ O . O. Oœ Ȯ Oœ µ µ Oœ . Oœ Oœ Oœ Oœ Oœ Oœ µ µ Oœ j
. Ȯ Oœ m O O
157

& R mœœ
R
Vn.

3 P 7
5 5 3

µ µ Oœ Ȯ Oœ .. Oœ Ȯ
ord.

B ‰ J m m Oœ Oœ Oœ Oœ µ O . .
µ œ.. Oœ Oœ µ µ Oœ Oœ Oœ m m Oœ Oœ Oœ .. Oœ O Oœ ÂO Oœ
4 œ J œ
ø J
Vla.
5 8
p 3 5
P 3

.
B ‰ BœJ ˙ œ ˙.
sul tasto.

? ∑ ∑ ∑
ø
Vc.

π
5

? m˙ . ˙. œ ˙.
arco

∑ ∑ Œ
ø
Cb.

3/16/08
A. Tear of the Clouds score 146

38 Tear of the Clouds

œ œ mœ . œ œµ œ . œ. r
œ œ . m œ œ n œ . œ b œj ‰
take Flute

µœ œ
162

Bœ Œ ∑
5

µœ
5

&
6

Picc.

7 p 4 ∏
mœ œ. µœ œ œ nœ . œ . µœ œ. mœ œ œ nœ œ Bœ Bb œ Bœ
&8 œ mœ nœ J
162

œœ œ R
3

Ob.
RJ 4 œ  œ n œ œ œ ⋲ 3

p π P
5 5
5

œ œ bœ œ œ œ Bœ . œ œ Bœ œ œ
J RJ J mœ œ œ nœ . œ Bœ œ bœ œ. œ œ œ œ bœ .
162 3

Cl. & R J J J œ œ Bœ Bœ
P
3 5

P
3 5
5

B
162

Bsn. ∑ ∑ ∑ ∑

Œ. ? Ó
162

& Œ ∑ ∑ Œ œ
ø ø
Hn.

7 œ 4
162
Bb œ . ˙ œ mœ ˙ œ nœ . . œ
&8 ∑ ∑ ⋲ J J
Tpt.
4
π
3

? œ
Œ.
162

Œ ∑ ∑ Ó µ˙
ø ø
Tbn.

162
œ. œ
Perc & ∑ ∑ ⋲ J Ó ∑
p
c3
162
7 4 ˙ m˙
3

œ mœ
& ‰ Bœ µ˙ Âœ nœ Ó
15

µœ mœ Bœ mœ µœ µœ Âœ µœ
Bœ mœ nœ Bœ œ µœ Âœ
&8 œ bœ mœ
Synth 4 5
µœ œ 5

? ‰ µ œj Œ Ó
œ
162

Comp. & ∑ ∑ ∑ ∑
π π
‰ ˙ m˙ n˙ Ó œ mœ
3

& œ œ œ
15

œ mœ œ mœ nœ mœ nœ œ
œ mœ nœ œ mœ nœ
& ∑ œ œ bœ mœ œ (π)
œ 5

P
Kbd. 5 6

? ∑ ∑ ∑ ‰ bœj Œ Ó
œ

mœ œ
sul tasto.

œ œ
sul pont. sul tasto. ord.
162

Oœ Oœ µ O . j J Âœ œœ œ Âœ . œ nœ œ œ bœ œ œ Bœ µ œ œ
5

mO
Oœ ⋲ Bœ œ ⋲ ‰
5 sul A

& µ œ. Oœ µ µ Oœ Oœ R
ø
Vn.
3 J
p P ∏
5
7 4 3

r œ œ . m œ œJ œ nœ œ œ œ.
sul tasto.
œ œ œ
sul pont.

œ µ œ Âœ n œ Bœ O Oœ
6:5

Oœ ⋲ J œ mœ Âœ nœ ⋲ & œ B BOœ .. Oœ
ord.

B J
8 µBOœ J J
4 Bœ
ø
Vla.

p π
3 5
3

œ Bœ . . œ œ . Bœ œ
B œ œ œ ? BœJ œ œ œ. œ œ Bœ œ
sul pont. sul tasto.

bœ œ œ Bbœ
5

œ œ. œ mœ .
3
Vc.
J J J R µœ mœ nœ
P ∏
3 3 5 3

? ˙. ‰ ∑ Ó Œ ‰ œ w
J
ø ø
Cb.

3/16/08
A. Tear of the Clouds score 147

Tear of the Clouds 39

µœ œ œ. œ œ œ mœ œ œ µœ œ nœ œ µœ œ µœ
Flute œœ œ
J J J R R
166

Fl. & Ó Œ 3 5

π p
3 3 5 3

2 3
œ œ œ mœ µœ œ µœ œ Âœ . .
nœ bœ œ œ J mœ . . nœ œ . bœ œ œ . mœ bœ œ bœ
166

& J J bœ œ B œ œ bœ Œ œ
œ 4 4
Ob.

P p π P
3 3 3
5 6

œ œ Bbœ . œ bœ œ Bœ œ Bœ nœ œ œ.
J J J mœ . nœ œ bœ . œ œ ⋲ J
166

R b œ œ Bœ Bœ
5

& Bœ
œ œ ⋲
ø bœ Bœ œ œ bœ ø
Cl.

P
3 3 3 5


5


œ. œ œ bœ œ bœ œ œ Âœ œ œ . œ . Âœ
B B Œ ⋲ J
166


ø
Bsn.

p
5
5

? ˙
166

˙ ˙ Ó ∑ ∑
ø
Hn.

p 2 3
166
œ Bœ œ œ bœ œ œ œ. œ. œ œ œ œ

& œ œ. œ œ j 4 ‰ J
3

œ œ 4
Tpt.

p ∏ π
?
166 (remove mute)

˙ ˙ ˙. Œ ∑ ∑
ø
Tbn.

p
166

Perc & ∑ ∑ ∑ ∑

166
2 c4œ 3
& µœ bœ
5

Âœ nœ
15

mœ µœ µœ Âœ µœ
6

mœ n œ Bœ œ µœ Âœ mœ œ Bœ œ
& œ b œ Âœ Œ Ó œ bœ Âœ
Synth
œ œ œ œ 4 4
7

µœ µœ
6

? Œ ‰ Œ œ µœ mœ µœ
µœ œ œ
166

Comp. & ∑ ∑ ∑ ∑

& œ ∑ œ bœ
5

œ œ
15

mœ nœ mœ nœ œ
6

mœ nœ œ mœ nœ mœ œ œ π
& œ œ bœ œ Œ Ó œ œ bœ œ ∑
œ œ œ œ
P
Kbd.
7
6

? ∑ Œ mœ ‰ Œ ∑ mœ œ mœ mœ nœ
bœ œ œ
P
œ œ Bœ œ . Bb œ œ œ Bœ .
sul tasto.

mœ œ nœ œ. œ œ œ œ œ
& O OBBOœ .. Oœ Oœ .. µ µ Oœ Ȯ
166

œ. œ œ
J
Vn.

P p 2 3 3
O ® mœ . œ nœ .
sul tasto.

& œJ µ µ Oœ Oœ Oœ Oœ j œ
sul pont.
Oœ .. B Bb Oœ Oœ Oœ B BOœ .. j j œ œ bœ œ b œ
Oœ µ O bœ œ ⋲ B 4
3 5 3


3

R µœ µµ Oœ Oœ BBOœ Oœ J 4
Vla.

p P p π
5
3 5

œœ µœ œ . n œ œ Bœ œ ? mœ
sul tasto.

? ‰ m Oœ Oœ Oœ n b Oœ .. Oœ Oœ .. Oœ Oœ Oœ b bOœ .. Oœ .. B µœ µœ œ
ord.

R R
ø
Vc.
5

p P p P
3 5
5

? w w œ Ó. Œ œ
ø ø
Cb.

3/16/08
A. Tear of the Clouds score 148

40 Tear of the Clouds

bœ œ. Bœ œ œ µœ . . œ œ. µœ œ. µœ
170
œ. µœ œ Âœ . mœ J
& µœ bœ œ m œ n œ œ Bœ µ œ ‰