The Tritone Paradox: An Influence of Language on Music Perception

Author(s): Diana Deutsch

Source: Music Perception: An Interdisciplinary Journal, Vol. 8, No. 4 (Summer, 1991), pp. 335-347
© 1991 by the Regents of the University of California

University of California, San Diego
The tritone paradox is produced when two tones that are related by a half-octave (or tritone) are presented in succession. Each tone is composed of a set of octave-related harmonics, whose amplitudes are determined by a bell-shaped spectral envelope; thus the tones are clearly defined in terms of pitch class, but poorly defined in terms of height. When listeners judge whether such tone pairs form ascending or descending patterns, their judgments generally show systematic relationships to the positions of the tones along the pitch-class circle: Tones in one region of the circle are heard as higher and those in the opposite region are heard as lower. However, listeners disagree substantially as to whether a given tone pair forms an ascending or a descending pattern, and therefore as to which tones are heard as higher and which as lower. This paper demonstrates that the basis for the individual differences in perception of this musical pattern lies in the language spoken by the listener. Two groups of subjects made judgments of the tritone paradox. One group had grown up in California, and the other group had grown up in southern England. It was found that when the Californian group tended to hear the pattern as ascending the English group tended to hear it as descending, and when the Californian group tended to hear the pattern as descending the English group tended to hear it as ascending. This finding, coupled with the earlier results of Deutsch, North, and Ray (1990) that showed a correlate between perception of the tritone paradox and the pitch range of the listener's spontaneous speaking voice, indicates strongly that the same, culturally acquired representation of pitch classes influences both speech production and perception of this pattern.

This paper reports the first demonstration, to the author's knowledge, that perception of music can be influenced by the language spoken by the listener. It shows that two groups of listeners who grew up in different linguistic subcultures perceive the identical musical pattern in different ways.
Requests for reprints may be sent to Diana Deutsch, Department of Psychology, University of California, San Diego, La Jolla, CA 92093-0109.


The pattern used to demonstrate this relationship is known as the tritone paradox (Deutsch, 1986, 1987; Deutsch, Kuyper, & Fisher, 1987; Deutsch, North, & Ray, 1990). It consists of two successively presented tones that are related by a half-octave, or tritone. For example, C might be presented followed by F#, or D followed by G#, and so on. Each tone is composed of a set of harmonics that stand in octave relation, and whose amplitudes are scaled by a fixed, bell-shaped spectral envelope (Figure 1). The tones are therefore well-defined in terms of pitch class (C, C#, D, and so on) but are poorly defined in terms of height. When listeners determine whether such tone pairs form ascending or descending patterns, their judgments usually display systematic relationships to the positions of the tones along the pitch-class circle: Tones in one region of the circle are heard as higher and tones in the opposite region are heard as lower. However, there is striking disagreement among listeners as to which patterns are heard as ascending and which as descending, and therefore as to which tones are heard as higher and which as lower. For example, some listeners hear the pattern C#-G as ascending and the pattern G-C# as descending, so that for these listeners, pitch class G is heard as higher and pitch class C# as lower. However, other listeners hear the pattern C#-G as descending and the pattern G-C# as ascending, so that for these listeners the converse holds: pitch class C# is heard as higher and pitch class G as lower.

Fig. 1. Spectral composition of a tone pair producing the tritone paradox. Here the spectral envelope is centered at C5. The upper graph represents a tone of pitch class D, and the lower graph represents a tone of pitch class G#.

The tritone paradox has been found to occur in the large majority of subjects in a sizeable population, showing that the phenomenon is not confined to a few selected individuals (Deutsch et al., 1987). Within this population, no correlate with musical training was obtained, either in terms of the size of the effect, or its direction, or the probability of obtaining it. These findings indicate strongly that the phenomenon is not musical in origin. A number of studies have also ruled out explanations in terms of low-level characteristics of the hearing mechanism. For many subjects, the profiles relating pitch class to perceived height are largely unaltered when the position of the spectral envelope is shifted over a three-octave range (Deutsch, 1987). In addition, the profiles are unrelated to patterns of relative loudness for the harmonic components of the tones when these are compared individually (Deutsch, in preparation).
A number of informal observations led the author to hypothesize that perception of the tritone paradox might be related to the processing of speech sounds. Specifically, it was conjectured that the listener develops a long-term representation of the pitch range of his or her speaking voice, and that included in this representation is a delimitation of the octave band in which the largest proportion of pitch values occurs. It was further conjectured that the pitch classes delimiting this octave band for speech are taken as defining the highest position along the pitch-class circle, and that this in turn determines the orientation of the pitch-class circle with respect to height.
A study was undertaken to test this hypothesis (Deutsch et al., 1990; see also Deutsch, 1989). Subjects were selected who showed clear relationships between pitch class and perceived height in making judgments of the tritone paradox. A 15-min recording of spontaneous speech was taken from each subject, and from this recording the octave band containing the largest number of pitch values was determined. Comparing across subjects, a significant correspondence was indeed obtained between the pitch classes delimiting this octave band for speech and those defining the highest position along the pitch-class circle, as determined by judgments of the tritone paradox.
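The octave-band analysis described in the preceding paragraph can be illustrated with a short Python sketch. It is not the procedure of Deutsch et al. (1990), whose details are not given here. It assumes that the speech recording has already been reduced to a non-empty list of fundamental-frequency estimates in Hz, that these are rounded to the nearest equal-tempered semitone (with A4 = 440 Hz), and that candidate octave bands are shifted in semitone steps. The function name and the sample values are hypothetical.

```python
import math
from collections import Counter

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def densest_octave_band(f0_hz):
    """Find the 12-semitone band holding the most pitch estimates.

    Assumption: each estimate is rounded to the nearest semitone on the
    MIDI scale (A4 = 440 Hz = note 69). Returns the lowest semitone of the
    band, the name of its pitch class, and the number of estimates inside.
    """
    midi = [round(69 + 12 * math.log2(f / 440.0)) for f in f0_hz if f > 0]
    counts = Counter(midi)
    best_low, best_count = None, -1
    # Slide the lower edge of the band upward one semitone at a time.
    for low in range(min(midi), max(midi) + 1):
        count = sum(counts[low + i] for i in range(12))
        if count > best_count:
            best_low, best_count = low, count
    return best_low, NAMES[best_low % 12], best_count

# Hypothetical pitch estimates: mostly around A2 and C3, a few at A3.
sample = [110.0] * 5 + [130.81] * 4 + [220.0] * 3
band = densest_octave_band(sample)
```

The pitch class returned for the lower edge of the band is the quantity that would then be compared with the peak pitch classes derived from a subject's tritone-paradox judgments.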
The findings from this experiment are in accordance with the hypothesis that perception of the tritone paradox is based on a representation of the pitch-class circle by the listener, whose orientation is related to the pitch range of his or her speaking voice. Two versions of this hypothesis may then be advanced. The first, and more restricted, version does not assume that the listener's vocal range for speech is itself determined by such an acquired template. The second, and broader, version assumes that this template is acquired developmentally through exposure to speech produced by others, and that it is used both to evaluate perceived speech, and also to constrain the listener's own speech output. The characteristics of this learned template would therefore be expected to vary across linguistic groups, in a fashion similar to other speech characteristics such as vowel quality. On this line of reasoning, the orientation of the pitch-class circle with respect to height, as reflected in judgments of the tritone paradox, should be similar for individuals within a linguistic group, but should vary for individuals across linguistic groups.
Evidence for the second hypothesis was provided in the earlier study of Deutsch et al. (1987). An orderly distribution of peak pitch classes¹ was found among a sizeable group of subjects who were undergraduates at the University of California, San Diego. As shown in Figure 2, [...] C# [...] occurred most frequently as peak pitch classes, the frequency of occurrence of the other pitch classes falling off on either side of these. Although no information was obtained concerning the linguistic backgrounds of these subjects, it can be assumed that the majority had grown up in California and were from the same linguistic subculture.
The present study was undertaken as a direct test of the hypothesis that listeners in a given linguistic subculture should tend to agree in terms of the orientation of the pitch-class circle with respect to height, and that listeners in different linguistic subcultures should tend to disagree. The two groups chosen to test this hypothesis consisted of individuals who had grown up in California and those who had grown up in southern England. (This choice was motivated by the author's informal observation that individuals from these two backgrounds tended to hear the tritone paradox in opposite ways.) It was predicted that the first group would show a distribution of peak pitch classes similar to that obtained by Deutsch et al. (1987) in the study of Californian undergraduates, but that the second group would show a different distribution.

Fig. 2. Distribution of peak pitch classes within a subject population consisting of undergraduates at the University of California, San Diego. Redrawn from Deutsch et al.

1. The term "peak pitch classes" here refers to the two pitch classes that define the highest position along the pitch-class circle, as determined by judgments of the tritone paradox. See the Results section and Figures 3 and 4 for details.

Two groups of subjects participated in the experiment and were paid for their services.
They were selected without regard for musical training, on the basis of obtaining no more
than six errors out of a possible 48 in a preliminary experiment in which they judged
whether sinusoidal tone pairs that were related by a half-octave formed ascending or
descending patterns. All subjects were free of clinical hearing deficits, as determined by
audiometry. The subjects in the first group (N = 24) had all grown up in California and
had all spent the previous year in California. The subjects in the second group (N = 12)
had all grown up in southern England, although most were now living in California. No
subject in the Californian group had a parent who had grown up in England, and no subject
in the English group had a parent who had grown up in California.

The tones all consisted of six sinusoids that stood in octave relation and whose amplitudes were determined by a fixed, bell-shaped spectral envelope (Figure 1). The general form of the equation describing the envelope is as follows:

$$A(f) = 0.5 - 0.5 \cos\left[\frac{2\pi}{\gamma} \log_{\beta}\left(\frac{f}{f_{\min}}\right)\right], \qquad f_{\min} \le f \le \beta^{\gamma} f_{\min}$$

where $A(f)$ is the relative amplitude of a given sinusoid at frequency $f$ Hz, $\beta$ is the frequency ratio formed by adjacent sinusoids (thus for octave spacing, $\beta = 2$), $\gamma$ is the number of $\beta$ cycles spanned, and $f_{\min}$ is the minimum frequency for which the amplitude is nonzero. Thus the maximum frequency for which the amplitude is nonzero is $\gamma$ $\beta$ cycles above $f_{\min}$. Throughout, the values $\beta = 2$ and $\gamma = 6$ were used, so that the spectral envelope always spanned exactly six octaves, from $f_{\min}$ to $64 f_{\min}$.
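As a rough illustration of the envelope equation, the following Python sketch evaluates the relative amplitude of each octave-related sinusoid of one tone. It is not the original synthesis code. The function names, the 523 Hz default center, the assumption that the envelope center is a C lying at the logarithmic midpoint of the envelope, and the choice to place the six components upward from the lowest octave of the envelope are all made only for the example. The arguments `beta` and `gamma` stand for the frequency ratio between adjacent sinusoids and the number of such cycles spanned.

```python
import math

def envelope_amplitude(f, f_min, beta=2.0, gamma=6.0):
    """Relative amplitude under the bell-shaped envelope (zero outside it)."""
    f_max = (beta ** gamma) * f_min
    if f < f_min or f > f_max:
        return 0.0
    cycles = math.log(f / f_min) / math.log(beta)  # log to the base beta
    return 0.5 - 0.5 * math.cos((2.0 * math.pi / gamma) * cycles)

def tone_components(semitones_above_c, center_hz=523.0, beta=2.0,
                    gamma=6.0, n_components=6):
    """(frequency, amplitude) pairs for one tone.

    Assumptions: the envelope is centered on a C at center_hz, its peak
    lies gamma / 2 cycles above f_min, and the components start in the
    lowest octave of the envelope.
    """
    f_min = center_hz / beta ** (gamma / 2.0)
    lowest = f_min * 2.0 ** (semitones_above_c / 12.0)
    freqs = [lowest * beta ** n for n in range(n_components)]
    return [(f, envelope_amplitude(f, f_min, beta, gamma)) for f in freqs]

# Components of a tone of pitch class D under an envelope centered at C5.
d_tone = tone_components(2)
```

Under these assumptions the amplitude is zero at both edges of the envelope and reaches its maximum of 1 three octaves above the lower edge.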
In order to control for possible effects of the relative amplitudes or loudnesses of the sinusoidal components, tone pairs were created under envelopes that were placed at four different positions along the spectrum. The envelopes were centered at 262 Hz (C4), 370 Hz (F#4), 523 Hz (C5), and 740 Hz (F#5), and so were spaced at half-octave intervals. We can observe that the relative amplitudes of the sinusoidal components of tones at any given pitch class when generated under the envelopes centered at C4 and C5 were identical to those at the pitch class a half-octave removed when generated under the envelopes centered at F#4 and F#5. (As an example, the sinusoidal components of the tones comprising the D-G# pattern, when generated under envelopes centered at C4 and C5, were identical to those comprising the G#-D pattern, when generated under envelopes centered at F#4 and F#5.) The averaging of results obtained by using these different spectral envelopes enabled the balancing out of possible effects of the relative amplitudes of the sinusoidal components of the tones.
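The balancing argument can be checked numerically. The Python sketch below works on a logarithmic frequency axis measured in semitones, with 0 standing for C4 and 6 for F#4, and compares the component amplitudes of each pitch class under a C-centered envelope with those of the pitch class a half-octave away under an F#-centered envelope. The semitone bookkeeping and the function name are assumptions made for the illustration. The envelope is the raised-cosine form spanning six octaves described in the text.

```python
import math

SPAN = 72  # six octaves, expressed in semitones

def component_amplitudes(pitch_class, center):
    """Amplitudes of the octave-spaced components of one tone.

    pitch_class: 0-11, semitones above C.
    center: position of the envelope center in semitones (0 = C4, 6 = F#4).
    """
    low = center - SPAN // 2
    amps = []
    for s in range(low, low + SPAN + 1):
        if (s - pitch_class) % 12 == 0:  # s belongs to this pitch class
            offset = s - low             # distance above the lower edge
            amps.append(0.5 - 0.5 * math.cos(2.0 * math.pi * offset / SPAN))
    return amps

# D under the C4-centered envelope versus G# under the F#4-centered one.
same = component_amplitudes(2, 0) == component_amplitudes(8, 6)
```

Shifting both the envelope and the tone by a half-octave leaves every component at the same distance from the lower edge of the envelope, which is why the two amplitude lists coincide.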
Twelve tone pairs were generated under each of the four spectral envelopes, corresponding to the pitch-class pairings C-F#, C#-G, D-G#, D#-A, E-A#, F-B, F#-C, G-C#, G#-D, A-D#, A#-E, and B-F. There were therefore 48 tone pairs altogether. All tones were 500 msec in duration, with no gaps between tones within a pair. The tones were all of equal amplitude.
The tone pairs were presented in blocks of 12, each block consisting of tones generated
under one of the spectral envelopes and containing one example of each of the 12 pitch-class
pairings. Within blocks, the 12 tone pairs were presented in any of four orderings.
The orderings were random, with the restriction that the same pitch class did not occur
in any two consecutive trials. In this way, 16 blocks were created altogether, with each
of the four orderings employed once for each of the four positions of the spectral envelope.
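The block construction can be sketched in Python as follows. This is one possible reading of the restriction, not the original randomization procedure: it is assumed here that two consecutive trials "share a pitch class" whenever either tone of one pair has the same pitch class as either tone of the next, and orderings are drawn by simple rejection sampling. The function names, the seed, and the sequence in which the 16 blocks are listed are hypothetical.

```python
import random

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def random_ordering(rng):
    """One ordering of the 12 tritone pairs with no pitch class repeated
    across consecutive trials (assumed reading of the restriction)."""
    firsts = list(range(12))
    while True:
        rng.shuffle(firsts)
        # Pairs share a pitch class only if their first tones are a
        # tritone apart, since each pairing occurs once per block.
        if all((a - b) % 12 != 6 for a, b in zip(firsts, firsts[1:])):
            return [(NAMES[p], NAMES[(p + 6) % 12]) for p in firsts]

def make_session(seed=0, n_orderings=4, n_envelopes=4):
    """Sixteen blocks: each ordering used once per envelope position."""
    rng = random.Random(seed)
    orderings = [random_ordering(rng) for _ in range(n_orderings)]
    return [(env, order) for env in range(n_envelopes) for order in orderings]

blocks = make_session(seed=1)
```

Each entry of `blocks` pairs an envelope index (0 to 3) with one ordering of the 12 tone pairs, giving the 16 blocks of 12 trials described in the text.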

Subjects were tested in soundproof booths. On each trial, a tone pair was presented,
and the subject judged whether it formed an ascending or a descending pattern. Within
blocks, tone pairs were separated by 5-sec intertrial intervals, and there were 1-min pauses
between blocks. There was a 5-min break between the eighth and ninth blocks. Each
subject served in two sessions, and all 16 blocks were presented in each session. The data
from the two sessions were averaged. Several practice trials were administered at the
beginning of each session.

The tones were generated on a VAX 11/780 computer, interfaced with a DSC-200
Audio Data Conversion System. They were recorded and played back on a Sony PCM-F1
digital audio processor. The output was passed through a Crown amplifier and presented
to subjects binaurally through headphones (Grason-Stadler TDH-49) at a level of approximately 72 dB SPL.

The percentage of judgments that a tone pair formed a descending pattern was plotted as a function of the pitch class of the first tone of the pair. The graphs in Figure 3 display the data obtained from six subjects, in each case averaged over two sessions. Three of the subjects were from England, and three were from California. As exemplified in these graphs, judgments were strongly influenced by the positions of the tones along the pitch-class circle. However, also as exemplified here, the direction of this influence varied substantially across subjects.
In order to investigate the form of relationship between pitch class and perceived height within each subject population, the following procedure was used. (This was identical to the procedure adopted earlier by Deutsch et al., 1987.) For each subject, the pitch-class circle was bisected so as to maximize the difference between the averaged scores within the two halves. Next, the circle was oriented so that the line of bisection was horizontal. The data were then retabulated, with the leftmost pitch class of the upper half of the circle taking the first position, its clockwise neighbor taking the second position, and so on. In this way, the peak pitch classes were defined as those that stood at the peak of the normalized circle (i.e., at the third and fourth positions as here defined). So, for example, from the graphs shown in Figure 3, the peak pitch classes for subject AK were C and C#, and those for subject CP were F# and G. Figure 4 depicts the two orientations of the pitch-class circle with respect to height derived from the data of AK and CP shown in Figure 3.
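The bisection procedure can be sketched in Python as follows. The sketch assumes that a subject's data are given as 12 scores (percentage of "descending" judgments, indexed by the pitch class of the first tone, starting from C), that the half with the higher average score is taken as the upper half, and that clockwise movement around the circle corresponds to ascending semitones. The function name and the score values are hypothetical and are not data from the study.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def peak_pitch_classes(scores):
    """Bisect the pitch-class circle and return the two peak pitch classes.

    scores: 12 values, percentage of 'descending' judgments for tone pairs
    whose first tone is C, C#, ..., B.
    """
    best_start, best_diff = None, None
    for start in range(12):
        upper = [scores[(start + i) % 12] for i in range(6)]
        lower = [scores[(start + 6 + i) % 12] for i in range(6)]
        diff = sum(upper) / 6 - sum(lower) / 6
        if best_diff is None or diff > best_diff:
            best_start, best_diff = start, diff
    # Positions 1-6 run clockwise from the leftmost pitch class of the
    # upper half; the peak is at the third and fourth positions.
    third = (best_start + 2) % 12
    fourth = (best_start + 3) % 12
    return NAMES[third], NAMES[fourth]

# Hypothetical profile with the highest scores around C and C#.
profile = [95, 95, 85, 60, 30, 10, 5, 5, 15, 40, 70, 90]
peaks = peak_pitch_classes(profile)
```

Rotating the same hypothetical profile by a half-octave moves the peak to F# and G, which mirrors the kind of opposite orientation reported for subjects AK and CP.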
Next, the distributions of peak pitch classes were determined for the English and Californian groups separately. As shown in Figure 5, striking differences between the distributions emerged. For the English group, F#, G, and G# occurred most frequently as peak pitch classes, whereas for the Californian group, B, C, C#, D, and D# occurred most frequently instead.
In order to make a statistical comparison between the two groups, the hypothesis was tested that the Californian group would show a form of distribution similar to that obtained earlier by Deutsch et al. (1987) in the study on Californian undergraduates, but that the English group would show a different form of distribution. To this end, comparison was made between the number of subjects in each group for whom the peak position lay in the half of the pitch-class circle containing the larger number of peak positions in the earlier study. Twenty-one of the 24 Californian subjects fell into this category; however, only three of the 12 English subjects did so. This difference between the two groups was highly significant (p < .001 on a Fisher exact probability test).
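The reported comparison can be reproduced from the counts given above (21 of 24 Californian subjects and 3 of 12 English subjects in the category). The Python sketch below computes a Fisher exact probability directly from the hypergeometric distribution. The text does not state whether a one- or two-tailed test was used, so both are computed, and the function name is hypothetical.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Fisher exact probabilities for the 2 x 2 table [[a, b], [c, d]].

    Returns (one_sided, two_sided): the probability of a count in the
    top-left cell at least as large as a, and the summed probability of
    all tables no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(k) for k in range(a, k_max + 1))
    two_sided = sum(prob(k) for k in range(k_min, k_max + 1)
                    if prob(k) <= p_obs * (1 + 1e-9))
    return one_sided, two_sided

# Rows: Californian, English; columns: in the category, not in it.
one_sided, two_sided = fisher_exact(21, 3, 3, 9)
```

With these counts both probabilities fall below .001, in agreement with the significance level reported in the text.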

Fig. 4. Orientations of the pitch-class circle with respect to height, derived from the judgments of subject AK (from California) and subject CP (from England), whose data are shown in Figure 3. For subject AK the peak pitch classes were C and C#, and for subject CP the peak pitch classes were F# and G. It can be seen that the two subjects displayed opposite orientations of the pitch-class circle with respect to height.

Fig. 5. Distributions of peak pitch classes within the English and the Californian subject populations.

In order to examine whether the phenomenon might be related to musical training, the Californian and English groups were each divided into those who had had more than 2 years of training, and those who had not. These subgroups were then compared by using the same criterion. No significant difference emerged on this measure within either group (p > .05, on a Fisher exact probability test, in both cases). This is in accordance with the previous results of Deutsch et al. (1987), which showed no effects of musical training on perception of the tritone paradox. In order to examine whether there might be an effect of age, the Californian and English groups were each divided into those who were over 22 years of age and those who were under 22 years. Again, no significant difference emerged on this measure within either group (p > .05, on a Fisher exact probability test, in both cases). Finally, comparison was made between the male and female subjects in both the Californian and the English groups, and again no significant difference emerged (p > .05 on a Fisher exact probability test, in both cases).
The present findings provide strong support for the view that, through a developmental learning process, an individual acquires a representation of the pitch-class circle that has a particular orientation with respect to height. The form of this orientation is derived from exposure to speech sounds produced by others and varies from one linguistic subculture to another. From the present experiment we can conclude that for Californians, the agreed-upon orientation of the pitch-class circle is such that the highest position occurs around C# and D. However, for people from southern England, the agreed-upon orientation is such that the highest position occurs around G instead. It is assumed that such a template is employed both in the production of speech and in the interpretation of speech produced by others. We can observe that a template that is based on pitch class rather than pitch has the useful feature that it can be invoked by both male and female speakers, even though their voices are in different registers. Further evidence for this hypothesis was provided by the recent findings of Deutsch et al. (1990), described earlier, which showed a significant correspondence between a listener's orientation of the pitch-class circle with respect to height and the pitch classes delimiting his or her octave band for speech.
We may briefly speculate concerning the evolutionary value of such an acquired template. As one possibility, it could be of considerable advantage to determine the emotional state of a speaker through the pitch of his or her voice. A template such as this could serve to provide a common framework against which the pitch of a speaker's voice may be evaluated, so providing evidence concerning his or her emotional state. Such a template might also be involved in the communication of syntactic aspects of speech.
We now briefly discuss the implications of the present results for theories of pitch perception. It has been suggested by others that certain characteristics of pitch perception result from a learning process derived from exposure to complex sounds in the environment. For example, Whitfield (1967, 1970) suggested that patterns of neural activity resulting from exposure to combinations of harmonics are learned through continuous exposure to such sounds. Thus when presented with a harmonic series, we attribute the fundamental that in our experience has most frequently been associated with such a series. A more specific argument along these lines was made by Terhardt (1974). He proposed that, through exposure to speech sounds early in life, associative links are formed between the harmonic components of these sounds, so that ultimately when a harmonic series is presented, a fundamental (or "virtual pitch") is invoked by the listener. He also proposed that the same learning process accounts for our apprehension of certain intervallic relationships, such as the octave. Most recently, Terhardt (1991) suggested, in agreement with Deutsch et al. (1990), that the tritone paradox could also be the result of a developmental learning process derived from exposure to speech sounds. The present findings provide strong support for such a perceptual learning hypothesis with respect to the tritone paradox. They also lend indirect support to the hypothesis that certain other characteristics of pitch perception might also be based on perceptual learning, although these hypotheses await experimental verification.
Concerning the musical implications of these findings, we can conclude that under certain conditions at least, perception of music can be strongly influenced by the language spoken by the listener. The conditions under which this influence is manifest in natural musical situations remain to be determined. However, other work has shown that the tritone paradox can be produced by using a variety of tone complexes, provided that these contain some ambiguity of height (Deutsch, in press). In addition, related paradoxical effects have been shown to occur in the perception of certain two-part patterns (Deutsch, 1988; Deutsch et al., 1984, 1986): When such patterns are transposed from one key to another, the relative heights of the different pitch classes are preserved, so that there results a perceived interchange of voices. Further, when such patterns are presented in any one key, listeners differ strikingly in terms of which voice is heard as higher and which as lower, again reflecting differing orientations of the pitch-class circle with respect to height. It appears reasonable to conjecture that differences between listeners in perception of these patterns would also depend on linguistic subculture, in the same way as do differences in perception of the tritone paradox.
Another conclusion from the present findings, together with those obtained earlier on this class of paradoxes (Deutsch, 1986, 1987, 1988, 1989; Deutsch et al., 1984, 1986, 1987, 1990), is that, although absolute pitch is generally considered a rare faculty, the large majority of us exhibit a form of absolute pitch in making judgments of these patterns, in that we hear notes as higher or as lower depending essentially on their pitch classes. A related point has recently been made by Terhardt and Ward (1982) and Terhardt and Seewann (1983). These authors found that musicians were able to determine whether or not well-known passages were played in the correct key, even though most of their subjects did not have absolute pitch as traditionally defined.
In conclusion, the study reported here, coupled with the findings of Deutsch et al. (1990), provides, to the author's knowledge, the first demonstration of an influence of language on music perception. This influence appears to account for differences between listeners in how certain aspects of music are perceived, and we may therefore assume that such differences are cultural rather than innate in origin. In contrast, the handedness correlates that have been obtained with perception of other musical patterns [i.e., the octave and scale illusions (Deutsch, 1974, 1975a, 1975b, 1983)] indicate that differences in music perception can also be based on innate differences at the neurological level.
The finding that two different classes of musical pattern are associated with clear perceptual disagreement leads us to speculate that other such differences might also exist in music perception that have not yet been uncovered. Musical discourse is not precise or accurate enough for such perceptual differences to become apparent through normal communication, and it is only in the laboratory that we can develop a clear idea of what the listener really perceives. The possibility of basic disagreement at the perceptual level therefore should be considered in evaluating the issue of communication between composer, performer, and listener.²
