Notation software
A common transcription aid is music notation software, which can receive the user's mental analysis of notes and then store and format those notes into standard music notation for personal printing or professional publishing of sheet music. Some notation software can accept a Standard MIDI File (SMF) or MIDI performance as input instead of manual note entry. These notation applications can export their scores in a variety of formats like EPS, PNG, and SVG. Often the software contains a sound library which allows the user's score to be played aloud by the application for verification.
Slow-down software
Prior to the invention of digital transcription aids, musicians would slow down a record or a tape recording to be able to hear the melodic lines and chords at a slower, more digestible pace. The problem with this approach was that it also changed the pitches, so once a piece was transcribed, it would then have to be transposed into the correct key. Software designed to slow down the tempo of music without changing its pitch can be very helpful for recognizing pitches, melodies, chords, rhythms and lyrics when transcribing music. Unlike the slow-down effect of a record player, the pitch and original octave of the notes stay the same rather than descending. This technology is simple enough that it is available in many free software applications.
The software generally goes through a two-step process to accomplish this. First, the audio file is played back at a lower sample rate than that of the original file. This has the same effect as playing a tape or vinyl record at slower speed: the pitch is lowered, and the music can sound as if it is in a different key. The second step is to use Digital Signal Processing (or DSP) to shift the pitch back up to the original pitch level or musical key.
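As a rough illustration of step one, the sketch below (plain NumPy, with hypothetical helper names chosen for this example) stretches a test tone by linear-interpolation resampling; played back at the original rate, the result is twice as long and an octave lower, just like a record at half speed. The second, pitch-correcting step normally requires a phase vocoder or similar DSP and is only noted in a comment.

```python
import numpy as np

def resample_linear(signal, factor):
    """Step one: stretch the signal to `factor` times its length with
    linear interpolation. Played back at the original sample rate, the
    result is slower and lower in pitch, like a record at reduced speed."""
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, int(len(signal) * factor))
    return np.interp(new_idx, old_idx, signal)

def dominant_freq(signal, sr):
    """Frequency of the strongest FFT bin, used here to check the effect."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.fft.rfftfreq(len(signal), 1 / sr)[spectrum.argmax()]

sr = 8000
t = np.arange(sr // 2) / sr              # half a second
tone = np.sin(2 * np.pi * 440 * t)       # A4 test tone

slowed = resample_linear(tone, 2.0)      # half speed, one octave lower
# dominant_freq(tone, sr) is ~440 Hz; dominant_freq(slowed, sr) is ~220 Hz.
# Step two (not shown) would use a pitch shifter such as a phase vocoder
# to raise the slowed audio back up an octave without changing its speed.
```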
Pitch tracking software
As mentioned in the Automatic music transcription section, some commercial softw
are can roughly track the pitch of dominant melodies in polyphonic musical recor
dings. The note scans are not exact, and often need to be manually edited by the
user before saving to file in either a proprietary file format or in Standard M
IDI File Format. Some pitch tracking software also allows the scanned note lists
to be animated during audio playback.
Automatic music transcription
The term "automatic music transcription" was first used by audio researchers James A. Moorer, Martin Piszczalski, and Bernard Galler in 1977. With their knowledge of digital audio engineering, these researchers believed that a computer could be programmed to analyze a digital recording of music such that the pitches of melody lines and chord patterns could be detected, along with the rhythmic accents of percussion instruments. The task of automatic music transcription concerns two separate activities: making an analysis of a musical piece, and printing out a score from that analysis.[1]
This was not a simple goal, but one that would encourage academic research for at least another three decades. Because of the close scientific relationship of speech to music, much academic and commercial research directed toward the better-funded field of speech recognition was recycled into research on music recognition technology. While many musicians and educators insist that manually doing transcriptions is a valuable exercise for developing musicians, the motivation for automatic music transcription remains the same as the motivation for sheet music: musicians who do not have intuitive transcription skills will search for sheet music or a chord chart so that they may quickly learn how to play a song. A collection of tools created by this ongoing research could be of great aid to musicians. Since much recorded music does not have available sheet music, an automatic transcription device could also offer transcriptions that are otherwise unavailable. To date, no software application can completely fulfill James Moorer's definition of automatic music transcription. However, the pursuit of automatic music transcription has spawned the creation of many software applications that can aid in manual transcription. Some can slow down music while maintaining original pitch and octave, some can track the pitch of melodies, some can track chord changes, and others can track the beat of music.
Automatic transcription most fundamentally involves identifying the pitch and duration of the performed notes. This entails tracking pitch and identifying note onsets. After capturing those physical measurements, this information is mapped into traditional music notation, i.e., the sheet music.
Digital Signal Processing is the branch of engineering that provides software engineers with the tools and algorithms needed to analyze a digital recording in terms of pitch (note detection of melodic instruments) and the energy content of un-pitched sounds (detection of percussion instruments). Musical recordings are sampled at a given recording rate, and their frequency data are stored in a digital wave format, which represents sound by digital sampling.
Pitch detection
Pitch detection is often the detection of individual notes that might make up a melody in music, or the notes in a chord. When a single key is pressed upon a piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials.
For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency; 523 Hz would be the 2nd harmonic, 785 Hz the 3rd harmonic, 1046 Hz the 4th harmonic, and so on. The higher harmonics are integer multiples of the fundamental frequency of 261.6 Hz (e.g., 2 × 261.6 ≈ 523, 3 × 261.6 ≈ 785, 4 × 261.6 ≈ 1046). While only about eight harmonics are really needed to audibly recreate the note, the total number of harmonics in this mathematical series can be large, although the higher the harmonic's number, the weaker its magnitude and contribution. Contrary to intuition, a musical recording at its lowest physical level is not a collection of individual notes, but rather a collection of individual harmonics. That is why very similar-sounding recordings can be created with differing collections of instruments and their assigned notes. As long as the total harmonics of the recording are recreated to some degree, it does not really matter which instruments or which notes were used.
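The harmonic arithmetic above can be checked in a couple of lines of Python:

```python
# Harmonic series of middle C: each harmonic is an integer multiple
# of the 261.6 Hz fundamental, matching the figures in the text.
fundamental = 261.6
harmonics = [n * fundamental for n in range(1, 5)]
# harmonics -> approximately [261.6, 523.2, 784.8, 1046.4],
# i.e. the 523 Hz, 785 Hz, and 1046 Hz values quoted above, rounded.
```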
A first step in the detection of notes is the transformation of the sound file's digital data from the time domain into the frequency domain, which enables the measurement of various frequencies over time. The graphic image of an audio recording in the frequency domain is called a spectrogram or sonogram. A musical note, as a composite of various harmonics, appears in a spectrogram like a vertically placed comb, with the individual teeth of the comb representing the various harmonics and their differing frequency values. A Fourier transform is the mathematical procedure used to create the spectrogram from the sound file's digital data.
The task of many note detection algorithms is to search the spectrogram for the occurrence of such comb patterns (a composite of harmonics) caused by individual notes. Once the pattern of a note's particular comb shape of harmonics is detected, the note's pitch can be measured by the vertical position of the comb pattern upon the spectrogram.
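A minimal NumPy sketch of this idea: build a tone from three harmonics, take the FFT of one windowed frame (one "column" of a spectrogram), and pick out the local spectral maxima, which line up with the harmonic frequencies like the teeth of a comb. The 10% peak-picking threshold is an arbitrary choice for this clean synthetic example.

```python
import numpy as np

sr = 8000
t = np.arange(8192) / sr
# A middle-C-like tone built from three harmonics -- the "teeth" of the comb.
tone = sum(np.sin(2 * np.pi * 261.6 * n * t) / n for n in (1, 2, 3))

# One spectrogram column: magnitude FFT of a Hann-windowed frame.
spectrum = np.abs(np.fft.rfft(tone * np.hanning(len(tone))))
freqs = np.fft.rfftfreq(len(tone), 1 / sr)

# Local maxima above a threshold: one spectral peak per harmonic.
peaks = [freqs[i] for i in range(1, len(spectrum) - 1)
         if spectrum[i] > spectrum[i - 1]
         and spectrum[i] >= spectrum[i + 1]
         and spectrum[i] > 0.1 * spectrum.max()]
# peaks lands close to [261.6, 523.2, 784.8] Hz.
```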
There are basically two different classes of digital music which create very different demands for a pitch detection algorithm: monophonic music and polyphonic music. Monophonic music is a passage with only one instrument playing one note at a time, while polyphonic music can have multiple instruments and vocals playing at once. Pitch detection upon a monophonic recording is a relatively simple task, and its technology enabled the invention of guitar tuners in the 1970s. However, pitch detection upon polyphonic music becomes a much more difficult task because the image of its spectrogram now appears as a vague cloud due to a multitude of overlapping comb patterns, caused by each note's multiple harmonics.
Another method of pitch detection was invented by Martin Piszczalski in conjunction with Bernard Galler in the 1970s[2] and has since been widely followed.[3] It targets monophonic music. Central to this method is how pitch is determined by the human ear.[4] The process attempts to roughly mimic the biology of the human inner ear by finding only a few of the loudest harmonics at a given instant. That small set of found harmonics is in turn compared against the harmonic sets of all possible resultant pitches, to hypothesize the most probable pitch given that particular set of harmonics.
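One way to sketch this compare-against-candidate-harmonic-sets idea (an illustrative toy, not Piszczalski's actual algorithm; the tie-breaking rule preferring the higher fundamental is an assumption made here to dodge simple octave ambiguity):

```python
def guess_pitch(loud_peaks_hz, candidates_hz, tolerance=0.015):
    """Score each candidate fundamental by how many of the measured
    peaks fall within `tolerance` of one of its integer harmonics;
    on ties, prefer the higher fundamental (an assumed heuristic)."""
    def score(f0):
        hits = 0
        for peak in loud_peaks_hz:
            n = max(1, round(peak / f0))       # nearest harmonic number
            if abs(peak - n * f0) <= tolerance * peak:
                hits += 1
        return hits
    return max(candidates_hz, key=lambda f0: (score(f0), f0))

# The three loudest partials of a middle-C-like tone:
peaks = [261.6, 523.2, 784.8]
candidates = [130.8, 261.6, 329.6, 392.0, 440.0]
best = guess_pitch(peaks, candidates)   # -> 261.6
```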
To date, the complete note detection of polyphonic recordings remains a mystery to audio engineers, although they continue to make progress by inventing algorithms which can partially detect some of the notes of a polyphonic recording, such as a melody or bass line.
Beat detection
Beat tracking is the determination of a repeating time interval between perceived pulses in music. Beat can also be described as 'foot tapping' or 'hand clapping' in time with the music. The beat is often a predictable basic unit in time for the musical piece, and may only vary slightly during the performance. Songs are frequently measured for their beats per minute (BPM) in determining the tempo of the music, whether it be fast or slow.
Since notes frequently begin on a beat, or on a simple subdivision of the beat's time interval, beat tracking software has the potential to better resolve note onsets that may have been detected only crudely. Beat tracking is often the first step in the detection of percussion instruments.
Despite the intuitive nature of 'foot tapping', of which most humans are capable, developing an algorithm to detect those beats is difficult. Most current software algorithms for beat detection use a group of competing hypotheses for beats per minute, as the algorithm progressively finds and resolves local peaks in volume, roughly corresponding to the foot-taps of the music.
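The peak-finding part of such an algorithm can be illustrated with a toy example (a deliberately simplified sketch, not one of the competing-hypothesis algorithms described above): pick local maxima in a loudness envelope and derive a tempo from the spacing between them.

```python
import numpy as np

def estimate_bpm(envelope, frames_per_second):
    """Find local volume peaks in a loudness envelope and derive the
    tempo from the median interval between successive peaks."""
    peaks = [i for i in range(1, len(envelope) - 1)
             if envelope[i] > envelope[i - 1]
             and envelope[i] >= envelope[i + 1]
             and envelope[i] > 0.5 * envelope.max()]
    beat_period = np.median(np.diff(peaks)) / frames_per_second
    return 60.0 / beat_period

# Synthetic loudness envelope: a pulse every 0.5 seconds at 100 frames
# per second, i.e. a tempo of 120 BPM.
fps = 100
env = np.zeros(1000)
env[::50] = 1.0
# estimate_bpm(env, fps) -> 120.0
```

Real signals need an envelope follower and noise-robust peak picking; the fixed 50% threshold here works only because the example envelope is clean.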
How automatic music transcription works
To transcribe music automatically, several problems must be solved:
1. Notes must be recognized - this is typically done by changing from the time domain into the frequency domain. This can be accomplished through the Fourier transform. Computer algorithms for doing this are common; the Fast Fourier transform algorithm computes the frequency content of a signal, and is useful in processing musical excerpts.
2. A beat and tempo need to be detected (beat detection) - this is a difficult, many-faceted problem.[5]
The method proposed in Costantini et al. 2009[6] focuses on note events and their main characteristics: the attack instant, the pitch, and the final instant. Onset detection exploits a binary time-frequency representation of the audio signal. Note classification and offset detection are based on the constant Q transform (CQT) and support vector machines (SVMs).
This in turn leads to a pitch contour, namely a continuously time-varying line that corresponds to what humans refer to as melody. The next step is to segment this continuous melodic stream to identify the beginning and end of each note. After that, each note unit is expressed in physical terms (e.g., 442 Hz for 0.52 seconds). The final step is to map this physical information into familiar music-notation terms (e.g., A4, quarter note).
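The final mapping step is straightforward in equal temperament: a frequency converts to the nearest MIDI note number, and the MIDI number to a name. A minimal sketch, assuming A4 = 440 Hz (duration-to-rhythm mapping, which also needs the tempo, is omitted):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz, a4=440.0):
    """Map a measured frequency to the nearest equal-tempered note
    name, using MIDI numbering (A4 = MIDI note 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# freq_to_note(442.0) -> 'A4', matching the example in the text.
```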
Detailed computer steps behind automatic music transcription
In terms of actual computer processing, the principal steps are to 1) digitize the performed, analog music, 2) perform successive short-term Fast Fourier Transforms (FFTs) to obtain the time-varying spectra, 3) identify the peaks in each spectrum, 4) analyze the spectral peaks to get pitch candidates, 5) connect the strongest individual pitch candidates to get the most likely time-varying pitch contour, and 6) map this physical data into the closest music-notation terms.
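Steps 2) through 6) can be compressed into a toy single-frame version: FFT one frame, take the strongest spectral peak as the pitch candidate, and map it to a note name. A real system connects candidates across many frames; this sketch handles only one frame of a clean monophonic signal.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def transcribe_frame(frame, sr):
    """Steps 2-6 in miniature for one frame: FFT, pick the strongest
    spectral peak as the pitch candidate, and map it to a note name."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freq = np.fft.rfftfreq(len(frame), 1 / sr)[spectrum.argmax()]
    midi = round(69 + 12 * np.log2(freq / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

sr = 8000
t = np.arange(4096) / sr
# transcribe_frame(np.sin(2 * np.pi * 440 * t), sr) -> 'A4'
```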
The most controversial and difficult step in this process is detecting pitch.[7] The most successful pitch methods operate in the frequency domain, not the time domain. While time-domain methods have been proposed, they can break down for real-world musical instruments played in typically reverberant rooms.
The pitch-detection method invented by Piszczalski[8] again mimics human hearing. It follows how only certain sets of partials fuse together in human listening: the sets that create the perception of a single pitch. Fusion occurs only when two partials are within 1.5% of being a perfect harmonic pair (i.e., their frequencies approximate a low-integer ratio such as 1:2 or 5:8). This near-harmonic match is required of all the partials in order for a human to hear them as a single pitch.
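The 1.5% test for a single pair of partials can be sketched directly. The choice of 8 as the largest integer allowed in a "low-integer" ratio is an assumption made for this example (so that pairs like 5:8 qualify); the full method applies the test across all partials, not just one pair.

```python
from fractions import Fraction

def fuses(f1, f2, tolerance=0.015, max_int=8):
    """True when the two partials' frequency ratio lies within
    `tolerance` of a low-integer ratio (1:2, 2:3, 5:8, ...)."""
    ratio = min(f1, f2) / max(f1, f2)
    nearest = Fraction(ratio).limit_denominator(max_int)
    return abs(ratio - float(nearest)) <= tolerance * ratio

# fuses(261.6, 523.2) -> True   (a 1:2 octave pair)
# fuses(440.0, 473.0) -> False  (no simple ratio nearby)
```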
Example software
ScoreCloud by Doremir Music Research AB, for both audio and MIDI input. Glass, Nick. "Is this Google Translate for Music?". CNN International.