
Proceedings - 19th International Conference - IEEE/EMBS Oct. 30 - Nov. 2, 1997 Chicago, IL. USA

Application of MIDI Technique for Medical Audio Signal Coding


Toshio Modegi
Communication & Information Operations, Dainippon Printing Co.,Ltd.
6-10, Ichigaya-Daimachi, Shinjuku-ku, Tokyo 162, Japan (Email: modegi@ethmw4.crl.go.jp)
Shun-ichi Iisaku
Communications Research Laboratory, Ministry of Posts & Telecommunications
4-2-1, Nukui-Kitamachi, Koganei-shi, Tokyo 184, Japan
Abstract: We propose applying MIDI technology to the
coding of physiologic sounds, such as heart sounds and lung
sounds used in medical diagnosis, in order to construct medical
audio databases. We have implemented the proposed method
and evaluated its processing speed, the storage size of the encoded
data, and the quality of the decoded sound. Although the proposed
encoding method cannot reproduce the original waveform
exactly, we confirmed that the diagnostic points in the source
sounds are not degraded. In this paper, we describe the proposed
encoding and decoding algorithms and present several specific
coding examples of both heart and lung sounds from typical clinical cases.
Keywords: audio signal processing, audio coding, audio database, MIDI, music transcription, heart sound, lung sound
I. INTRODUCTION
For recording musical sounds, we consider MIDI
(Musical Instrument Digital Interface) an ideal coding
method because of its coding efficiency and high-quality sound
reproduction capability. If it is applied to audio databases, we
can retrieve contents by keywords composed of MIDI strings,
in the same way as in a text database.
Nowadays, this method is applied to on-line karaoke
(playing a song without the vocal part so that listeners can sing
along). This application exploits both a positive characteristic of
MIDI, its low bit rate, and a negative one, its difficulty in
handling vocal music. Karaoke audio database servers are
constructed by manual input of musical notes; this can be done
with MIDI-supported electronic instruments and is not a hard task
for musicians. Moreover, automatic recognition of musical scores
has advanced considerably, and MIDI data can be obtained by
scanning score images. However, automatic recognition and
transcription of musical sounds has not yet reached a practical
level, although much research has been reported [1].
We are interested in multimedia medical databases, especially
audio databases for heart sounds and lung sounds. Several
works have analyzed the signal characteristics of these sounds
from a technological, micro-statistical point of view [2]. Our
research aims at analyzing and archiving these sounds from a
macro, clinical point of view, and we have proposed a MIDI
encoding method specifically for heart sounds and lung
sounds [3].
Through our implementation of the proposed method, we
are also applying MIDI encoding to other types of sound
material such as animal voices, human speech, and songs.
In this paper, however, we focus on heart sounds and lung
sounds. We first briefly describe the proposed encoding and
decoding algorithms and then present several examples
selected from our coding experiments.
II. BENEFITS OF MIDI FOR MEDICINE
To encode a given source signal into MIDI, we must analyze
the signal and extract its features. The encoding process can
therefore serve as a signal analysis tool capable of
inspecting, diagnosing, and classifying audio data. MIDI
encoding can also serve as a signal visualization tool, because it
can present a signal as a musical score. Moreover, it can serve
as a database authoring tool, with which we can construct
highly efficient audio databases and provide them with MIDI
keyword retrieval functions. The following sections list the
features of MIDI encoding and their merits, especially
for medical applications.
A. Low Bit Rate
Suppose a 1-second sound stream composed of 8
musical notes played by one instrument. Its PCM-coded size at
CD quality is about 80 kbytes, while its MIDI-coded
size is only about 80 bytes, since the coded size of each note
is about 10 bytes. This feature is especially beneficial for
telemedicine applications and for archiving electronic medical records.
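The size comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: the text does not state the PCM format, so mono, 44.1 kHz, 16-bit samples are assumed here, which gives a figure of the same order as the "about 80 kbytes" quoted above.

```python
# Rough size comparison for a 1-second stream of 8 notes.
# Assumptions (not stated in the text): mono PCM, 44.1 kHz, 16-bit samples.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2
NOTES_PER_SECOND = 8
BYTES_PER_NOTE = 10  # approximate coded size of one note, as stated above

pcm_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE    # bytes of PCM per second
midi_bytes = NOTES_PER_SECOND * BYTES_PER_NOTE   # bytes of MIDI per second
ratio = pcm_bytes / midi_bytes                   # roughly three orders of magnitude

print(pcm_bytes, midi_bytes, ratio)
```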

B. Score Presentation
Using a conventional five-line musical score or a specialized MIDI
editor, we can view the content of a signal and also modify it.
Moreover, the MIDI encoding process includes feature extraction,
so each note on a score may represent some diagnostic
point. This feature can be used as a new type of medical
presentation medium with which auscultators explain patients'
conditions to them, whereas conventional presentation methods are
limited to the PCG (phonocardiogram) waveform and the sound
spectrogram.
(0-7803-4262-3/97/$10.00 (C) 1997 IEEE)

C. Multi-Track Coding


Under the General MIDI standard specification, we can record
up to 16 channels of sound streams synchronously.
Like a standard ECG (electrocardiogram), which normally
records 12 channels, we can record heart sounds at multiple
positions, or record heart sounds and lung sounds synchronously
as shown in Fig. 2 and examine the interaction between these two organs.

III. MIDI ENCODING METHOD


Our proposed algorithm, which converts PCM-sampled
audio data to standard MIDI format codes, is primarily based
on reference [4] and needs no complex calculations such as
FFT. Therefore, when implemented on a PC without a
DSP accelerator, the algorithm makes real-time encoding possible. The
following sections describe the details of this algorithm,
which consists of three steps: detection of peaks, detection of
signal sections, and note expression of sections, as illustrated
in Fig. 1.
[Figure: block diagram leading from "PCM Source Signal" through the detection steps to "Converted MIDI data"]

Fig. 1. Overview of MIDI Encoding Algorithm

A. Detection of Peaks
This process extracts local peaks from a PCM-sampled signal
after removing its DC level, on the condition that the values of
each pair of neighboring peaks have opposite polarity. After
that, a basic frequency value f(p) is given to each extracted
peak p by the following formula:
f(p) = Fs*n / { x(p+2n) - x(p) }    (1)
where x(p) is the sample location of peak p, Fs is the sampling frequency, and n is an appropriate natural number chosen by the encoding operator. Because neighboring peaks alternate in polarity, peaks p and p+2n are n full cycles apart.
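The peak detection step and formula (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the way two consecutive same-polarity extrema are resolved (keeping the larger one) is an assumption, since the text only requires that neighboring peaks alternate in polarity.

```python
import math


def detect_peaks(samples):
    """Return (index, value) pairs of local extrema with alternating polarity."""
    mean = sum(samples) / len(samples)           # remove the DC level
    v = [s - mean for s in samples]
    peaks = []
    for i in range(1, len(v) - 1):
        is_max = v[i] > 0 and v[i] >= v[i - 1] and v[i] > v[i + 1]
        is_min = v[i] < 0 and v[i] <= v[i - 1] and v[i] < v[i + 1]
        if not (is_max or is_min):
            continue
        if peaks and (peaks[-1][1] > 0) == (v[i] > 0):
            # Assumption: of two same-polarity extrema, keep the larger one.
            if abs(v[i]) > abs(peaks[-1][1]):
                peaks[-1] = (i, v[i])
        else:
            peaks.append((i, v[i]))
    return peaks


def basic_frequencies(peaks, fs, n):
    """Formula (1): f(p) = Fs*n / (x(p+2n) - x(p)) for every peak that has a partner."""
    freqs = []
    for p in range(len(peaks) - 2 * n):
        span = peaks[p + 2 * n][0] - peaks[p][0]  # n full cycles, in samples
        freqs.append(fs * n / span)
    return freqs


# Usage: a pure 100 Hz tone sampled at 8 kHz should yield f(p) close to 100 Hz.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(800)]
freqs = basic_frequencies(detect_peaks(tone), fs, n=3)
print(freqs[:3])
```

A larger n averages the frequency over more cycles, which smooths the estimate at the cost of time resolution.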
B. Detection of Signal Sections
A signal section is first extracted by amplitude slicing, as a
group of consecutive peaks whose absolute levels |v(p)|
exceed the specified value S1. Next, each section is subdivided
according to the basic frequency of the peaks it contains,
so that all peaks of a section have similar basic
frequency values. This subdivision is repeated while the
difference among the note numbers within a section is larger
than the specified value Sn, where the note number N(p) at
peak p is calculated from its basic frequency f(p) using the common logarithm:
N(p) = 40 * log{ f(p) / 440 } + 69    (2)
The note number 69 corresponds to the musical note A3 and the
frequency 440 [Hz], which is used for tuning musical
instruments. This formula indicates that if f(p) is doubled, about 12 (40 * log 2 = 12.04), which is an octave interval,
is added to N(p).
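Formula (2) can be written directly in code. The sketch below is illustrative; the helper `to_midi_note`, which rounds and clamps the result to the valid MIDI range 0 to 127, is an assumption and is not described in the text.

```python
import math


def note_number(freq_hz):
    """Formula (2): N(p) = 40 * log10(f(p) / 440) + 69."""
    return 40.0 * math.log10(freq_hz / 440.0) + 69.0


def to_midi_note(n):
    """Assumed helper: round and clamp to the MIDI note range 0..127."""
    return max(0, min(127, round(n)))


reference = note_number(440.0)               # the tuning pitch maps to note 69
octave = note_number(880.0) - reference      # doubling the frequency adds about 12
print(reference, octave, to_midi_note(note_number(110.0)))
```

The coefficient 40 is a convenient approximation of 12 / log10(2), which is about 39.86, so an octave comes out as 12.04 rather than exactly 12.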
After these processes, each section s is given four parameters:
the starting sample location Xs(s), the ending sample location
Xe(s), the average basic frequency Favg(s), and the maximum
absolute level Vmax(s), whose range is 0 to 1. In this step, many
short sections may be detected, which can make the coded size large.
To cope with this, we propose the following two restructuring operations for sections.
First, if the interval between consecutive sections, Xs(s+1)-Xe(s),
is less than the specified value Lgap, and the difference between
the note numbers calculated from Favg(s) and Favg(s+1) is less than the
specified value Sn, these sections are integrated into one
section. Second, if the length of a remaining section, Xe(s)-Xs(s),
is less than the specified value Lmin, the section s is
removed.
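The section detection and the two restructuring operations can be sketched as below. Several details are assumptions made only for illustration: the peak tuples `(x, level, freq)`, the greedy way a section is split when its note-number spread exceeds Sn, the averaging of Favg when two sections are integrated, and the use of sample counts as the unit of Lgap and Lmin.

```python
import math
from dataclasses import dataclass


@dataclass
class Section:
    xs: int       # starting sample location Xs(s)
    xe: int       # ending sample location Xe(s)
    favg: float   # average basic frequency Favg(s)
    vmax: float   # maximum absolute level Vmax(s), 0..1


def note_number(f):
    return 40.0 * math.log10(f / 440.0) + 69.0   # formula (2)


def slice_sections(peaks, s1, sn):
    """peaks: list of (x, level, freq) with level = |v(p)| scaled to 0..1."""
    sections, group = [], []

    def flush():
        if group:
            sections.append(Section(
                xs=group[0][0], xe=group[-1][0],
                favg=sum(f for _, _, f in group) / len(group),
                vmax=max(v for _, v, _ in group)))
            group.clear()

    for x, v, f in peaks:
        if v <= s1:                  # below the amplitude slice level S1
            flush()
            continue
        notes = [note_number(g[2]) for g in group] + [note_number(f)]
        if group and max(notes) - min(notes) > sn:
            flush()                  # pitch spread exceeds Sn: start a new section
        group.append((x, v, f))
    flush()
    return sections


def restructure(sections, lgap, lmin, sn):
    """Integrate close, similar-pitch sections, then drop sections shorter than Lmin."""
    merged = []
    for s in sections:
        prev = merged[-1] if merged else None
        if (prev is not None and s.xs - prev.xe < lgap and
                abs(note_number(s.favg) - note_number(prev.favg)) < sn):
            prev.favg = (prev.favg + s.favg) / 2   # assumption: simple average
            prev.vmax = max(prev.vmax, s.vmax)
            prev.xe = s.xe
        else:
            merged.append(Section(s.xs, s.xe, s.favg, s.vmax))
    return [s for s in merged if s.xe - s.xs >= lmin]


# Usage with hand-made peaks: two nearby 100 Hz bursts and one short 300 Hz blip.
peaks = [(0, 0.5, 100), (40, 0.6, 100), (80, 0.5, 102), (120, 0.05, 100),
         (130, 0.5, 101), (170, 0.6, 100), (400, 0.5, 300), (405, 0.5, 300)]
raw = slice_sections(peaks, s1=0.3, sn=6)
result = restructure(raw, lgap=60, lmin=20, sn=6)
print(len(raw), result)
```

In this example the two bursts are integrated into one section and the short blip is removed, which is the size-reducing effect described above.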
C. Note Expression of Sections
Each extracted section is coded as one musical note, and its
four parameters are converted to two sets of MIDI command
strings based on the SMF (Standard MIDI File) rule [4]. MIDI
codes basically consist of two types of command strings,
Note-On and Note-Off, and a delta time value must be specified before each command string as follows:
Delta Time 1, Note-On, Note Number 1, Velocity 1,
Delta Time 2, Note-Off, Note Number 2, Velocity 2.
Supposing that the ending location of the previous note is Xprev, Delta
Time 1 is given by {Xs(s)-Xprev}*768/Fs and Delta Time 2 is
similarly given by {Xe(s)-Xs(s)}*768/Fs, where 768 is the
maximum resolution of one second in the current MIDI standard. The codes of Note-On and Note-Off are the fixed
hexadecimal values 9X and 8X, where X is the channel number described in the next section. Note Number 1 and 2 have the
same value, calculated by formula (2), and Velocity 1 and 2
also have the same value, given by Vmax(s)*127.
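The note expression step can be illustrated with the following sketch, which turns one section into the Note-On and Note-Off byte strings described above. The function names are hypothetical. Delta times in an SMF are stored as variable-length quantities, which `vlq` implements; clamping the velocity to at least 1 is an assumption, made because a Note-On with velocity 0 is conventionally treated as a Note-Off.

```python
TICKS_PER_SECOND = 768  # time resolution used in the text


def vlq(value):
    """Encode a non-negative integer as an SMF variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)   # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def encode_note(xs, xe, xprev, note, vmax, fs, channel=0):
    """Return Delta Time 1, Note-On, Delta Time 2, Note-Off for one section."""
    dt_on = round((xs - xprev) * TICKS_PER_SECOND / fs)
    dt_off = round((xe - xs) * TICKS_PER_SECOND / fs)
    number = max(0, min(127, round(note)))
    velocity = max(1, min(127, round(vmax * 127)))   # assumption: never emit velocity 0
    return (vlq(dt_on) + bytes([0x90 | channel, number, velocity]) +
            vlq(dt_off) + bytes([0x80 | channel, number, velocity]))


# Usage: a note starting 1 s after the previous one and lasting 0.5 s at 44.1 kHz.
event = encode_note(xs=44100, xe=66150, xprev=0, note=45.2, vmax=1.0, fs=44100)
print(event.hex(), len(event))
```

The example note occupies 10 bytes, in line with the per-note size assumed in section II-A.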
D. Stereo Multi-track Encoding
We can encode multiple channels of MIDI data at the same
time. If the source PCM signal is 2-channel stereo, the left signal
can be assigned to channel 0 and the right signal to channel 1.
The MIDI data of channels 0 and 1 are
distinguished by the lower hexadecimal digit of the Note-On and Note-Off codes.
Furthermore, using SMF format 1, the data of each channel can be
recorded in its own track.
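A two-track SMF format 1 container for the left and right channels can be sketched as below. The helper names are hypothetical, and the track bodies are left empty for brevity. The division value of 384 ticks per quarter note is an assumption: at the SMF default tempo of 120 beats per minute it corresponds to the 768 ticks per second used in the previous section.

```python
import struct


def smf_header(fmt, ntrks, division):
    """Build the 14-byte 'MThd' header chunk of a Standard MIDI File."""
    return b"MThd" + struct.pack(">IHHH", 6, fmt, ntrks, division)


def track_chunk(events):
    """Wrap raw event bytes in an 'MTrk' chunk, appending the End-of-Track meta event."""
    body = events + b"\x00\xff\x2f\x00"
    return b"MTrk" + struct.pack(">I", len(body)) + body


LEFT, RIGHT = 0, 1
note_on_left = 0x90 | LEFT     # status byte 0x90: Note-On, channel 0
note_on_right = 0x90 | RIGHT   # status byte 0x91: Note-On, channel 1

# Format 1, two tracks; each track would hold the events of one channel.
smf = smf_header(1, 2, 384) + track_chunk(b"") + track_chunk(b"")
print(smf.hex())
```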


IV. MIDI DECODING METHOD


If a MIDI sound module such as GM (General MIDI), XG
(extended MIDI by YAMAHA), or GS (extended MIDI by
Roland) is connected to the PC, MIDI decoding is very
simple: we specify a proper playback voice on the sound module,
start the MIDI sequencer software provided by the sound
module maker, and have it read the coded SMF file described
in the previous chapter and transfer it to the MIDI sound
module.
However, currently available MIDI sound modules are designed for playback of general music and sometimes cause trouble in our applications. Therefore, we propose the following improved decoding methods for playback of
special sounds such as those used for medical diagnosis.

A. Device Dependent Correction of MIDI Codes

Two parameters, the length and the note number of notes,
must sometimes be modified when general sound
modules are used. First, the length of each note is limited to a
certain time that depends on both the voice used and its note
number, whereas the note length we specify is unlimited.
Therefore, we must sometimes subdivide a long note into
multiple short notes. Second, the note numbers of
some SFX voices are not necessarily designed according to the
standard MIDI rule, and we must sometimes transpose our coded notes.
For example, the XG sound module (YAMAHA MU-80)
provides a heart beat voice (Code: SFX No. 100),
and we can decode both heart sounds and lung sounds in high
quality with this one voice. However, this voice can issue at
most 5-cycle signals, so specifying a high note number
shortens the running length of the sound. Furthermore, we have
found that the note number N must be modified as
N = N*2 - 22    (3)
because this module defines the basic average frequency of
heart beats, which is around 110 [Hz], as the A3 note (440 [Hz]).

B. Software Decoding Algorithm
This method, which is a backward conversion of MIDI data to
PCM format, is effective for handling a large amount of MIDI
data without a hardware module. Because the receiving buffer
memory of most general sound modules is limited to around 16
kbytes, we sometimes cannot play back our coded MIDI data
with a hardware sound module.
For decoding, we must prepare one or several cycles of
source-sampled PCM signals beforehand; we can then construct
PCM signals by repeating and modulating these sampled
signals. First, we obtain the four parameters of each section
described in section III-B from the MIDI data by backward
conversion. Then the specified sections of the composed PCM
signal, which is produced by repeating the prepared sampled
signals, are modulated in both amplitude (AM) and frequency (FM)
with the Vmax(s) and Favg(s) parameters.

V. METHODS OF EXPERIMENTS

A. Hardware Structure of Experimental System

We prepared two PCs running Microsoft Windows 95,
used as a recording workstation and a playback workstation. To the
audio line-in of the recording workstation are connected two
pin-type microphones attached to Littmann-type stethoscopes, a compact
disc player, and the line-out of the playback workstation.
To the playback workstation are connected a YAMAHA MU-80
XG sound module and two active speakers.
Using these two workstations at the same time, we can capture
and evaluate decoded signals.

[Figure: screen image of the encoding software]

Fig.2. A Screen Image of MIDI Encoding Workstation

B. Software Structure of Experimental System
On the recording workstation are installed the sound recorder
software provided by the maker of the sound card and our
MIDI encoding software, which includes signal
evaluation functions such as displaying a spectrogram. On the
playback workstation are installed the MIDI sequencer software
provided by the maker of the MIDI sound module, the score
editor Steinberg CUBASE, and our MIDI decoding
software.
Figure 2 shows a multi-track encoding operation on
lung sounds (L-CH) and heart sounds (R-CH) captured
synchronously from a healthy person. The lists on the left of the
dialog box at the bottom show the generated basic data for the MIDI sequences.

C. Encoding Parameters
In chapter III, the following five user-defined parameters are
provided. In our coding experiments, we determined
these parameters so that the number of detected sections,
or coded notes, becomes the smallest without degrading the
decoded quality.
n: Interval between peaks for calculation of a frequency.
S1: Slice level of amplitude for extraction of sections.
Sn: Slice level of pitch interval for extraction of sections.
Lgap: Maximum time interval for integration of sections.
Lmin: Minimum valid length of time for sections.


D. Evaluation Methods
The goal of our encoding technique is to reproduce not an identical
signal but a signal similar to the original. Using our
encoding software, we can compare decoded sounds with the
original by playing back both sounds and by displaying the two waveforms
and the two spectrograms.
VI. RESULTS OF EXPERIMENTS
In this paper we present 3 coded examples of both heart
sounds and lung sounds, whose source materials are taken from
references [5][6]. All of the source PCM data, each
about 20 seconds long, are sampled at 44.1 [kHz] with each
sample quantized to 16-bit resolution; therefore the source
bit rate is about 690 [kbps]. By trial and error, the encoding
parameters described in section V-C were determined as n=3,
S1=30[%], Sn=6, Lgap=10 and Lmin=20 for all heart
sounds, and n=3, S1=10[%], Sn=3, Lgap=3 and Lmin=6
for all lung sounds. As a result, the converted bit rate ranges from 206 [bps] to 1.7 [kbps], the compression ratio is less than 1/400
in all cases, and the ratio for the heart sounds is
better than that for the lung sounds.
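The compression ratios implied by these figures can be verified with simple arithmetic. The sketch below uses only the bit rates quoted above; the dictionary labels are illustrative. Note that 44,100 samples/s at 16 bits is exactly 705,600 bit/s, which matches the quoted "about 690 kbps" if 1 kbit is taken as 1024 bits (an assumption about the authors' convention).

```python
SOURCE_BPS = 690_000  # "about 690 kbps", as quoted in the text

# Lowest and highest coded bit rates reported in the text.
coded_bps = {"lowest (206 bps)": 206, "highest (1.7 kbps)": 1_700}

ratios = {label: SOURCE_BPS / bps for label, bps in coded_bps.items()}
for label, r in ratios.items():
    print(f"{label}: about 1/{r:.0f}")

exact_pcm_bps = 44_100 * 16   # mono, 16-bit source
print(exact_pcm_bps, exact_pcm_bps / 1024)
```

Even the highest coded bit rate gives a ratio slightly better than 1/400, consistent with the statement above.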
In Fig. 3, the two drawings for each case show two types of
musical scores. The upper one, which we designed, needs
explanation: it can present all of the parameters in the MIDI data. Each
rectangle indicates one note; its horizontal position is time, its
vertical position is pitch, its width is length, and its height is
strength. A similar score is used for the lung sounds in Fig. 4.
Comparing the three sets of scores in Fig. 3, we can easily find
additional notes, called the third sound, in the second (DCM) case
and an additional harmony of notes, called a murmur, in the third (AS) case;
both should be familiar to medical experts.
In Fig. 4-(2), we can extract three significant notes whose
pitch is C3; these are called rhonchi and are known as a feature of
bronchial asthma.

VII. DISCUSSION & CONCLUSION


For both heart and lung sounds, the compression ratio was
as we expected. For heart sounds in particular, we
found that sounds in abnormal cases need many more notes, and that these
additional notes are meaningful and can be interpreted in medical terms.
The decoded sound can be made more similar to the original sound
by the software decoding method described in section IV-B. In our
algorithm, low-level background noise is removed because it seems unnecessary for diagnosis. If such
noise is wanted, we could add special effect functions
such as echoes to the decoders.
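For reference, the software decoding idea of section IV-B (repeating a prepared waveform cycle and adjusting it with Favg(s) and Vmax(s)) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' decoder: the function names are hypothetical, frequency modulation is reduced to resampling the cycle to one fixed period per section, and amplitude modulation is reduced to a constant scale factor.

```python
import math


def resample_cycle(cycle, length):
    """Stretch or shrink one waveform cycle to `length` samples (linear interpolation)."""
    out = []
    for i in range(length):
        pos = i * len(cycle) / length
        j = int(pos)
        frac = pos - j
        nxt = cycle[(j + 1) % len(cycle)]      # wrap around: the cycle is periodic
        out.append(cycle[j] * (1 - frac) + nxt * frac)
    return out


def decode_section(cycle, xs, xe, favg, vmax, fs):
    """Fill samples xs..xe by repeating `cycle` at pitch favg, scaled to peak level vmax."""
    period = max(1, round(fs / favg))          # cycle length in samples sets the pitch
    one = resample_cycle(cycle, period)
    peak = max(abs(v) for v in one) or 1.0     # avoid division by zero for a silent cycle
    return [vmax * one[i % period] / peak for i in range(xe - xs)]


# Usage: a single sine cycle stands in for the prepared source cycle.
cycle = [math.sin(2 * math.pi * k / 64) for k in range(64)]
pcm = decode_section(cycle, xs=0, xe=400, favg=100.0, vmax=0.5, fs=8000)
print(len(pcm), max(abs(v) for v in pcm))
```

Using a recorded heart or lung sound cycle instead of the sine would bring the timbre of the output closer to the source, which is the motivation given for this decoding method.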
In the near future, we would like our method to be evaluated by
clinical medical experts, including whether the score presentation
is suitable for clinical use. As additional functions, we are considering developing, for example, automatic recognition of rhythmic
phrases. Furthermore, our encoding algorithm is currently
being applied to other types of sounds such as human speech,
and might be utilized for diagnosis in otolaryngology.

[Figure: musical-score panels for three heart sound cases]
(1) 2-Beat Rhythm, Bit-rate: 206 bps
(2) Summation Gallop Rhythm, Bit-rate: 460 bps
(3) AS Case (Aortic Stenosis): Mid-systolic Murmur, Bit-rate: 440 bps

Fig.3. MIDI Coded Examples of Heart Sounds


[Figure: musical-score panels for lung sounds]
(1) Bronchovesicular Sounds (Normal), Bit-rate: 1.7 kbps
(2) Rhonchus (Bronchial Asthma), Bit-rate: 1.61 kbps, with three notes labeled C3

Fig.4. MIDI Coded Examples of Lung Sounds

REFERENCES
[1] K. Kashino, K. Nakadai, T. Kinoshita, and H. Tanaka, Note recognition mechanisms in the OPTIMA processing architecture for music scene analysis, IEICE Transactions, Vol.J79-D-II, No.11, pp.1751-1761, 1996.
[2] H. Kanai, N. Chubachi, and Y. Koiwa, Acoustical diagnosis in medical engineering, Journal of IPU, Vol.36, No.3, 1995.
[3] T. Modegi and S. Iisaku, A MIDI encoding algorithm for physiologic rhythmical sound, IEICE Proceedings of General Conference, No.SD-5-1, 1997.
[4] J. Arai, DTM Handbooks - SMF Reference Book, Tokyo, Japan: Rittor Music, 1996.
[5] T. Sawayama, Cardiac Auscultations - Exercise with Compact Disc, Tokyo, Japan: Nankodo, 1994.
[6] T. Ishihara, Pulmonary Auscultation, Lung Sounds on Compact Disc, Tokyo, Japan: Nankodo, 1993.
