
Forensic Cop Journal Volume 2(2), Dec 2009

http://forensiccop.blogspot.com

Standard Operating Procedure of Audio Forensic


by Muhammad Nuh Al-Azhar, MSc. (CHFI, CEI, MBCS)
Commissioner Police – Coordinator of Digital Forensic Analyst Team
Forensic Lab Centre of Indonesian National Police HQ

Introduction

A digital forensic analyst dealing with computer crime or computer-related crime may
encounter many types of digital evidence. Besides files, videos, digital images, encrypted
items, unallocated clusters, slack space and so forth, digital audio files may also need to be
analysed. In certain cases, audio files become significant evidence of the perpetrators'
involvement in a criminal case. They usually contain recorded speech between two or more
people discussing a plan to commit a crime; the analyst should therefore be able to reveal
this conversation to the criminal investigators. With this evidence, the investigators have
strong grounds to prove that the perpetrators planned a crime.

Revealing the conversation contained in audio files is not an easy job. The analyst should
follow strict audio forensic guidelines so that the output of the analysis can be accepted by
the court. If the analyst performs even one step carelessly, the results might be rejected by
the court. To help the analyst obtain the best results, this journal discusses the Standard
Operating Procedure (SOP) of Audio Forensic. With this SOP, the analyst is expected to have
good guidelines for performing audio forensic analysis step by step, so that the result of the
analysis is reliable.

The SOP of Audio Forensic comprises five steps, namely acquisition, authentication, audio
enhancement, decoding and voice recognition. Below is the explanation of each step.

Step 1: Acquisition

At present, the conversation records to be analysed are usually stored in a digital recorder
in the form of audio files. This means that they are digital evidence, which requires digital
forensic procedures in order to be handled properly. Firstly, prepare the analysis
workstation for access to the digital recorder by applying write protection on the
workstation. Write protection is intended to ensure that the contents of the digital recorder
are not changed during acquisition. It can be implemented in software or hardware, or
even by legitimately tweaking a certain file on a Linux system. For instructions on setting up
forensically sound write protection on an Ubuntu system, please see the forensic blog at
http://forensiccop.blogspot.com.

After ensuring that the write protection works, attach the digital recorder to the
workstation and perform forensic imaging. The contents of the recorder are imaged
physically through the bit-stream copy method, producing a forensic image that is identical
to the source (i.e. the digital recorder evidence). For forensic imaging, the analyst could use
reliable forensic tools running under Windows, such as FTK Imager from AccessData or
EnCase from Guidance Software, or rely on dcfldd on a Linux system. To learn how to
perform forensic imaging with dcfldd on an Ubuntu machine and how to verify that the
image is identical, please also see http://forensiccop.blogspot.com. Briefly, the analyst
computes an MD5 or SHA-1 hash value, which acts as a digital fingerprint, for both the
source and the image, and then compares the two values. If they match, the source and the
image are identical and the forensic imaging was successful.
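
The hash comparison described above can be sketched in a few lines of Python. This is only
an illustration of the principle, not a replacement for FTK Imager, EnCase or dcfldd; the
function names hash_stream and image_matches_source are hypothetical and do not come
from any forensic tool.

```python
import hashlib


def hash_stream(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file or device, read in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # Reading in chunks keeps memory use constant for large images.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source_path, image_path, algorithm="md5"):
    """Return True when the source and the image produce the same hash value."""
    return hash_stream(source_path, algorithm) == hash_stream(image_path, algorithm)
```

In practice the source path would be the write-protected recorder device and the image
path the file produced by the imaging tool; passing "sha1" as the algorithm gives the SHA-1
value instead of MD5.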

For this reason, the analyst is expected to understand Linux (in this case, Ubuntu) as well
as Windows for digital forensic purposes. With the variety of forensic tools available under
Windows and Linux, the analyst has many ways to perform digital forensics in order to
obtain the best results.

Step 2: Authentication

In this step, the analyst extracts the audio file to be analysed from the image file and then
checks whether or not the file is original. If the file is found to have been tampered with so
that its contents have changed, the further steps must be stopped. It is not appropriate to
continue the analysis when the audio file is not original. If the analyst does so anyway, the
results will be unreliable and may even be rejected by the court. As a consequence, the
credibility of the analyst decreases sharply.

To determine the originality of the audio file, the analyst could apply two techniques,
namely metadata analysis and spectrum analysis. In metadata analysis, the analyst checks
the MAC (Modified, Accessed and Created) times of the file. If the Modified and Created
dates are the same, the file can be considered original. When a recording is started, the
recorder automatically creates a new audio file, which becomes the output of the recording
once it finishes. This is the same as a digital image produced by a still camera: the files
produced have the same Modified and Created dates. If the Created date is later than the
Modified date, the file is the result of a copy-paste process: it originally came from other
storage media and was then copied and pasted into the recorder. If this is found, the file
stored in the recorder is not original.
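
The timestamp rule above can be written as a small Python function. This is a minimal
sketch: the function name classify_by_timestamps is hypothetical, and it assumes the
analyst has already read the Created and Modified times from the forensic image with a
forensic tool, since the way creation time is exposed differs between file systems and
operating systems. The case where the Created date is earlier than the Modified date is
not covered by the rule, so the sketch reports it as undetermined.

```python
from datetime import datetime


def classify_by_timestamps(created, modified):
    """Apply the Created/Modified rule from the metadata analysis to one file."""
    if created == modified:
        return "consistent with an original recording"
    if created > modified:
        # Created later than Modified: the file was copied from other media.
        return "likely copied onto the recorder"
    # Assumption: the rule does not cover this case, so no verdict is given.
    return "undetermined"


# Hypothetical timestamps, for illustration only.
recorded_at = datetime(2009, 12, 1, 10, 15, 0)
copied_at = datetime(2009, 12, 5, 9, 0, 0)
print(classify_by_timestamps(created=recorded_at, modified=recorded_at))
print(classify_by_timestamps(created=copied_at, modified=recorded_at))
```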

In spectrum analysis, the analyst could use applications (i.e. software, hardware or both) to
analyse the spectrum of the file. One that is well known in audio signal processing is Cedar
Cambridge. This instrument provides many modules for processing audio. One module that
is useful for analysing the audio spectrum is Retouch. With this module, the audio spectrum
is visualised clearly, so that the analyst can find whether or not there is editing or insertion
in the contents of the audio. If no such issues are found, the audio can be considered
original; otherwise it is not. Spectrum analysis with Retouch on the Cedar instrument is not
easy work; the analyst must be careful in analysing every area of the audio. The
combination of General and Zooming analysis is the best technique to see each area of the
audio clearly and precisely.

Step 3: Audio Enhancement

Often, the evidence obtained is audio of poor quality. Poor audio makes it difficult for the
analyst to hear and to determine which words were spoken by the speakers. The
conversation between two or more people is not clear or, even worse, the speakers' voices
cannot be heard at all because the noise is much louder than the human voice. To solve this
problem, the analyst should be able to improve the quality through audio enhancement by
removing noise. The output of audio enhancement is audio that is clear to listen to.

For audio enhancement, the analyst could use several reliable applications, either
commercial or freeware. In the author's experience with these applications, a commercial
application such as Cedar Cambridge is better at noise removal than a freeware tool such as
Audacity. However, Audacity is still the first choice for many audio signal processing users
as it runs on both Windows and Linux. Besides this flexibility, Audacity provides many
useful modules to enhance audio quality, including noise removal, high pass filter, low pass
filter, amplify, click removal and normalize. However, these modules are not sufficient
when dealing with a high level of noise. For instance, a conversation between two people
may be recorded against the background noise of a jet engine, so that the human voices
cannot be heard at all and the only sound heard is the jet engine. To remove such noise, the
analyst could use Cedar Cambridge, which provides many powerful modules for noise
filtering. The modules offered by Cedar include Retouch, DNS (Dialogue Noise Suppression),
NR (Noise Reduction), Dehiss, Declick, Decrackle, Debuzz and Spectrum Analysis. These
modules can process many types of noise, giving the analyst a wide range of noise removal
tools to apply.
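
To illustrate what a high pass filter does, the following Python sketch implements a simple
first-order high pass filter that attenuates low-frequency noise such as mains hum. It is
not the algorithm used by Audacity or Cedar Cambridge; the function name high_pass and
the cutoff value in the usage note are assumptions chosen for illustration.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz):
    """First-order high pass filter: attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate                  # time between two samples
    alpha = rc / (rc + dt)
    filtered = [samples[0]]
    for i in range(1, len(samples)):
        # Each output follows changes in the input and decays otherwise,
        # so slow (low-frequency) components are suppressed.
        filtered.append(alpha * (filtered[-1] + samples[i] - samples[i - 1]))
    return filtered
```

For example, high_pass(samples, 8000, 300) applied to audio sampled at 8000 Hz reduces
a 50 Hz hum to a fraction of its amplitude while leaving most higher speech frequencies in
place. A filter this simple cannot deal with broadband noise such as a jet engine.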

Step 4: Decoding

In this step, the enhanced audio of good quality is decoded by the analysts. Poor audio is
not decoded. The output of the decoding process is a transcript of the conversation in the
audio. The analysts have to transcribe every word pronounced clearly by the speakers. For
words pronounced unclearly, the analysts have to put “unclear words” in the transcript.

The decoding process must be performed by at least two analysts in order to obtain a
highly accurate transcript. If only one analyst does the decoding, the resulting transcript is
less reliable: many words could be written wrongly, and a transcript with many wrong
words could have the wrong meaning, which could mislead the investigation. To avoid this
problem, each analyst first produces a transcript; the transcripts are then compared while
listening to the audio. The words that differ between the transcripts can then be checked
in order to determine the correct words.
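
Finding the words on which two analysts disagree can be partly automated. The sketch
below uses Python's standard difflib module to list the differing words; the function name
differing_words and the sample sentences are hypothetical. It only points the analysts to
the passages to listen to again and does not decide which version is correct.

```python
import difflib


def differing_words(transcript_a, transcript_b):
    """Return (words from A, words from B) pairs where the transcripts differ."""
    words_a = transcript_a.split()
    words_b = transcript_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return differences


# Hypothetical transcripts from two analysts.
analyst_1 = "we meet at the harbour tonight"
analyst_2 = "we meet at the airport tonight"
print(differing_words(analyst_1, analyst_2))
```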

In the transcript, the analyst should record the time at which the speakers pronounce the
words or full sentences. This makes it easier for the analysts and readers to analyse the
conversation. In addition, as the speakers are still unknown, the analysts have to label
each speaker as “speaker 1”, “speaker 2” and so on in the transcript in order to
differentiate them. The combination of time and speaker number gives a comprehensive
transcript, which is then useful for the next step.
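
One possible layout for such a transcript line is sketched below in Python. The layout
itself (time in square brackets, then the speaker label) and the function names are
assumptions; the SOP only requires that the time and the speaker number be recorded.

```python
def format_timestamp(seconds):
    """Convert a position in seconds to HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_entry(start_seconds, speaker_number, text=None):
    """Build one transcript line; unclear speech is marked as 'unclear words'."""
    spoken = text if text else "unclear words"
    return f"[{format_timestamp(start_seconds)}] speaker {speaker_number}: {spoken}"


# Hypothetical entries, for illustration only.
print(format_entry(75, 1, "we meet at the harbour tonight"))
print(format_entry(82, 2))
```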

Step 5: Voice Recognition

Known Samples

Voice recognition is performed to recognise the speakers in the audio file. To determine
precisely who is speaking in the audio evidence, the analyst requires known samples.
Voice recognition cannot be performed when the analyst does not have known audio
samples. These known samples are essential, as they are compared with the unknown
samples from the audio evidence.

Based on this description, voice recognition is a technique that compares known and
unknown samples of similarly pronounced words in order to recognise the speakers. As the
known samples play a key role in this recognition, the analyst must be careful when taking
them. Please verify the identity of the people whose voices will be taken as known samples.
Ask them to sign an official letter confirming their identity and their willingness to provide
the samples. This letter allows the analyst to show the court that no one was forced during
the process of taking the known samples of their voices.

Before taking the known samples, the analyst first prepares a short script derived from the
transcript produced in the Decoding step. This script is then read aloud by the people whose
voices are being taken as known samples. Their voices can be recorded with a digital
recorder or with audio applications that provide a recording feature, such as Cedar
Cambridge, Speech Analyzer and Audacity. Before recording, please ensure that the
recording instrument works properly.

Speaker Recognition by Spectrogram (SRS)


According to D. Meuwly from Institut de Police Scientifique et de Criminologie, Lausanne,
Switzerland, there are three types of speaker recognition, namely Speaker Recognition by
visual comparison of Spectrograms (SRS), Speaker Recognition by Listening (SRL) and
Automatic Speaker Recognition (ASR). This journal discusses only the SRS and SRL
techniques, as they are widely used in speaker recognition.

The sound spectrograph is an instrument that shows the variation of the short-term
spectrum of the speech wave. In each spectrogram, the horizontal dimension is time, the
vertical dimension represents frequency, and the darkness represents intensity on a
compressed scale (Meuwly, 2000). The spectrograms of similarly pronounced words in the
unknown and known samples are compared in order to check for a distinctive pattern.
No two spectrograms of the same word are identical, even when the word is pronounced by
the same person. This is the same as a written signature: if the signatures analysed are
found to be identical, one must be a fake. The same applies to spectrograms. When
identical spectrograms of the same pronounced word are found, the recording is not
original; it must be the result of a copy-paste process.
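
The short-term spectrum behind a spectrogram can be illustrated with a small Python
sketch that cuts the signal into overlapping frames and computes the magnitude of each
frequency component per frame. It uses a plain discrete Fourier transform from the
standard library, so it is slow and only suitable for short signals; the function name
spectrogram, the frame size and the hop size are assumptions, and dedicated tools should
be used for casework.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (per frequency bin) for each time frame."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames
```

Each inner list corresponds to one column of the spectrogram (one moment in time), and
bin k corresponds to the frequency k * sample_rate / frame_size. Plotting the magnitudes
as darkness gives the time, frequency and intensity picture described above.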

When comparing the spectrograms of similarly pronounced words in the unknown and
known samples, the analyst looks for a distinctive pattern that shows the same
characteristics in both samples. When it is found, it can be concluded that the unknown
sample is the same as the known sample, meaning that the unknown and known voices
come from the same source (i.e. the same person). Normally, 20 or more different words
are needed for a meaningful comparison. Fewer than 20 words usually result in a less
conclusive opinion, such as “possibly” instead of “probably” (Koenig, 1986). If necessary,
the spectrograms of similarly pronounced words within the same sample are also compared
in order to check consistency.

Besides analysing spectrograms, the analyst should also compare formant frequencies,
pitch and so on between the two samples. The aim is to obtain a comprehensive comparison
before reaching a conclusion of identification or elimination.

Speaker Recognition by Listening (SRL)


SRL, also known as Aural Recognition, is a technique that supports SRS. When the results
of SRS indicate identification, SRL corroborates it. Although SRL is only a supporting
element of SRS, it plays a significant role in recognising speakers through their vocal
behaviours, which comprise vocal intensity and speech style. These two components can
generally be detected when someone speaks; the analysts therefore use them to obtain the
characteristics of vocal behaviour. Vocal intensity covers the intonation, speed and
frequency level of the human voice, while speech style covers vowel and consonant
articulation, and dialect, which is frequently found at the beginning or the end of a
conversation.

To obtain comprehensive SRL results, the analyst should make a table showing the values
of vocal intensity and speech style. The proposed value scale is a scale of five: 5 is the
strongest mark, while 1 is the lowest. This table helps the analyst to decide between SRL
identification and elimination. This conclusion should support the results of SRS.

For a familiar speaker, it is easy for the analyst to perform SRL because the ears are
already familiar with the vocal behaviours of the speaker. The opposite occurs when
listening to an unfamiliar speaker; the analyst encounters difficulty in analysing the voice
through SRL. Making an assessment table is a good technique for obtaining the values of
vocal behaviours. The values are then compared between the unknown and known samples
in order to conclude identification or elimination.
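
The assessment table can be represented as in the following Python sketch. The feature
list follows the components of vocal intensity and speech style named above; the function
name, the sample scores and the extra column showing the difference between the two
samples are assumptions. The sketch deliberately contains no threshold for identification
or elimination, because that decision remains the analyst's judgement.

```python
FEATURES = [
    "intonation", "speed", "frequency level",            # vocal intensity
    "vowel articulation", "consonant articulation", "dialect",  # speech style
]


def build_assessment_table(unknown_scores, known_scores):
    """Return (feature, unknown, known, difference) rows on the 1 to 5 scale."""
    rows = []
    for feature in FEATURES:
        unknown = unknown_scores[feature]
        known = known_scores[feature]
        for score in (unknown, known):
            if score not in range(1, 6):
                raise ValueError(f"score for {feature} must be an integer from 1 to 5")
        rows.append((feature, unknown, known, abs(unknown - known)))
    return rows


# Hypothetical scores, for illustration only.
unknown = {"intonation": 4, "speed": 3, "frequency level": 4,
           "vowel articulation": 5, "consonant articulation": 4, "dialect": 5}
known = {"intonation": 4, "speed": 4, "frequency level": 4,
         "vowel articulation": 5, "consonant articulation": 3, "dialect": 5}
for row in build_assessment_table(unknown, known):
    print(row)
```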

Conclusion of Voice Recognition

There are three possible conclusions after finishing the steps above:
1. Identification. The unknown and known samples come from the same person.
2. Elimination. The unknown and known samples are not identical; they come from two
different people.
3. Inconclusive. There is insufficient basis to state either Identification or Elimination.
This happens when fewer than 20 words share the same distinctive spectrogram pattern,
or when the audio recordings, including the words pronounced, are of very poor quality.

Bibliography

Al-Azhar, M.N. (2009). Forensically Sound Write Protect. Forensic Cop Journal. 1 (3).
Available: http://forensiccop.blogspot.com. Last accessed 19 December 2009.
Al-Azhar, M.N. (2009). Similarities and Differences between Windows and Ubuntu on
Forensic Applications. Forensic Cop Journal. 1 (2). Available:
http://forensiccop.blogspot.com. Last accessed 19 December 2009.
Al-Azhar, M.N. (2009). Ubuntu Forensic. Forensic Cop Journal. 2 (1). Available:
http://forensiccop.blogspot.com. Last accessed 19 December 2009.
Broeders, A.P.A. (2004). Forensic Speech and Audio Analysis Forensic Linguistics. In: Daeid,
N.N. (ed.). 14th International Forensic Science Symposium, 19-22 October 2004. Lyon:
Interpol. p171.
Chou, W. and Juang, B.H. (2003). Pattern Recognition in Speech and Language Processing.
New York: CRC Press.

Hollien, H. (2000). Phonetics. In: Siegel, J., Knupfer, G. and Saukko, P. (eds.). Encyclopaedia of
Forensic Sciences, Three Volume Set, 1-3. Elsevier. p1243.
Koenig, B.E. (1986). Spectrographic Voice Identification: A Forensic Survey. Federal Bureau of
Investigation. Virginia. Available: http://tapeexpert.com/pdf/spectrographic
voiceid.pdf. Last accessed 19 December 2009.
Meuwly, D. (2000). Voice Analysis. In: Siegel, J., Knupfer, G. and Saukko, P. (eds.).
Encyclopaedia of Forensic Sciences, Three Volume Set, 1-3. Elsevier. p1413.
SWGDE. (2008). Best Practices for Forensic Audio. Available:
http://www.swgde.org/documents/swgde2008/SWGDEBestPracticesforForensicAudioV1.0.pdf.
Last accessed 19 December 2009.
