
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 142 (2018) 166–173
www.elsevier.com/locate/procedia

The 4th International Conference on Arabic Computational Linguistics (ACLing 2018),
November 17-19 2018, Dubai, United Arab Emirates
The Building and Evaluation of a Mobile Parallel Multi-Dialect Speech Corpus for Arabic

Khalid Almeman*

Department of Applied Natural Sciences, Community College of Unaizah, Qassim University, Qassim, Saudi Arabia

Abstract

This paper discusses the process of building and evaluating a mobile parallel multi-dialect speech corpus for Arabic. The methodology for implementing the experiment is as follows: two SIM cards were installed in two mobile phones; one party is the sender and the other the receiver. Four different environments were chosen for the receiver, i.e. inside the home, in a moving car, in a public place and in a quiet place. By the end of the experiment, a new mobile parallel speech corpus for Arabic dialects had been built. The newly obtained corpus provides the benefits of a large, fully parallel and labelled speech corpus without requiring a large collection and building effort. The resultant corpus will be made freely available to researchers. To evaluate the resultant corpus, the CMU Sphinx recogniser was used to extract word error rates (WERs) of 24.3, 17.9, 31.2, 18.7 and 32.0 for multi-dialect, Levantine, Gulf, MSA and Egyptian, respectively.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the 4th International Conference on Arabic Computational Linguistics.
Keywords: mobile corpus; cellular corpus; Arabic dialects corpus; Arabic parallel corpus; speech corpus for Arabic dialects

1. Introduction
The limited availability of resources providing data about the Arabic language affects the accuracy of diverse natural language processing and speech recognition applications. The huge differences between Arabic dialects further heighten the need for additional resources for use in different areas.

In terms of speech recognition applications, using the same channel to train and test data is highly recommended to guarantee high accuracy. For example, using data derived from a microphone source to identify mobile calls is likely to result in low accuracy.

∗ Corresponding author. Tel.: +966-16-3800050.
E-mail address: kmeman@qu.edu.sa


Table 1. The 13 combinations of short vowels for ج /j/ ‘the Jeem letter’

                     FatHah   Kasrah   Dhammah   Sukun
Bare mark             جَ        جِ        جُ         جْ
Tanween               جً        جٍ        جٌ         -
Shaddah               جَّ        جِّ        جُّ         -
Shaddah + Tanween     جًّ        جٍّ        جٌّ         -

The purpose of this work is to build and evaluate a new version of the multi-dialect Arabic parallel speech corpus, which includes four Arabic varieties: Gulf, Levantine and Egyptian, as well as MSA. The Arabic multi-dialect speech corpus [4] now has three different versions using different sources: a microphone [ibid.], a VOIP source [3] and this mobile version, which is explained in the experiment conducted for this research. All of these versions will be freely available to researchers.
This paper is organised as follows: Section 2 highlights the main features of Modern Standard Arabic (MSA) and its dialects; Section 3 outlines related work; Section 4 details the data used to produce the new corpus; Section 5 explains the methodology applied; Section 6 presents the resultant mobile corpus; Section 7 describes the tool used to extract the WER results and introduces the WER results for all experiments; Section 8 discusses the evaluation of the work; and Section 9 presents the conclusion and planned future work.

2. Modern Standard Arabic and Arabic Dialects

The Arabic language is one of the most widely spoken languages globally, ranked fourth in use after Chinese,
Spanish and English [9], with an estimated 422 million speakers [6]. Furthermore, Arabic is the official language in
24 countries [ibid.]. Each Arabic letter has up to four written forms1 based on its position in the word: initial, medial,
final or isolated.
In Arabic, diacritisation (Tashkeel) is used to denote short vowels. There are up to 13 different possible combination forms for a letter in Arabic [2]. Table 1 shows an example of the possible combinations for the letter ج /j/ ‘the Jeem letter’. Overall, the diacritised variants of the 28 Arabic letters number in excess of 350 different forms [ibid.].
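This count can be reproduced from standard Arabic orthography: three bare short vowels plus Sukun, three Tanween forms, and Shaddah combined with each short vowel and each Tanween form (4 + 3 + 3 + 3 = 13). The following short Python sketch, an illustration using standard Unicode combining marks rather than any tooling from this work, enumerates the 13 diacritised forms of the Jeem letter:

# Enumerate the 13 diacritic combinations for a single Arabic letter.
# Unicode combining marks: FatHah U+064E, Dhammah U+064F, Kasrah U+0650,
# Sukun U+0652, Fathatan U+064B, Dammatan U+064C, Kasratan U+064D,
# Shaddah U+0651.
JEEM = "\u062C"                                  # the Jeem letter
SHORT = ["\u064E", "\u0650", "\u064F"]           # FatHah, Kasrah, Dhammah
SUKUN = "\u0652"
TANWEEN = ["\u064B", "\u064D", "\u064C"]         # Fathatan, Kasratan, Dammatan
SHADDAH = "\u0651"

combos = (
    [JEEM + v for v in SHORT]                    # 3 bare short vowels
    + [JEEM + SUKUN]                             # 1 Sukun
    + [JEEM + t for t in TANWEEN]                # 3 Tanween
    + [JEEM + SHADDAH + v for v in SHORT]        # 3 Shaddah + short vowel
    + [JEEM + SHADDAH + t for t in TANWEEN]      # 3 Shaddah + Tanween
)
print(len(combos), combos)                       # 13 diacritised forms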

2.1. Modern Standard Arabic versus dialects in usage

MSA is the formal version of the language used for communication, and it is understood by the majority of the
Arabic population and employed both in the spheres of education and the media [10]. MSA enables individuals who
speak different dialects to communicate, although local dialects are used during the majority of telephone calls and in
routine conversations. Dialects have also more recently begun to appear in television programmes.

2.2. The multiplicity of Arabic dialects

The majority of contemporary Arabic dialects originated from a combination of diverse Arabic dialects and the
languages of neighbouring countries [15]; for example, the interaction between Arabic, Berber and French languages
led to the North African dialect [ibid.].

1 The Hamzah letter has five written forms.
2 In this paper, we represent Arabic words in some or all of the following variants: the word in Arabic letters / HSB transliteration scheme [16] / (the dialect).

In Arabic dialects, the majority of words originate either from MSA (for example, /salAsah/2 (Egyptian) has the origin /TalATah/ ‘three’ in MSA) or are loan-words (e.g. /tilifzywn/ ‘television’). In both cases cited, there are significant differences between the original and current expressions.
Three different levels of change are detectable in expressions known to originate from MSA: firstly, changes expressed by altering consonants or long vowels3; secondly, a number of changes arising from the use of short vowels, i.e. diacritisation; and thirdly, changes that take place when the Al Tajweed4 rules are ignored [2].
Two important linguistic aspects distinguish MSA from the dialects: (1) the large differences between the dialects [2] and (2) the fact that MSA is effectively a second language, and therefore non-native, for Arabic speakers [15].
There are over 30 Arabic dialects [13]5, with each country having its own specific main dialect, and some also having a number of subdialects.

3. Related Work

There is a lack of available speech databases for Arabic dialects that can be used for speech recognition applications and NLP tasks [11]. This lack affects the accuracy of speech recognition tasks [2].
Few parallel speech databases have been collected previously [2], principally because compiling a speech corpus is a time-consuming process [ibid.]. Anumanchipalli et al. [5] created an example of a parallel corpus, collecting approximately two hours of parallel speech for English and Portuguese. For the same work, they also collected approximately 25 minutes for German and English [ibid.]. A further example detailing the production of a parallel corpus is the work of Erjavec [12], who collected a parallel multilingual speech corpus for English before translating it into four languages [ibid.]. A final example of a parallel speech corpus for two different languages is the work of Pérez et al. [21], who collated a parallel speech and text corpus for the Basque and Spanish languages.
The TIMIT Acoustic-Phonetic Continuous Speech Corpus [14] is the most popular speech database. Many speech
corpus resources have been produced using TIMIT, such as CTIMIT (Cellular TIMIT) [7], NTIMIT (Network TIMIT)
[17], etc. In CTIMIT [7], a DAT player was placed in a van, with the output transmitted to a mobile phone by placing
a speaker close to it, which produced a new speech database. CTIMIT has the same content as TIMIT, but it has
different features.
As mentioned above, there is currently a lack of freely available speech databases for Arabic dialects for use with speech recognition applications and NLP tasks [11], which impacts the accuracy of speech recognition tasks [2]. Contemporary Arabic speech corpora are derived from a number of different sources: (1) microphones, for example the West Point corpus [18]; (2) broadcast receivers, for example the NEMLAR Broadcast News Speech Corpus [20]; and (3) telephone conversations, for example the CALLHOME Egyptian Arabic Speech corpus [8] and the Saudi Accented Arabic Voice Bank [1]. However, aside from the Arabic multi-dialect parallel speech corpus [4], which is a microphone-sourced corpus, there is currently no available parallel speech corpus for alternative sources such as cellular networks.

4. Data

In this research, the Arabic dialects speech corpus [4] was used to conduct the experiments. In Almeman et al. [4], the researchers collected more than 67,000 PCM audio files. This collection covers three Arabic dialects (Egyptian, Gulf and Levantine) as well as MSA. The corpus produced by Almeman et al. [4] was recorded at 16 bits and 48,000 Hz in mono, i.e. one channel. The main subject domain of the Almeman et al. [4] corpus is travel and tourism, and it also includes a corpus of MSA numbers. Almeman et al. [4] used a microphone source to build the entire Arabic multi-dialect speech corpus.
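When reusing the corpus, the stated audio properties (mono, 16-bit, 48,000 Hz) can be verified programmatically. The sketch below is a minimal illustration assuming the PCM data is stored in standard WAV containers; the file path is hypothetical:

import wave

# Hypothetical path to one corpus recording; adjust to the local layout.
with wave.open("corpus/MSA/01/A/01/53.wav", "rb") as w:
    assert w.getnchannels() == 1        # mono, i.e. one channel
    assert w.getsampwidth() == 2        # 16-bit samples (2 bytes each)
    assert w.getframerate() == 48000    # 48,000 Hz sampling rate
    print(w.getnframes() / w.getframerate(), "seconds")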

3 There are three long vowels in Arabic: ا /A/, و /W/ and ي /Y/.
4 Means ‘to recite the Quran in a correct way’.
5 A web-based publication containing statistics for more than seven thousand languages.

5. Methodology

The methodology of this research was formulated after first determining the recording environments. Four environments were chosen: inside the home, in a moving car, in a public place, and in a quiet place. Table 2 shows the distribution of the speakers within these four environments; they are equally divided.
There were two parties in this experiment: a sender and a receiver. A mobile network was used, and the conversations were recorded on the SIM cards of the sender’s and receiver’s mobiles. The sender was in a fixed environment, i.e. inside the home, whereas the environment of the receiver varied as detailed above.
All the recordings made for the Arabic multi-dialect speech corpus were prepared manually. The city where the
data was collected is Unayzah, located about 300 km north of Riyadh (the capital of the Kingdom of Saudi Arabia).
It is a medium-sized city, so noise is expected in public areas, and road traffic is average. The chosen public areas and
the streets used in the experiment varied between high noise and medium noise.

Table 2. Recordings distributed between four chosen environments

                                 Number of speakers
The environment        Egyptian   Gulf   Levantine   MSA   Total
Inside the home            5        3        2        3      13
In a moving car            5        3        2        3      13
In a public place          5        3        2        3      13
In a quiet place           5        3        2        3      13
Total                     20       12        8       12      52

The final part of the methodology involves testing the resultant corpora. A speech recognition engine is used to extract word error rates and evaluate the results, which can then be compared with those of the other versions.

6. Mobile Resultant Corpora

By the end of the experiment, we had obtained a new parallel mobile corpus, which includes the same content as the Arabic dialects speech corpus [4]. The resultant corpus covers four varieties: Egyptian, Gulf, MSA and Levantine. It also contains more than 67,000 segmented audio files. The total number of participants in the resultant corpora is 52 speakers, with 12, 8, 12 and 20 speakers for the MSA, Levantine, Gulf and Egyptian dialects, respectively. Table 3 presents the distribution of wave files for the chosen dialects. For additional details about the original speech corpus, its evaluation and the overlap between the dialects, see [4].
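For orientation, the per-dialect file counts reported in Table 3 can be reproduced by walking the corpus tree. The sketch below is illustrative only: the local root path is hypothetical, and the dialect directory names are inferred from the file identifiers used in Table 10, so they may differ in the released corpus:

from pathlib import Path

root = Path("corpus")  # hypothetical local path to the unpacked corpus
# Dialect folder names inferred from Table 10's file identifiers.
for dialect in ["MSA", "GULF", "LEV", "EGY"]:
    n = sum(1 for _ in (root / dialect).rglob("*.wav"))
    print(f"{dialect}: {n} files")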

7. Recognition system and WER results

When extracting the results for the new corpus, we used CMU Sphinx [19]. To obtain the training results, we used CMU SphinxTrain v1.0.7 [23]; to decode and extract the results, we used Sphinx v3-0.8 [22].
Tables 4, 5, 6, 7 and 8 present the word error rate results acquired with the CMU decoder. We set three different values (4, 8 and 16) for the Gaussian densities to try to obtain the best WER results. In addition, we set six different values for the tied states. The best WER results, as Tables 4, 5, 6, 7 and 8 show, were 24.3, 17.9, 31.2, 18.7 and 32.0 for multi-dialect, Levantine, Gulf, MSA and Egyptian, respectively.
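For reference, WER is the word-level edit distance between the recogniser output and the reference transcript, normalised by the reference length: WER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the number of reference words. The following Python sketch computes it with standard dynamic programming; it is an illustration, not the scoring code of the Sphinx tooling used here:

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / ref length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # one insertion over 3 words: 33.3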

Table 3. The distribution of audio files

Corpus          Total files
MSA             15492
Gulf            15492
Egyptian        25820
Levantine       10328
Total for all   67132

Table 4. Multi-dialect WER results

               Gaussian densities
Tied states     4      8      16
1000          32.3   30.0   27.8
2000          29.5   26.3   25.4
3000          27.4   24.8   24.3
4000          26.8   25.1   24.9
4500          26.5   25.0   25.0
5000          26.4   25.1   25.9

Table 5. Levantine WER results

               Gaussian densities
Tied states     4      8      16
1000          23.4   19.4   17.9
2000          19.6   18.6   22.6
3000          19.6   23.2   32.5
4000          21.0   29.7   47.0
4500          23.2   33.8   52.3
5000          25.2   38.0   56.8

Table 6. Gulf WER results

               Gaussian densities
Tied states     4      8      16
1000          36.6   33.5   33.5
2000          32.6   31.2   34.6
3000          31.7   33.3   41.4
4000          32.3   36.4   51.2
4500          32.9   39.9   54.8
5000          33.2   42.8   58.3

Table 7. MSA WER results

               Gaussian densities
Tied states     4      8      16
1000          21.7   20.1   20.2
2000          19.4   19.0   19.8
3000          18.7   20.0   25.2
4000          19.3   22.3   31.2
4500          20.3   24.8   35.3
5000          21.0   27.0   38.8

8. Evaluation

The main aim of this research was to re-record, train and evaluate one of the speech corpora using a mobile network. The chosen speech corpus was the Arabic parallel multi-dialect speech corpus [4]. The reason for choosing this resource

Table 8. Egyptian WER results

               Gaussian densities
Tied states     4      8      16
1000          38.9   34.5   32.5
2000          34.2   32.8   32.8
3000          33.1   32.7   35.8
4000          32.0   33.4   39.5
4500          32.6   35.1   42.6
5000          33.5   35.9   45.5

Table 9. Accuracy when comparing the best results for microphone and mobile sources

                Microphone source   Cellular source   Difference
Multi-dialect         13.7               24.3            10.6
MSA                    8.2               18.7            10.5
Gulf                  12.7               31.2            18.5
Levantine              8.8               17.9             9.1
Egyptian              11.2               32.0            20.8

is that it is uniquely (1) an Arabic parallel database that (2) covers multiple dialects and (3) is freely available. By the end of the experiment, we had developed a new version of the corpus, identical in content to the existing versions but with its own source-specific characteristics.
As pointed out in the methodology, four different environments were chosen for the experiment: inside the home, in a moving car, in a public place and in a quiet place. For the recordings made inside the home, the noise level varied: some rooms are very quiet and some have noise in the background. For the recordings made in a moving car, and those in public places, there was noise in the background. For the recordings obtained in quiet places, the noise was mostly at a low level compared to the previous environments, as expected.
Background noise can be divided into non-human and human noise. Examples of non-human background noise are doors closing, cutlery sounds, car horns and road traffic, while examples of human noise are crying, shouting, speaking, etc.
Mobile call quality can be affected by many additional factors, such as network signal quality, recording quality,
the distance between the mobile and the mouth, etc.
The comparison between the best results for the microphone source corpus and the mobile call corpus is given in Table 9, which shows that the recognition accuracy for the microphone source (the original) is higher than for the mobile source (the newly obtained corpus). The differences in WER range from 9.1 to 20.8.
Checking the contrast between the speech (in the foreground) and the noise (in the background) gives an indication of the sound quality of the resultant corpus. WCAG 2.0 [24]6 states that background noise should be at least 20 root mean square (rms) dB lower than the speech in the foreground. Thirty speech files were randomly chosen and the contrast was measured; the results are shown in Table 10. The difference between the foreground and the background was over 20 rms dB for all selected files, except for one file, which obtained 15.5 rms dB, i.e. less than 20 rms dB. On checking this file, the problem appeared to be a weak mobile network signal, which affects the level of background noise. The average difference for all files was 44.74 rms dB, which satisfies the WCAG 2.0 suggested conditions. These results indicate that the background noise level is acceptable.
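The contrast measure itself is straightforward: the average rms level of a segment, expressed in dB relative to full scale, is 20*log10(rms/32768) for 16-bit audio, and the contrast is the foreground level minus the background level. The sketch below illustrates the computation for 16-bit mono WAV input; the file name and segment boundaries are hypothetical, standing in for the manual annotations used in Table 10:

import math
import wave

def rms_db(path: str, start_s: float, end_s: float) -> float:
    """Average rms level in dB (re. 16-bit full scale) of a segment."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        w.setpos(int(start_s * rate))
        frames = w.readframes(int((end_s - start_s) * rate))
    # Interpret the bytes as signed 16-bit little-endian mono samples.
    samples = [int.from_bytes(frames[i:i + 2], "little", signed=True)
               for i in range(0, len(frames), 2)]
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768)     # 0 dB = full scale

# Hypothetical segment times, in the style of Table 10's first row.
fg = rms_db("MSA_01_A_01_53.wav", 0.76, 1.60)   # speech (foreground)
bg = rms_db("MSA_01_A_01_53.wav", 0.35, 0.75)   # noise (background)
print(f"contrast: {fg - bg:.1f} rms dB")        # should exceed 20 rms dB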

6 Web Content Accessibility Guidelines (WCAG) 2.0 is a guideline for accessible audio files on the internet, which is recommended by W3C.

Table 10. Speech contrast evaluation

                      Foreground                          Background                        Difference
Audio file #          Start (s)  End (s)  Avg rms dB      Start (s)  End (s)  Avg rms dB    (rms dB)
MSA\01\A\01\53 0.76 1.60 -33.9 0.35 0.75 -75.7 41.9
MSA\02\E\01\111 0.01 1.41 -32.4 1.50 2.03 -76.4 44.0
MSA\04\H\02\103 0.19 0.86 -35.2 0.92 1.27 -81.3 46.1
MSA\05\A\04\27 0.43 1.37 -32.3 0.00 0.31 -80.8 48.5
MSA\06\C\01\44 0.35 0.98 -35.2 0.00 0.20 -76.5 41.3
MSA\11\G\01\127 0.24 1.11 -31.9 0.10 0.25 -60.0 28.2
MSA\12\D\02\04 0.44 1.84 -26.5 0.00 0.37 -78.5 52.0
MSA\07\A\01\36 0.26 2.57 -32.3 2.63 3.08 -79.7 47.4
GULF\01\A\01\02 1.01 3.39 -33.7 0.00 0.75 -77.5 43.8
GULF\09\D\01\42 0.34 1.06 -25.7 1.08 1.29 -70.4 44.7
GULF\03\H\02\25 0.43 1.35 -34.8 1.61 2.22 -75.9 41.1
GULF\03\A\01\38 0.73 2.21 -33.4 0.00 0.50 -76.8 43.5
GULF\03\G\01\84 0.72 1.59 -29.9 1.99 2.40 -64.0 34.1
GULF\10\H\02\61 0.32 1.27 -31.6 0.00 0.27 -78.0 46.4
GULF\03\B\01\24 0.44 2.16 -34.2 0.00 0.39 -77.1 43.0
GULF\06\C\02\02 0.27 1.56 -25.0 1.66 1.87 -75.9 50.9
LEV\01\A\01\10 0.44 2.48 -33.4 0.00 0.40 -48.9 15.5
LEV\01\H\02\39 0.20 0.93 -28.0 1.21 1.96 -79.9 51.9
LEV\02\D\02\49 0.54 1.58 -21.8 0.00 0.35 -63.3 41.5
LEV\06\D\01\25 0.51 1.39 -27.6 0.00 0.43 -78.6 51.0
LEV\05\A\02\46 0.62 1.70 -32.1 0.00 0.43 -77.6 45.5
LEV\06\D\01\23 0.52 1.55 -29.3 1.75 2.28 -78.1 48.8
LEV\06\C\01\19 0.52 1.91 -30.5 2.02 2.75 -76.5 46.0
EGY\01\A\01\03 0.66 1.31 -35.5 0.00 0.48 -77.3 41.8
EGY\03\E\01\105 0.64 1.67 -32.9 2.01 2.50 -81.7 48.8
EGY\07\F\01\29 0.58 0.99 -30.7 1.32 1.62 -81.4 50.6
EGY\15\A\04\21 1.10 1.97 -27.8 0.00 0.50 -77.7 49.9
EGY\16\E\01\143 0.76 1.56 -31.7 0.02 0.59 -77.8 46.1
EGY\20\H\02\216 0.56 1.50 -17.9 1.77 2.44 -67.7 49.8
EGY\14\E\01\204 0.76 1.36 -32.6 0.00 0.48 -80.3 47.7
Average — — -30.66 — — -75.04 44.74
Times are given in seconds (s); levels in rms dB.

9. Conclusions

The result of this paper is a mobile multi-dialect Arabic speech corpus. The new corpus will be freely available to researchers. The Arabic multi-dialect speech corpus [4] now has three different versions using different sources: a microphone, a VOIP source [3] and this mobile version, as described in the experiment conducted for this research.
The methodology employed four different environments for recording: inside the home, in a moving car, in a public place and in a quiet place, with the speakers equally divided between the four environments. There was also diversity in the noise level, and the noise could be divided into non-human noise and human noise.

The comparison between the best results for the microphone source corpus and the mobile call corpus shows that the recognition accuracy for the microphone source is higher than for the mobile source.
The average difference for all files tested was 44.74 rms dB, which satisfied the WCAG 2.0 suggested conditions. This result indicates that the background noise level in the new resultant corpus is acceptable.
Various speech databases have been produced from the TIMIT speech database, e.g. FFMTIMIT (Free-Field Microphone TIMIT), NTIMIT (Network TIMIT), CTIMIT (Cellular TIMIT), HTIMIT (Handset TIMIT) and STC-TIMIT (Single-Channel Telephone Corpus TIMIT), so an interesting direction for future work is to build new speech corpora from further sources for parallel Arabic dialects.

References

[1] Alghamdi, M., Alhargan, F., Alkanhal, M., Alkhairy, A., Eldesouki, M., Alenazi, A., 2008. Saudi Accented Arabic Voice Bank. Journal of King Saud University - Computer and Information Sciences 20, 43–58.
[2] Almeman, K., 2015. Reducing Out-of-Vocabulary in Morphology to Improve the Accuracy in Arabic Dialects Speech Recognition. PhD thesis.
[3] Almeman, K., 2017. Automatically Building VoIP Speech Parallel Corpora for Arabic Dialects 17, 4:1–4:12. URL: http://doi.acm.org/10.1145/3132708, doi:10.1145/3132708.
[4] Almeman, K., Lee, M., Almiman, A.A., 2013. Multi Dialect Arabic Speech Parallel Corpora, in: Proceedings of the First International Conference on Communications, Signal Processing, and their Applications (ICCSPA13), Sharjah, UAE. pp. 1–6.
[5] Anumanchipalli, G.K., Oliveira, L.C., Black, A.W., 2012. Intent Transfer in Speech-to-Speech Machine Translation, in: Proceedings of the Spoken Language Technology Workshop (SLT), IEEE. pp. 153–158.
[6] Bokova, I., 2012. World Arabic Language Day. http://www.unesco.org/new/en/unesco/events/prizes-and-celebrations/celebrations/international-days/world-arabic-language-day/. [accessed 23 October 2017].
[7] Brown, K.L., George, E.B., 1995. CTIMIT: A Speech Corpus for the Cellular Environment with Applications to Automatic Speech Recognition, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1995), IEEE. pp. 105–108.
[8] Canavan, A., Zipperlen, G., Graff, D., 1997. CALLHOME Egyptian Arabic Speech. Technical Report. Linguistic Data Consortium (LDC), University of Pennsylvania, Philadelphia, USA. LDC Catalog No: LDC97S45, http://catalog.ldc.upenn.edu/LDC97S45 [accessed 23 October 2017].
[9] CIA, 2013. The World Factbook. https://www.cia.gov/library/publications/the-world-factbook/. [accessed 23 October 2017].
[10] Clive, H., 2004. Modern Arabic: Structures, Functions and Varieties. Georgetown Classics in Arabic Languages and Linguistics series, revised ed., Georgetown University Press, Washington, DC, USA.
[11] Elmahdy, M., Gruhn, R., Minker, W., 2012. Novel Techniques for Dialectal Arabic Speech Recognition. Springer, Berlin, Germany.
[12] Erjavec, T., 2004. MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora, in: Proceedings of LREC, pp. 2544–2547.
[13] Ethnologue, 17th ed., 2013. Arabic, Standard. http://www.ethnologue.com/language/arb. [accessed 23 October 2017].
[14] Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L., Zue, V., 1993. TIMIT Acoustic-Phonetic Continuous Speech Corpus. Technical Report 5. Linguistic Data Consortium (LDC), University of Pennsylvania, Philadelphia, PA, USA. LDC Catalog No: LDC93S1, http://catalog.ldc.upenn.edu/LDC93S1 [accessed 23 October 2017].
[15] Habash, N., 2010. Introduction to Arabic Natural Language Processing. Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, Williston, VT, USA. doi:10.2200/S00277ED1V01Y201008HLT010.
[16] Habash, N., Soudi, A., Buckwalter, T., 2007. On Arabic Transliteration. Springer, Berlin, Germany. pp. 15–22.
[17] Jankowski, C., Kalyanswamy, A., Basson, S., Spitz, J., 1990. NTIMIT: A Phonetically Balanced, Continuous Speech, Telephone Bandwidth Speech Database, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1990), IEEE. pp. 109–112.
[18] LaRocca, C.S.A., Chouairi, R., 2002. West Point Arabic Speech Corpus. Technical Report. Linguistic Data Consortium (LDC), University of Pennsylvania, Philadelphia, USA. LDC Catalog No: LDC2002S02, http://catalog.ldc.upenn.edu/LDC2002S02 [accessed 23 October 2017].
[19] Lee, K.F., Hon, H.W., Reddy, R., 1990. An Overview of the SPHINX Speech Recognition System. IEEE Transactions on Acoustics, Speech and Signal Processing 38, 35–45.
[20] Maamouri, M., Graff, D., Cieri, C., 2006. Arabic Broadcast News Speech. Technical Report. Linguistic Data Consortium (LDC), University of Pennsylvania, Philadelphia, USA. LDC Catalog No: LDC2006S46, http://catalog.ldc.upenn.edu/LDC2006S46 [accessed 23 October 2017].
[21] Pérez, A., Alcaide, J.M., Torres, M.I., 2012. EuskoParl: A Speech and Text Spanish-Basque Parallel Corpus, in: Proceedings of INTERSPEECH 2012, Portland, Oregon, USA. pp. 2362–2365.
[22] Sphinx, 2009. Sphinx 3.0.8 [software]. http://sourceforge.net/projects/cmusphinx/files/sphinx3/0.8/. [accessed 23 October 2017].
[23] SphinxTrain, 2011. SphinxTrain 1.0.7 [software]. http://sourceforge.net/projects/cmusphinx/files/sphinxtrain/1.0.7/. [accessed 23 October 2017].
[24] WCAG 2.0, 2008. Web Content Accessibility Guidelines (WCAG) 2.0. https://www.w3.org/TR/WCAG20/. [accessed 23 October 2017].
