Computing and the Qurʾān
Some caveats
Thomas Milo
1. Introduction - Arabic Input and Output Assessed

In an essay entitled "The Study of Tafsīr in the 21st Century: E-Texts and Their Scholarly Use", Andrew Rippin discusses the value of electronic editions of Qurʾān and Tafsīr texts as published on the internet.[i] It contains strong praise for computer technology:
I suspect that some of these texts have been transformed into their
electronic versions through Optical Character Recognition processes
(rather than being inputted through simple keying). This, of course,
speaks highly of the abilities of the technology and how much it has
progressed over the last 10 years - the very fact that this can be done
with Arabic strikes me as astounding.
Perhaps this view is a bit too optimistic: in reality, Arabic OCR programs often have difficulty reading even perfectly clear, simple, unvowelled, horizontal computer script without kerning and without traditional ligatures.[ii] As a tool for digitizing more fluid Arabic script such OCR is unreliable, and it is even less suited to recognizing sophisticated, kerned typesetting or any kind of handwritten Arabic text, including key historic source material.
Similar uncritical faith in Digital Omnipotence is encountered in the article "Computers and the Qurʾān" by Herbert Berg in the Encyclopaedia of the Qurʾān:[iii]
Producing electronic versions of the Qurʾān presents no more of a
technological difficulty than any other text, though the Arabic alphabet has several major encoding standards: ASMO449, ISO8859-6 and
Unicode.
This statement does not take into consideration the fact that not one of the mentioned encoding standards handles the grapheme inventory of the contemporary Qurʾān.[iv] This inventory is larger and typographically far more complicated than the basic grapheme inventory of newspaper Arabic, the only variety covered by the first two, now technologically obsolete, code tables. Their successor, the Unicode standard, still misses some qurʾānic graphemes, while existing ones lack coherence and guidelines for use. But while Unicode has at least the potential to be developed into a scholarly reliable encoding standard, that is only true as far as contemporary Qurʾān orthography is concerned. Early Arabic orthography is not covered at all, nor will it ever be unless concerned scholars take the initiative. It is important to realize that industry standards emerge from commercial and political concerns, not out of an awareness of the needs of the Qurʾān or, for that matter, Classical Literature.
It is an enormous challenge to cover Qurʾānic Arabic graphemes unambiguously and exhaustively with Unicode, a non-linguistic compromise that grew out of a plethora of legacy encodings, on the one hand, and with the anarchy of substandard fonts to render it, on the other. After all, the Unicode standard deals with scripts and not with typography. It is thus essentially a business initiative, and only active and concerted intervention on the part of scholars can bring it up to academic standards. For instance, all yāʾ variants should be covered by just yāʾ without dots (U+0649) and separate diacritical graphemes such as two Arabic dots below (along with all other dots still missing from Unicode and fiercely opposed by a few technocrats), combining hamza above (U+0654) and combining hamza below (U+0655). This last character, along with both variants of combining hamza, superscript alif and many others, is absent from the other two encoding agreements (or rather disagreements, since their coexistence caused much confusion).
Arabic font behaviour is still inconsistent and unreliable as regards Unicode characters. Bad font compatibility disorients researchers when mapping characters: what looks correct may actually be mis-encoded and vice versa. For the user this means that what appear on screen as identical words may in fact be digitally different words, with all that that entails for text analysis, sorting, indexing, etc.[v]
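The point about words that look identical but are digitally different can be checked with a few lines of code. The following is a minimal sketch, not part of the original essay, that assumes Python and its standard unicodedata module. It compares three encodings that a font may well draw as the same yāʾ with hamza; the variable and function names are illustrative only.

```python
import unicodedata

# Three encodings that a font may draw as the same "ya with hamza above".
precomposed = "\u0626"                # ARABIC LETTER YEH WITH HAMZA ABOVE
yeh_plus_hamza = "\u064A\u0654"       # dotted YEH + combining HAMZA ABOVE
dotless_plus_hamza = "\u0649\u0654"   # ALEF MAKSURA (dotless ya) + HAMZA ABOVE


def same_after_nfc(a, b):
    """Return True when two strings are canonically equivalent under NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

A plain string comparison treats all three as different words. Normalization reconciles only the first two, so a text that uses the dotless variant, the one recommended above, stays distinct from the others in searching, sorting and indexing unless the software makes special provision for it.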
Unicode uses a model resulting from earlier conferences about Arabic computing. There was much in-fighting about the status of lām-alif: whether to encode it as a single glyph (graphic encoding) or as two separate letters (graphemic encoding). Eventually it was agreed that all contextual shapes of one and the same letter should be covered by a single text code: this is the graphemic model. Therefore, in the encoded representation of Arabic script there would be no ligature lām-alif, but separate codes lām and alif for each constituting grapheme. The importance of this decision cannot be overestimated: it was a choice for encoding content of script, not form.
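The difference between graphic and graphemic encoding can be illustrated with the lām-alif itself. Unicode still carries a ligature code point in its compatibility block of Arabic presentation forms, next to the two plain letter codes. The sketch below is an illustration under the assumption that Python's unicodedata module is available; the helper name to_graphemic is hypothetical. It folds the glyph code back into the two letter codes.

```python
import unicodedata

LAM = "\u0644"                 # ARABIC LETTER LAM
ALEF = "\u0627"                # ARABIC LETTER ALEF
LAM_ALEF_LIGATURE = "\uFEFB"   # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM


def to_graphemic(text):
    """Fold presentation-form (glyph) codes into plain letter codes via NFKC."""
    return unicodedata.normalize("NFKC", text)
```

In the graphemic model the word is stored as lām followed by alif, and the ligature is left to the rendering layer. Text that stores the ligature code point encodes form rather than content and has to be folded before it can be compared with graphemically encoded text.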
Early qurʾānic orthography, with its unmarked letters and scarce, separately added disambiguation markers (little stripes for consonants and dots for vowels), is fully archigraphemic and not structurally supported by Unicode's graphemic model.[vi] By analogy with the archiphoneme in phonology, an archigrapheme in the Arabic writing system is the common element between two or more graphemes minus their distinctive features. Archigraphemes therefore represent another type of script content than graphemes, though they relate to graphemes in a systematic way. Encoding Arabic exclusively as graphemes, without identifying the real basic elements, is analogous to encoding lām-alif as a single code. This graphemic method places Arabic in a synchronic world without a history or a sense of continuity. It disconnects computerized Arabic from the diachronic aspects of writing, but also from less-than-bureaucratic orthography and, last but not least, from the culturally closely related non-Arabic uses of the script (Persian, Urdu, Ottoman, etc.). This conceptual flaw is curiously matched by the absence of a scholarly critical text edition of the Qurʾān documenting the transmission through the ages of this key historic text.
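The systematic relation between graphemes and archigraphemes lends itself to a small sketch. The Python code below is an illustration only: the reduction table is an assumption, reconstructed from the samples that occur in this essay (for instance LBBBBBH for la-nubayyitannahu and the transliterated lines in Annex II), and it is not the Annex I scheme itself. The names FIXED, POSITIONAL and to_archigraphemes are hypothetical.

```python
import unicodedata

# Hypothetical reduction table, reconstructed from the essay's own samples;
# it is not the author's Annex I scheme verbatim.
FIXED = {
    "\u0627": "A",                                  # alif
    "\u0628": "B", "\u062A": "B", "\u062B": "B",    # ba, ta, tha
    "\u062C": "G", "\u062D": "G", "\u062E": "G",    # jim, ha, kha
    "\u062F": "D", "\u0630": "D",                   # dal, dhal
    "\u0631": "R", "\u0632": "R",                   # ra, zay
    "\u0633": "S", "\u0634": "S",                   # sin, shin
    "\u0635": "C", "\u0636": "C",                   # sad, dad
    "\u0637": "T", "\u0638": "T",                   # emphatic ta, za
    "\u0639": "E", "\u063A": "E",                   # ayn, ghayn
    "\u0641": "F", "\u0643": "K", "\u0644": "L",    # fa, kaf, lam
    "\u0645": "M", "\u0648": "W",                   # mim, waw
    "\u0647": "H", "\u0629": "H",                   # ha, ta marbuta
}
# Letters whose skeleton depends on position: (non-final, word-final).
POSITIONAL = {
    "\u0646": ("B", "N"),   # nun
    "\u064A": ("B", "Y"),   # ya
    "\u0649": ("B", "Y"),   # dotless ya / alif maqsura
    "\u0642": ("F", "Q"),   # qaf
}


def to_archigraphemes(word):
    """Reduce one Arabic word to a string of archigrapheme labels."""
    # NFD splits precomposed hamza forms, so the hamza drops out with the marks.
    decomposed = unicodedata.normalize("NFD", word)
    letters = [ch for ch in decomposed if ch in FIXED or ch in POSITIONAL]
    labels = []
    for index, ch in enumerate(letters):
        if ch in POSITIONAL:
            non_final, final = POSITIONAL[ch]
            labels.append(final if index == len(letters) - 1 else non_final)
        else:
            labels.append(FIXED[ch])
    return "".join(labels)
```

Applied to the seven letters of la-nubayyitannahu (lām, nūn, bāʾ, yāʾ, tāʾ, nūn, hāʾ) the function returns LBBBBBH: five distinct graphemes collapse into one archigrapheme. The reduction is lossy by design, which is why an encoding that knows only graphemes cannot represent an unpointed text without silently adding an interpretation.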
2. Type Design versus Script Analysis

In the preface to the 2007 re-edition of Arthur Jeffery's The Foreign Vocabulary of the Qurʾān, Gerhard Böwering and Jane Dammen McAuliffe quote Jeffery as saying:
The ideal would be to print on one page a bare consonantal text in
the Kufic script, based on the oldest MSS available to us, with a critically edited Ḥafṣ text facing it on the opposite page, and with a complete collection of all known variant readings given at the foot of the
page.
Obviously a historical text can only be studied in the context of the writing system in which it is attested. However, in the case of the early Arabic script in which the oldest and most relevant Qurʾān MSS are written, this is not standard practice. And to introduce it, there are some real hurdles to negotiate. For starters, the art of writing and indeed the knowledge of authentic kūfi are extinct.[vii] Contemporary calligraphers can only guess how the ancient calligraphers constructed the letters.[viii] Moreover, on the scholarly side, no complete description of the early Arabic writing system or systems exists.[ix] Not one publication mentions, let alone describes, the dissimilation rules that are typical of all Arabic manuscript styles, including kūfi.
Dissimilation is a graphic technique that appears to have been designed
to improve the legibility of Arabic letter fusions. A fusion is the linking and
assimilating of a letter block into a single script unit. In a fusion, the abstract
graphemes that make up a letter block are visualised by allographs. In those
cases where assimilation leads to ambiguities, dissimilation is applied, which
implies the use of context-determined allographs. This pattern is just as
regular as the basic break-down into initial, middle, final and unconnected
forms, and apparently without exception. Consequently it can be called the
dissimilation rule.[x]
Dissimilation as a distinctive feature is critical for disambiguating groups of 4 or more B stubs and for identifying S archigraphemes.
The archigrapheme S is characterized by stub triplets, over whose tops a virtual straight line can be drawn (added explicitly by the author, here and in the first example BSA below).
[Figure: letter block GSBH, detail -SB-. Distinctive feature: raised middle form of B.]
In certain styles S takes the form of descending triplets. The lowest stub in this example cannot be mistaken for B, since, as in the example GSBH above, a B in that position would be raised.
[Figure: letter block BSA, detail BS-. Distinctive feature: raised initial form of B.]
A comparison with the flat style shows that, regardless of the angle of the S triplets, the same rules apply.
[Figure: letter block BSA, detail BS-. Distinctive feature: raised initial form of B.]
Unlike in modern Arabic calligraphy and typography, in the pre-mansūb scripts letters can be classified in vertical, horizontal, round and cascading categories. Each of these categories has a distinctly different fusion behaviour. For instance, a repetition of two identical vertical letters (archigraphemes) leads to a descending dissimilation, the sloping twins. B is a vertical archigrapheme; two consecutive identical vertical letters, like B in this example, dissimilate by applying descending height.
[Figure: letter block BB. The first element of twin vertical letters is raised.]
Here is an example of the implications of the sloping twin pattern. A single B connects to Y, a horizontal letter that only occurs in final position. By default, it connects with a curve.
[Figure: letter block BY, detail B-.]
The B sloping twins overrule final Y, causing it to lose the curve. This is the regular assimilation of Y to a vertical, non-initial letter. See also the letter block SBBLY below.
[Figure: letter block BBY, detail BB-. Distinctive feature: sloping twins.]
Strings of uneven numbers of B archigraphemes are broken up into sloping twins, starting with a single, lower B allograph, as can be seen in this completely regular, yet revealing letter block:
[Figure: la-nubayyitannahu, letter block LBBBBBH, detail -BBBBB-: normal initial form, followed by repeated sloping twins.]
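The grouping of a run of B stubs into sloping twins can be written down as a very small procedure. The sketch below is an assumption-laden illustration, not the author's computer model: the labels "raised" and "low" are hypothetical, the treatment of even-length runs is inferred from the BB example, and interaction with neighbouring letters (such as the raised B before S) is deliberately left out.

```python
def sloping_twin_pattern(run_length):
    """Return the relative heights for a run of consecutive B stubs.

    An uneven run starts with a single lower stub; the remainder is
    grouped into twins whose first element is raised.
    """
    if run_length < 1:
        raise ValueError("a run contains at least one stub")
    heights = ["low"] if run_length % 2 == 1 else []
    heights += ["raised", "low"] * (run_length // 2)
    return heights
```

For the five B stubs in LBBBBBH the function yields one lower stub followed by two raised-low twins, which is the pattern described for la-nubayyitannahu. A run of three yields a single lower stub plus one twin, which is what sets BBB apart from the three level teeth of S.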
It is the dissimilation rule that disambiguates S and BBB:
[Figure: letter block SA, detail S-, with three identical teeth, against letter block BBBA, detail BBB-, with initial form followed by sloping twins. Opposition BBB:S by sloping twin.]
It is the sloping twins that disambiguate BBS and SBB:
[Figure: letter block BBS: dissimilation from S by sloping twin in initial position.]
[Figure: letter block LBBS, detail -BBS: dissimilation from L and S by sloping twin in middle position. Distinctive feature: sloping twins.]
[Figure: letter block SBBLY: sloping twin in middle position. Note that the Y has no bridge; instead it assimilates vertically to vertical letters like L.]
[Figure: letter block GSBBM, details -SBB- and SBB-: another example of dissimilation from S by sloping twin in middle position.]
In this sloping twin dissimilation system, the letter block LLH receives completely regular treatment. The rules apply independently of the word that is written with it:
[Figure: letter blocks BBH, LLH, LLD and LLH in early script.]
The second-generation mansūb or proportionate scripts inherit the dissimilation system, but the rules are different. The method of sloping twin dissimilation is discontinued:
[Figure: letter blocks BBH, LLH, LLD and LLH in second-generation script.]
Only when the letter blocks LLH and FLLH are used to denote Allah do they retain the first-generation feature of sloping twin dissimilation, along with the more compact shape of lām typical of the early scripts. The resulting specialized theograph adds to the second generation of scripts a novel functional contrast that can be classified as a kūfism:
[Figure: theograph LLH; F + theograph in FLLH fa li l-lāhi, compared with FLLH qallalahu.]
In early script grammar, of which this dissimilation system is an aspect, a variation factor can also be isolated: mašq or elongation. A variation factor is a rule subsystem that governs additional shaping that is not obligatory and that does not influence the semantics of the (archi-)graphemes. Vertical letters (A, B, S, L, E, F, Q, N) and cascading letters (G) can stretch their base line connectors; horizontal letters (D, C, K, T, Y) stretch their body. Round letters or bumpers (R, W, M, H) remain passive: they accept connections, but do not participate in any stretching themselves. For example, the archigrapheme D is a horizontal letter because the letter is horizontally elastic:
[Figure: letter blocks LLD and BBD.]
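The four letter categories and their elongation behaviour amount to a lookup table. The following Python sketch simply restates the classification given above; the dictionary layout and the names CATEGORIES, STRETCHES, category and stretch_target are assumptions made for the sake of illustration.

```python
# Archigrapheme categories as listed in the text.
CATEGORIES = {
    "vertical": "ABSLEFQN",
    "cascading": "G",
    "horizontal": "DCKTY",
    "round": "RWMH",
}

# What each category stretches under elongation; None means it stays passive.
STRETCHES = {
    "vertical": "connector",
    "cascading": "connector",
    "horizontal": "body",
    "round": None,
}


def category(letter):
    """Return the category name of an archigrapheme label."""
    for name, members in CATEGORIES.items():
        if letter in members:
            return name
    raise KeyError(f"unknown archigrapheme: {letter!r}")


def stretch_target(letter):
    """Return which part of the letter elongates, or None for passive letters."""
    return STRETCHES[category(letter)]
```

With this table, D reports "body", L reports "connector" and M reports None, matching the description of horizontal, vertical and round letters. A full script model would add the conditions under which elongation is actually permitted, which the essay describes as not yet exhaustively documented.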
Therefore, in early script, the letter block LLH receives completely regular treatment independent of the word which is written with it:[xi]
[Figure: letter block LLH in three elongation variants: L_LH, LL_H and LLH.]
However, in the second-generation scripts, L, including the lām of God, cannot generate an elongated connection; shape variation by means of elongation is therefore ruled out:
[Figure: theograph.]
3. Approaching Arabic Script with Linguistic Concepts

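In his seminal Cours de Linguistique Générale, Ferdinand de Saussure writes:
Si nous pouvions embrasser la somme des images verbales emmagasinées chez tous les individus, nous toucherions le lien social qui constitue la langue. C'est un trésor déposé par la pratique de la parole dans les sujets appartenant à une même communauté, un système grammatical existant virtuellement dans chaque cerveau, ou plus exactement dans les cerveaux d'un ensemble d'individus; car la langue n'est complète dans aucun, elle n'existe parfaitement que dans la masse. En séparant la langue de la parole, on sépare du même coup: 1° ce qui est social de ce qui est individuel; 2° ce qui est essentiel de ce qui est accessoire et plus ou moins accidentel.[xii]
[If we could embrace the sum of the verbal images stored in all individuals, we would touch the social bond that constitutes language. It is a treasure deposited by the practice of speech in the subjects belonging to one and the same community, a grammatical system existing virtually in each brain, or more exactly in the brains of a group of individuals; for language is complete in none of them, it exists perfectly only in the mass. By separating language from speech, one separates at the same stroke: 1. what is social from what is individual; 2. what is essential from what is accessory and more or less accidental.]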
Scripts or writing systems are generally not perceived as part of a language, let alone of grammar.[xiii] The grammar of Arabic, however, is incomplete without covering its sophisticated writing system.[xiv] By substituting script images for images verbales, the distinction between competence and performance becomes applicable to the Arabic script systems as well: script and writing. From manuscript evidence of writing in a perceived style a script grammar (un système grammatical) can be inferred. By analogy with linguistics, such a grammar can serve both as a theoretical model and as a yardstick for understanding and systematically describing variations in performance. Seen in this light, François Déroche's classification using the metaphor of circles is in fact a case of mapping variations in performance without the explicit concept of shared competence underlying all instances of performance:
As a working hypothesis we have decided to consider each cluster of
scripts as a circle whose centre is occupied by the manuscripts showing the greatest care, the greatest skill and the greatest regularity. The
further one goes from the centre, the more examples one finds in
which the scribe has only loosely reproduced the letter shapes that
distinguish the ideal form of the script.[xv]
With "the ideal form of the script" Déroche may imply a concept of grammar, with the circle mapping various degrees of sophistication in its performance. However, from a Saussurian point of view, the centre of such a circle has no more relevance than the other positions in the circle. To establish the competence behind a certain script, any performance can yield clues: car la langue [substitute: l'écriture] n'est complète dans aucun, elle n'existe parfaitement que dans la masse. For it is the method of fusing and dissimilating that evidences the grammar, whereas the scribe's perfect execution of letter block fusions or stylized swashing of final forms is more relevant from an art-historical perspective. Regularity, at least in terms of script grammar, remains problematic, as it can only be assessed after all the rules have been identified, which is presently not yet the case. In other words, the complete contents of the circle must be used to reconstruct the script grammar. Then, the ultimate test is to turn this grammar into a computer model. To get esthetically pleasing results, the material near the centre of the circle should best be used to model the computer glyphs.
The fact that it is not just theoretically but also practically possible to make accurate computer models of Arabic script systems creates a new scholarly obligation. Turning an analysis into a computer model is an unforgiving method for exposing inconsistencies and shortcomings.[xvi] In terms of structural linguistics, a computer model of a script must be based on competence rather than performance, and by nature requires nothing less than complete, exhaustive analysis. The resulting computer-synthesized script images enable visual verification of the model's accuracy: reversible analysis.
The present body of publications about early Arabic does not contain sufficient information to make a computer model of a script in this sense.[xvii] Apart from the issue of dissimilation mentioned above, the description of allographic behavior is incomplete. For instance, no publication explicitly specifies that, though dāl and non-final kāf appear to have the exact same shape, this only happens in complementary distribution.[xviii] This shared shape occurs as kāf only in non-final position and as dāl, of course, exclusively in final position:[xix]
[Figure: letter blocks BKSBW and BBD. Distinctive feature: position.]
Only before a space, final kāf (connected and unconnected) is disambiguated from dāl by a vertical bar:
[Figure: letter blocks BK and BBD. Distinctive feature: shape.]
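The complementary distribution of dāl and kāf can be stated as a rule that takes position into account. The sketch below is a hedged illustration of that rule only; the shape labels "shared" and "shared+bar" and the function name read_shape are hypothetical and do not come from any existing script model.

```python
def read_shape(shape, is_final):
    """Resolve the shape shared by dal and kaf into an archigrapheme.

    shape: "shared" for the common dal/kaf shape, "shared+bar" for the
    same shape with the vertical bar that marks final kaf.
    is_final: True when the letter stands at the end of its letter block.
    """
    if shape == "shared":
        # Position is the distinctive feature: final means D, otherwise K.
        return "D" if is_final else "K"
    if shape == "shared+bar":
        if not is_final:
            raise ValueError("the barred shape only occurs in final position")
        return "K"
    raise ValueError(f"unknown shape: {shape!r}")
```

Read this way, the same drawn shape yields K inside BKSBW and D at the end of BBD, while the barred form at the end of BK is again K. A font or OCR routine that identifies letters by shape alone, without position, cannot make this distinction.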

This is just one example where early Arabic script differs in structure from contemporary Arabic script. Computer fonts that are marketed with the name or trade mark Kufic are not based on the characteristically different script grammar of early Arabic script. Such fonts are artists' impressions rather than reliable computer models of the scripts that they are named after. Therefore, to print "a bare consonantal text in the Kufic script" as proposed by Jeffery is not as trivial as it seems, because:
1. the encoding standard that is now at the heart of all software lacks the concept of archigraphemes, the basic unit of early Arabic orthography;
2. for building a computer model of kūfi script that is scientifically correct, all knowledge of early Arabic has to be built from scratch.[xx] For authentic script synthesis nothing less than an exhaustive, reversible analysis will work.[xxi]
4. Final Remarks
The Encyclopaedia of the Qurʾān article "Computers and the Qurʾān" continues:
The pages of the Qurʾān need only be scanned and preserved as images or, alternatively, scanned and then encoded according to one of
these standards using Optical Character Reader (OCR) software.
Given the incomplete code set coverage of the Qurʾān, this paragraph unintentionally proposes what amounts to a breach of scholarly integrity, albeit an innocent one as long as there exists no OCR that can do the job anyway. While straightforward image digitization (scanning) is primitive, it is at least not corrupt. However, given the fact that ASMO449 and ISO8859 (and dozens of alternatives) are not designed to cover the Qurʾān and that its coverage by Unicode is still incomplete and ambivalent, it is impossible to encode the Qurʾān without tampering with the text. Doing so regardless invalidates research before it is even started, as it will be based on unreliable text.
Still quoting "Computers and the Qurʾān":
Many such electronic versions of the Qurʾān already exist ... Nor
does digitizing the Qurʾān present any significant theological difficulty.
Apart from the theological implications, the situation sketched above should alarm any academic researcher. And, in fact, it does. In the quoted text Andrew Rippin remarks about e-texts and their scholarly use:
The basic inaccuracy of the available texts is certainly problematic.
This manifests itself in a number of ways: simple textual errors, unexplained textual changes, and lack of clarification in text-comprehension matters and in text-critical matters.
But there is more: even if every single grapheme attested in a particular type of Qurʾān recension, or any early manuscript for that matter, were covered by the Unicode Arabic character set, serious problems remain. For instance, the latest version of the Unicode standard (5.1, 2008) defines yāʾ without dots (U+0649) as a continuous letter with four-fold assimilation. Yet some fonts still program the alif maqṣūra to disconnect in non-final position: their designers are unaware of the qurʾānic occurrence of yāʾ without dots in initial or middle position (OCR programs work on the same assumption and will therefore fail). Others provide four-fold assimilation, but with erroneous dots in the non-final position. And, of course, a few fonts actually comply with the standard.
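That U+0649 is meant to take four contextual shapes can be made visible from the Unicode character database itself. The sketch below assumes Python's standard unicodedata module, which does not expose the joining type directly; as a stand-in it collects the presentation-form code points that decompose to a given base letter. The function name presentation_forms and the scanned range are choices made for this illustration.

```python
import unicodedata


def presentation_forms(base, first=0xFB50, last=0xFEFF):
    """Map form tags (isolated, final, ...) to the presentation code points
    whose compatibility decomposition is exactly the given base letter."""
    forms = {}
    for cp in range(first, last + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # A single-letter form looks like ['<initial>', '0649'].
        if len(parts) == 2 and parts[0].startswith("<"):
            if int(parts[1], 16) == ord(base):
                forms[parts[0].strip("<>")] = f"U+{cp:04X}"
    return forms
```

Called with "\u0649", the function reports isolated, final, initial and medial forms, the same four that it reports for the dotted yāʾ U+064A. Whether a given font actually draws the initial and medial shapes without dots is a separate matter that this check cannot settle; it only shows that the data side of the standard provides for them.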
Overlooking the crucial role of typography in textual computing is another aspect of the article "Computers and the Qurʾān":
The importance of both Qurʾānic recitation and calligraphy demonstrates that Muslims accept the presentation of the Qurʾān in various
media and even recitational requirements such as the taʿawwuḏ can
be incorporated digitally
Clearly the author refers to recorded human recitation, i.e., sound digitization, not to computer-synthesized voices. However, mentioning calligraphy in the same sentence with recitation leaves the reader with the impression that computers actually render digital Arabic text with calligraphic quality. But this is absolutely untrue: computerized Arabic scripts, i.e., fonts, are to calligraphy what computer squeaks are to real recitation. When dealing with Arabic historical orthography and Islamic calligraphy and text manufacture, the present state-of-the-art font technology and Arabic computing in general is still more of an obstacle than a tool, thus leaving a vacuum that urgently needs to be filled.
To summarize, for reproducing any recension of the Qurʾān, tools can and must be made:
1. The grapheme inventory of early Arabic needs to be analyzed and added to the Unicode standard in the form of additional code points, and the protocol for using these characters must be defined more precisely;
2. The script grammar of early Arabic writing needs to be reconstructed meticulously in order to create the required script images.
Such a project creates not a font, but a computer model of an Islamic writing system.
SELECTED LITERATURE
Abbott, N., (1939), The Rise of the North Arabic Script and its Ḳurʾānic Development, Chicago
Blair, S., (2006), Islamic Calligraphy, Edinburgh
Dammen McAuliffe, J., General Editor, (2001), Encyclopaedia of the Qurʾān, Volume One A-D, Leiden
Déroche, F., (1992), The Abbasid Tradition: Qur'ans of the 8th to the 10th Centuries AD, Oxford
Déroche, F., (2005), Islamic Codicology, an Introduction to the Study of Manuscripts in Arabic Script, Oxford
Endress, G., (1982), Herkunft und Entwicklung der arabischen Schrift, in: Grundriss der arabischen Philologie, Band I Sprachwissenschaft
Fendall, R., (2003), Islamic Calligraphy, Sam Fogg Catalogue 27
Flury, S., (1920), Islamische Schriftbänder Amida-Diarbekr
Fraser, M. and Kwiatkowsky, W., (2006), Ink and Gold, Islamic Calligraphy, Sam Fogg Catalogue, London
Fuʾād Sayyid, A., (1997), al-Kitāb al-ʿArabī l-Maḫṭūṭ wa ʿIlm al-Maḫṭūṭāt
Grohmann, A., (1967-1971), Arabische Paläographie, Band I/II
Gruendler, B., (1993), The Development of the Arabic Scripts
Ǧumʿa, I., (1969), Dirāsa fī Taṭawwur al-Kitābāt al-Kūfiyya, ʿalā l-Aḥǧār fī Miṣr fī l-Qurūn al-Ḫamsa l-Ūlā li l-Hiǧra, Cairo
Jeffery, A., (1938), The Foreign Vocabulary of the Qurʾān, republished Leiden, 2007
Lyons, J., (1968), Introduction to Theoretical Linguistics, Cambridge
Lüling, G., (1974), Über den Ur-Qurʾān, Erlangen
Milo, T., (1989), Fragments from the Koran, in: Design into Art: Drawings for Architecture and Ornament, The Lodewijk Houthakker Collection, Volume II, London: Philip Wilson Publishers, republished in: Mela Notes, No 62, 15-34, as The Koran Fragments of the Lodewijk Houthakker Collection (1995)
Milo, T., (2002), Authentic Arabic: a Case Study. Right-to-Left Font Structure, Font Design, and Typography, in: Manuscripta Orientalia, 8, No. 1, 49-61
Mitchell, T.F., (1951), Writing Arabic, a Practical Introduction to Ruqʿah Script, Oxford
Rezvan, E.A., (2004), The Qurʾān of ʿUthmān, St. Petersburg
Safwat, N., (1977), The Harmony of Letters, Islamic Calligraphy from the Tareq Rajab Museum, Kuwait
Saussure, F. de, (1916), Cours de Linguistique Générale. Eds. Charles Bally and Albert Sechehaye, édition critique préparée par Tullio de Mauro, Paris
Schimmel, A., (1990), Calligraphy and Islamic Culture, London
Stanley, T., (1996), Introductory Studies to: The Qurʾān and Calligraphy, a Selection of Fine Manuscript Material, Bernard Quaritch Catalogue 1213
Thanoun, Y., (1986), Old and New in the Origin of Arabic Script and its Development in Various Ages, in: Al-Mawrid, a Quarterly Journal of Culture And Heritage, Vol 15, nr 4, Ministry of Culture and Information, Baghdad
The Unicode Consortium, (2007), The Unicode Standard, Boston
ANNEX I
Archigraphemic transliteration scheme for Arabic
[Table: Arabic letters (column "Arabic") and their archigraphemic transliteration (column "archigraphemic").]
ANNEX II
The archigraphemic transliteration scheme in practice
Above left:
Close-up of a fragment from manuscript A2-15-15. Most of the script-grammatical examples are taken from this manuscript.
Below:
Text with full paedagogical taǧwīd markings, in the recension of Ḥafṣ ʿan ʿĀṣim as published in the 1924 Cairo Qurʾān (page 366, lines 1-5, Q17:12-14). The text appearing on the fragment above is marked in black.
This version is typeset with the computer model of nasḫ competence in classic Ottoman performance made by the author as member of the DecoType team.
[Figure: manuscript fragment A2-15-15 (above) and the typeset text of Q17:12-14 from the 1924 Cairo Qurʾān (below).]
Above right:
Archigraphemic transliteration with second-generation plene spelling, with alif marked as a dimmed letter a.
Below:
Fragment of the author's archigraphemic reduction of the complete Cairo 1924 Qurʾān, in the Ḥafṣ ʿan ʿĀṣim recension. It results in pre-hamza spelling throughout, i.e., with alif still in its original role of representing glottal stop, in addition to the function of marking tanwīn and plural forms (otiose alef).
a lbl w gelba a bh a lbhr mbcr h lbbbew a fcla mn r bkm

w lbelmw a ed d a lsbbn w a lgsb w kl sy fclbh
bfcbla w kl a bsn a lr mbh tbr h fy ebfh
w bgr g lh bw m a lfbmh kbba blfbh mbsw r a a fr a
kbbk kfy bbfsk a lbw m elbk gsbba mn a hbd y
Note in the 1924 Cairo edition the seemingly random return to Urtext by replacing alif ṭawīla with superscript alif. Ottoman Qurʾāns have alif ṭawīla in most cases, whereas the inferred Urtext has none. Comparison with the Urtext reference model proves that this manuscript fragment is younger than the austere, archaic script suggests: its spelling uses alif ṭawīla even in places where the editors of the 1924 Cairo Qurʾān removed it. In fact, in this manuscript fragment only the letter block blfbh is spelled in the archaic, oldest attested orthography. This text skeleton appears to be identical to that of Ottoman Qurʾāns.
NOTES
i
Rippin 2000, internet search argument: rippin +e-texts
ii
Kerning is a technique to allow a letter block to extend into the white

areas above and below an adjacent letter block.
iii
Encyclopaedia of the Qurʾān, Brill, Leiden 2001
iv
A grapheme is the smallest unambiguous unit in a writing system.

Ideally graphemes correspond to the plain text units of Unicode. In
Arabic most of the graphemes correspond with a phoneme.
v
With typography, and certainly computer typography, font defects have influenced Arabic orthography. A case in point is the now widely seen practice of writing fatḥatān over alef instead of over the preceding letter that governs the fatḥatān. Another example is the disappearance of hamza without chair, a frequently used letter in the Cairo and Medina Qurʾān editions that is not available in computer typography. The quick succession of different font technologies and changing encoding concepts has the unintentional result that different fonts may require different spellings for obtaining the same printed image. Notably, most fonts have problems with al-lāhu, God. While all contemporary Arab Qurʾān editions spell this word with a superscript fatḥa over šadda, almost all fonts assume for the theograph a superscript alef over šadda:
ALEF-FATHA-LAM-LAM-SHADDA-FATHA-HEH-DAMMA: correct data structure, wrong image
ALEF-FATHA-LAM-LAM-HEH-DAMMA: wrong data structure, wrong vowel image
[Figure: for comparison, the correct image representing the above data structures, with complete vowels and with incomplete vowels.]
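The two data structures for al-lāhu can be spelled out as code point sequences. The sketch below is an illustration that assumes Python's unicodedata module; the helper names from_names and skeleton and the constants FULL_SPELLING and REDUCED_SPELLING are hypothetical.

```python
import unicodedata


def from_names(*names):
    """Build a string from official Unicode character names."""
    return "".join(unicodedata.lookup(name) for name in names)


# Spelling with shadda and fatha on the second lam (complete vowels).
FULL_SPELLING = from_names(
    "ARABIC LETTER ALEF", "ARABIC FATHA", "ARABIC LETTER LAM",
    "ARABIC LETTER LAM", "ARABIC SHADDA", "ARABIC FATHA",
    "ARABIC LETTER HEH", "ARABIC DAMMA",
)
# Spelling that omits shadda and fatha so that a font supplies its own marks.
REDUCED_SPELLING = from_names(
    "ARABIC LETTER ALEF", "ARABIC FATHA", "ARABIC LETTER LAM",
    "ARABIC LETTER LAM", "ARABIC LETTER HEH", "ARABIC DAMMA",
)


def skeleton(text):
    """Drop all combining marks, keeping only the base letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))
```

The two spellings share the same four-letter skeleton but differ as strings, so a search for one will not find the other. A corpus in which the spelling was adapted to whatever font was at hand therefore contains the same word under more than one encoding.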
A related phenomenon occurs where font technology does not handle the combination of ligatures and vowels, forcing the users into systematically misspelling even key words like al-ʾislāmu, Islam, and lā, no (which is also part of the word Islam):
[Figure: correct data structure, wrong image; wrong data structure, approximate image.]
[Figure: for comparison, the correct image representing the above data structures, with complete vowels and with incomplete and misplaced vowels.]
vi
Without diacritic markers, early Arabic orthography becomes multi-interpretable. In this kind of spelling the skeletons are not defective
graphemes, but valid archigraphemes. The majority of historic texts are
written with archigraphemes. Unicode does not yet have the data
structure to deal with archigraphemes and discrete markers as meaningful text elements.
vii
Even the name is problematic. See Nabil Safwat: these early scripts
were not known as Kufic, and indeed were not called Kufic. The city of
Kufa had almost nothing to do with the formation of these scripts and
Thanoon (quoting Yousuf Thanoun, Old and New in the Origin of Arabic Script and its Development in Various Ages in Al-Mawrid, a Quarterly Journal of Culture And Heritage, Vol 15, nr 4, Ministry of Culture
and Information, Baghdad 1986) argued that the term Kufic betrayed
dated knowledge (italics by TM) of Islamic calligraphy. (Nabil Safwat, The Harmony of Letters, Islamic Calligraphy from the Tareq Rajab Museum, Kuwait 1977).
viii
Private communication of Gerd-Rüdiger Puin.
ix
As for Arab sources, Schimmel 1990, on page 3, writes: The incoherent statements found in Arabic and Persian sources are difficult to entangle.
x
Günter Lüling 1974, page 381, bases a key argument on the unfounded claim that the unpointed and therefore archigraphemic letter blocks (rasm) underlying the Arabic words tisʿ, nine, and sabʿ, seven, are exactly identical. This routine assumption has never been proven and contradicts the findings of this essay. It is therefore interesting that the opponents of the resulting radically different reading of Q74:30 (ʿalayhā sabʿata auri-n, on it seven gates [of hell], instead of ʿalayhā tisʿata ʿašara, over it [are] nineteen [guardian angels]) never pointed out that no manuscript evidence exists in support of this theory. All manuscripts meticulously execute Arabic script grammar to disambiguate such text skeletons. On the other hand, proponents of this approach overlook the implications of Lüling's argument: that the ambiguity apparently must have existed in a much earlier, as yet unattested phase of the emerging text, when this aspect of Arabic script grammar did not yet exist.
xi
Page 26 of Abbott 1939 discusses the letter block LLH in general terms of the underlying script grammar, but she calls it the treatment of the word Allāh. On the other hand, in the Qurʾān the unconnected letter block LLH occurs exclusively as part of the spelling of al-lāhu. However, the letter block LLH does not behave differently than other, enclosed groups of LLH occurring in this text such as, e.g., LCLLH /-allta/, BCLLH /yuḍlilhu/.
xii
De Saussure 1916, page 30.
xiii
See for instance John Lyons 1968: "Although a particular alphabet or a particular syllabary may be more suitable for certain languages than for others, there is no correlation between the general structure of different spoken languages and the type of writing-system used to represent them."
xiv
Mitchell 1951, page 2: "It is a curious fact that students of Arabic have in the past strangely neglected those elements of grammar without which there would be no grammar, viz. the letters. The infrequency with which one encounters European scholars having a knowledge of the Arabic script has often been observed, but we may go further and say that the number of those who write Arabic in an acceptable manner is remarkably small."
xv
Déroche 1992, page 16.
xvi
Page 26 of Nabia Abbott 1939 mentions multitudinous and complex
rules regarding mašq, the stretching of the connecting stroke. This
observation is underscored with sample rules that are a valuable
contribution to our knowledge. But to get a complete picture, the rules
provided by this publication need to be supplemented by all the other
rules that make up the script grammar in question.
xvii
Déroche 1992, page 16 writes: "the descriptions should merely draw the
reader's attention to the salient features of the script."
xviii
The script tables in Déroche 1992, e.g., on page 38, systematically omit kāf
and dāl.
xix
Modern calligraphers tackling kūfī can also overlook this positional
distinctive feature. In the analysis below, the kāf is correctly encircled in
R BK (rabbika) and BSBKBR W N (yustakbirna); in the last example, however,
a dāl is erroneously identified as a kāf in EBA D BH (ibdatihi):
(Taken from Arabic Calligraphy Instruction, The letter Kaf in Kufi scripts.
http://www.sakkal.com/instrctn/Kaf01.gif
http://www.sakkal.com/instrctn/Kufi_Kaf.html)
A possible cause for this confusion is that in the second-generation
scripts, besides the modern shortened kāf, a stretched kāf is available as a
calligraphic alternative. This surviving kūfī kāf is based on the shape
shared by non-final kāf and dāl in early Arabic, but since it no longer
needs a contrastive opposition with dāl, the use of the vertical bar is not
known. The apparent reluctance among later calligraphers to use the
stretched kāf in final position may be related to this. Like the theograph,
stretched kāf, too, can be considered a kufism.
xx
An Arabic font is an industrial product designed to enable handling
Arabic with technology that is not designed for Arabic. In the design
process, the structure and appearance of Arabic script and orthography
can be changed for technical and esthetic reasons. The resulting font is a
cultural innovation:
xxi
Arabic script synthesis is a scientific method to analyze and synthesize
traditional calligraphic styles and time-proven typesetting systems. In
this approach the integrity of Arabic script needs to be preserved when it
is reproduced in digital form in order to verify the accuracy of the
analysis. The result is not a font but a computer model of a script:
