discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/50996121
3 authors, including:
Fadi Zaraket
American University of Beirut
All content following this page was uploaded by Fadi Zaraket on 05 December 2016.
[Figure 1: (a) a hadith whose sanad lists narrators n_1 to n_5, followed by the matn content; (b) the biography of narrator n, with students s_1 to s_4, professors p_1 and p_2, and credibility information q; (c) annotation of a narrator in the hadith with q. English translations of (a) and (b) follow.]

(a) Qotayba bin Said narrated to us, Jourayr narrated to us, from Oumara bin Alqaaqa bin Shoubroma from Abi Zoraa from Abi Houraira may God be content with him, he said a man came to the apostle of God, praise and peace of God be upon him, and he said Oh apostle of God who amongst people is more worthy of my good company he (the prophet) said your mother, he said (the man) then who, he said (the prophet) then your mother, he said (the man) then who, he said (the prophet) then your mother, he said (the man) then who, he said (the prophet) then your father.

(b) [He is] Oumara bin Alqaaqa bin Shoubroma the son of the brother of Abdallah bin Shoubroma the Doubian the Kafian[.] Alboukhari related on prayer, recommendations/wills, combats, literature and other topics, from Althawriy, AbedAlwahed bin Ziyad, Jourayr and Ibn Foudayl from him [Oumara] from Abi Zoraa bin Amro and AbedAlrahman bin AbiNaim[;] Abou Hatim said his narrations are valid...
the birth place, the lifespan, and the credibility of the narrator. It also contains references to his or her professors and students, e.g. p_1 and p_2, and s_1, s_2, s_3 and s_4 from Figure 1(b), respectively. Notice that s_3 is n_2 from Figure 1(a), p_1 is n_4, and n is n_3.

Our contributions. We considered books of unsegmented hadith text to be H, and books of unsegmented narrator biographies to be B. For each hadith book h ∈ H, we computed morphological features and part of speech (POS) tags. We considered the structure of a hadith as a rhetorical relation and encoded it using finite state machines that take the computed features from h and produce segmented hadith documents, each with an internal structure similar to that in Figure 1(a). We used the same technique for books in B. (1) We obtained excellent results for H and less accurate results for B. We considered the computed structure of each hadith, and (2) used a novel hierarchical Arabic name distance metric to identify identical narrators, and (3) aggregated the structures extracted from H by merging identical narrators to build a narrator graph G. (4) We used G to boost entity and relation extraction in B using graph algorithms and obtained significantly better results. In this paper we stress the cross-document aspect of our work, where we use cross-document NLP to improve entity and relation extraction from hadith and biography documents.

2 Background

In this section we review the importance of the information retrieval task from hadith and biography texts, Arabic morphological analysis, and graph transformation techniques.

Importance of hadith authentication. The collections of narrations are the second source of jurisprudence after the قرآن qorʾān for all Islamic schools of thought. Due to religious and political reasons, writing the narrations was forbidden until the days of the eighth Umayyad Caliph, عمر بن عبد العزيز ʿmr bn ʿbd alʿzyz (717-720 AD), seventy or so years after the death of the prophet (Askari 1984). Consequently, several inconsistencies were introduced to the literature, which necessitated a thorough authentication study of a narration before its use in jurisprudence. While different Islamic schools of thought differ on how to interpret the content, they almost all agree not to use narrations with a sanad that lacks authenticity for jurisprudence. The authenticity of a hadith depends on the credibility of the narrators as reported in separate biography books. Figure 1(c) provides a graphical illustration of the process of annotating narrator n_3 from the hadith in Figure 1(a) with credibility information q extracted from the biography in Figure 1(b).

Hadith authentication is currently manual and error prone due to the huge number of existing narration books. Al-Azami (1991) cites more than eleven digitized narration books, each of several volumes, and a dozen other biography and secondary authentication books.

Arabic morphological analysis. Arabic morphological analysis is key to the analysis of Arabic text (Shaalan, Magdy, and Fahmy 2010). Current morphological analyzers (Al-Sughaiyer and Al-Kharashi 2004; Buckwalter 2002) consider the internal structure of an Arabic word and decompose it into several concatenated morphemes, i.e. stems and affixes. An affix can be a prefix, a suffix, or an infix. The word أحمده ʾaḥmadh may have two valid morphological analyses. The letter أ ʾa may be a prefix, and the word means "I praise him", or the letter أ ʾa may be part of the stem أحمد ʾaḥmad (a proper noun), and the word means "his Ahmad". The morphological analysis of حدثنا ḥdṯnā "narrated to us", the word that starts the hadith in Figure 1(a), returns the stem حدث ḥdṯ "narrated", which is also the stem of other words such as حدثني ḥdṯny "narrated to me" and حدثهم ḥdṯhm "narrated to them". The stem also shares similar POS and meaning gloss tags with words such as قال qāl "said" and أخبر ʾaḫbar "told".

Graph algorithms. We use the following algorithms.
• Merge: takes two nodes n_1 and n_2, removes them from the graph and adds a node m(n_1, n_2) that has the edges of n_1 and n_2 except the induced self edges.
• Split: takes a merged graph node m(n_1, ..., n_k) that was formed by merging several nodes, and splits m into m_1(n_1, ..., n_i) and m_2(n_{i+1}, ..., n_k) so that a threshold is met on the distance between the nodes in m_1 and m_2.
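The two graph operations can be illustrated with a minimal Python sketch. It is not the ANGE implementation: the class NarratorGraph, the function split_members, the representation of nodes as frozensets of narrator ids, and the reading of the threshold as "every cross pair is at least threshold apart" are all assumptions made for illustration. The sketch only partitions the members of a merged node and does not redistribute its edges, which the text does not specify.

```python
class NarratorGraph:
    """Hypothetical directed graph; each node is a frozenset of narrator ids."""

    def __init__(self):
        self.succ = {}  # node -> set of successor nodes

    def add_edge(self, u, v):
        self.succ.setdefault(u, set()).add(v)
        self.succ.setdefault(v, set())

    def merge(self, n1, n2):
        """Replace n1 and n2 by m = n1 | n2, dropping induced self edges."""
        m = n1 | n2
        # Outgoing edges of m: union of both, minus edges between n1 and n2.
        out = (self.succ.pop(n1) | self.succ.pop(n2)) - {n1, n2}
        # Redirect incoming edges of n1 and n2 to m.
        for targets in self.succ.values():
            if targets & {n1, n2}:
                targets -= {n1, n2}
                targets.add(m)
        self.succ[m] = out
        return m


def split_members(members, distance, threshold):
    """Find i so that members[:i] and members[i:] are pairwise at least
    `threshold` apart (assumed reading of the split criterion)."""
    for i in range(1, len(members)):
        left, right = members[:i], members[i:]
        if all(distance(a, b) >= threshold for a in left for b in right):
            return left, right
    return None  # no admissible split point


A, B, C = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
g = NarratorGraph()
g.add_edge(A, B)
g.add_edge(B, C)
g.add_edge(A, C)
m = g.merge(B, C)  # the edge B -> C becomes a self edge and is dropped
```

Representing a merged node as the union of its members keeps the original narrators recoverable, which is what a later split needs.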
Figure 3: The sequence of narrator extractor diagram.

distance(Narrator n_1, Narrator n_2)

Figure 4: Hash keys and corresponding relevance scores.

qualifying components of n, such as العراقي ālʿrāqy (the Iraqi) in the name عبد الله العراقي, in p.

Then distance counts the matching name components and the matching credibility qualifying components. It returns ∞ if the morphological features of two name components at the same level do not match. If only one of the name components at the same level is empty, such as in عبيد الله بن عبد الله بن عمر and بن عبد الله بن عمر, the metric skips it. For the qualifying components, distance uses POS and gloss tags to check whether two qualifiers conflict with each other, such as العراقي ālʿrāqy (the Iraqi) and المصري ālmṣry (the Egyptian), and returns ∞ if they do. It also counts the number of qualifying components that reinforce each other, such as العراقي ālʿrāqy and الكوفي ālkwfy (the Kufian), where Kufa is a city in Iraq. Finally, the metric returns the percentage of morphologically and structurally differing components.

The naive brute force method to detect all identical narrators in thousands of sequences extracted from hadith books renders the task of building a complete narrator graph intractable. Instead, ANGE uses a multi-key hash technique that associates every narrator name with the several narrator names that can be built from its components. For example, the narrator name عمارة بن القعقاع الكوفي ʿomārat bin al-qaʿqāʿ al-kowfiy is composed of two names, i.e. عمارة and القعقاع, one name connector, i.e. بن, and one possessive qualifier, i.e. الكوفي. Six other structures that refer to the same narrator can be generated out of those components as shown in Figure 4. ANGE considers each of the structures as a hash key valid to match the original narrator and associates a match score with it.

The buildGraph algorithm takes as input a list of narrator sequences C. It iterates over all narrators n and over all the hash keys of n. For each hash key k in n.keys, the algorithm looks up the narrator and score pair matches of k in the hash table H_n. For each matching pair (n_h, s_h), the algorithm checks whether n and n_h are of the same generation by checking their corresponding indices in their sequences against a radius parameter r. This refines the search to avoid checking a narrator relating directly to the prophet against a narrator separated from the prophet by at least five narrators. If the pair (n_h, s_h) matches n better than the previously matching pair (n_b, s_b), then n_h is checked for equality against n to protect from hash collisions before it is selected as the best match. Finally, if the score of the best match s_b is at least a parameter θ, n and n_b are merged.

buildGraph(Narrator C[][])
  Hash H_n = {}
  foreach sequence c in C
    foreach Narrator n in c
      Narrator n_h, n_b
      double s_h, s_b = 0 //scores
      foreach k in n.keys()
        NarrScorePair matches[] = H_n.findAndInsert(n, k)
        foreach (n_h, s_h) in matches
          if (|n_h.index - n.index| < r)
            if (s_h × k.score() > s_b)
              if (distance(n, n_h) < ε)
                s_b = s_h, n_b = n_h
      if (s_b ≥ θ) mergeNodes(n, n_b)

4 Biography detection

ANGE computes l_b, the list of detected narrators in the biography books, using techniques similar to those used in the hadith books, and orders l_b by the order of appearance of the narrators in the biography book. ANGE then considers the rhetorical relation that characterizes the structure of a biography, which includes the name of the subject narrator n of the biography and the narrator names of the professors and students of n.

The first approach of ANGE works similarly to Figure 3 and extracts narrator names, professor and student features, and biographies from the biography text books without the use of the narrator names and the narrator graph extracted from the hadith books. Then, ANGE applies cross-document reconciliation and uses the narrator names and the narrator graph extracted from the hadith books, which significantly improves the accuracy of entity and relation extraction from biographies.

For each narrator n_i in l_b, 1 ≤ i ≤ |l_b|, ANGE uses the multi-key hash technique and locates n_i in the narrator graph. ANGE uses the k-reachable graph algorithm and checks whether subsequent narrators (..., n_{i-1}, n_i, n_{i+1}, ...) in l_b are reachable within k steps from each other. If not, then ANGE decides that it crossed a biography boundary. ANGE inspects the cluster of narrators reachable within k steps and selects the center of the cluster n_c as the subject narrator of the biography. The parents and the children of n_c in the narrator graph are the professors and the students of n_c, respectively. The colored nodes in Figure 2 illustrate this process.

Ambiguity happens in case of several matches for n_i in the narrator graph. ANGE resolves the ambiguity using two measures. First, ANGE uses a threshold against the hash score to limit the matches. Also, ANGE only considers matches of n_i that are within k steps of the matches of narrator n_{i-1} preceding n_i in l_b.

4.1 Partial narrator graph annotation

In most cases, a scholar is interested in only annotating a partial narrator graph G_p that he or she extracted using ANGE from a selected set of hadith of interest, e.g. the set may contain narrations related in topic. For each narrator n in G_p, ANGE computes N_s = {n_s^1, n_s^2, ...} and N_p = {n_p^1, n_p^2, ...}, the children and parents of n in G_p. ANGE computes N_b, the intersection of {n} ∪ N_s ∪ N_p and l_b, and sorts N_b by the order of appearance in the biography books. ANGE considers the clusters in N_b that contain n and that happen in the same text locality as the candidate biographies of n and ranks them in terms of the number of matches.
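The biography boundary test of Section 4 can be sketched in Python as a bounded breadth-first search. This is a minimal illustration under stated assumptions, not ANGE's code: the names undirected, within_k_steps and split_biographies are hypothetical, reachability is assumed to ignore edge direction, and only consecutive narrators in l_b are compared. Selecting the cluster center n_c and resolving ambiguous matches are not covered.

```python
from collections import deque


def undirected(edges):
    """Build a symmetric adjacency map from (professor, student) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def within_k_steps(adj, src, dst, k):
    """True if dst can be reached from src using at most k edges."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k edges
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def split_biographies(adj, l_b, k):
    """Cut the ordered narrator list l_b wherever two consecutive
    narrators are not within k steps of each other in the graph."""
    if not l_b:
        return []
    groups = [[l_b[0]]]
    for prev, cur in zip(l_b, l_b[1:]):
        if within_k_steps(adj, prev, cur, k):
            groups[-1].append(cur)
        else:
            groups.append([cur])  # crossed a biography boundary
    return groups


adj = undirected([("a", "b"), ("b", "c"), ("x", "y")])
groups = split_biographies(adj, ["b", "a", "c", "y", "x"], k=2)
```

With k = 2 the narrators a, b and c fall in one candidate biography and x, y in another; lowering k to 1 also separates a from c, since they are connected only through b.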
5 Related work

Hadith analysis. The iTree (Azmi and Bin Badia 2010) tool extracts narrator sequences from manually segmented narrations. It uses a context free grammar (CFG) to extract narrators and uses the Levenshtein distance metric to compare narrator names. ANGE differs in that it targets the harder problem of segmenting hadith books, does not use a CFG, and instead uses morphological features and FSMs to detect the narrator names, and uses a novel morphological and structural distance metric to check narrator name equality.

Arabic named entity detection. Named entity detection techniques for Arabic exist and use local grammars (Shaalan and Raza 2009; Zaghouani et al. 2010; Traboulsi 2009) along with other boosting techniques to detect named entities. Our cross-document integration and reconciliation technique can work directly with the named entities extracted with such techniques.

NERA uses rule-based systems with local grammars (Shaalan and Raza 2009). ANERSys boosts the capabilities of its statistical models to detect named entities by using task specific corpora and gazetteers and ignores POS tags (Benajiba, Rosso, and Benedí Ruiz 2007; Benajiba, Diab, and Rosso 2008). It also uses morphosyntactic features and parallel Arabic and English corpora to bootstrap noisy features (Benajiba et al. 2010). The techniques in (Zaghouani et al. 2010; Al-Jumaily et al. 2011; Maloney and Niv 1998) use light morphological stemming for Arabic with local grammars developed for Latin languages, as well as pattern matching. ANGE differs from those techniques in that it computes features based on morphology, POS and gloss tags assigned to all morphemes (not only to the stems) and passes the feature vectors to manually built finite state machines that detect the target structures. ANGE is not restricted by stop words (Abuleil 2004), or by the expressive power of a local grammar. It does not enumerate all the structures that express the target entities.

Cross-document NLP. Similar to (Zhang, Otterbacher, and Radev 2003) we assume structural relations exist amongst entities in a document. ANGE differs in that it extracts the documents from a set of large texts using the assumed relations. Then ANGE uses the extracted documents and structures to improve accuracy. ANGE also differs in that it considers two distinct types of documents, each with a distinct structure. The work in (Bhole et al. 2007) extracts named entities and relations and presents them as graphs annotated with extracted temporal entities from the same documents. ANGE differs in that it extracts graphs from a class of documents and annotates the extracted graph with entities it extracts from another class of documents. ANGE uses a structural and morphological metric to compute identical entities instead of the Levenshtein distance used in (Finin et al. 2009) to evaluate cross-document coreference in Wikipedia. In (Ravin and Kazi 1999) English names are coreferenced using split and merge heuristics based on the ambiguity level of the names after contextual analysis. ANGE differs in that it uses graph algorithms and considers structural contextual features specific to the subject documents.

6 Results

We evaluated ANGE using three hadith books with several volumes each (Ibn Hanbal 2005; Al Kulayni 1996; Al Tousi 1995), and two biography books (Al-Kassem Al-Khoei ; Al-Waleed Al-Baji ) that we obtained from online sources. We report recall and precision as our evaluation metrics. Informally, recall measures whether we detected all correct entities, while precision measures whether we did so without introducing false positives.

narrators   recall   precision   F-score
detected    0.673    0.824       0.741
all         0.632    0.771       0.694

Table 1: Narrator graph accuracy results.

Table 1 reports the accuracy results for the narrator graph extraction from the hadith books. For each narrator node n, we computed recall as the ratio of the correctly merged nodes against the number of nodes with which n should be merged. The precision metric denotes the average of the ratio of correctly merged nodes against the number of merged nodes for each narrator node.

The "detected" row includes only the narrator nodes detected by ANGE. It indicates the accuracy of the graph building routine isolated from the rest of the method and excludes the narrators that ANGE failed to extract. The "all" row includes all narrators and sanads in the text. The recall rate was 63%, mainly due to the conservative cycle breaking transformation that splits merged narrator nodes even if they were equivalent. ANGE achieved 82% and 77% precision rates. Upon examination, we discovered that the precision should have been higher, due to a conservative decision by the manual annotator to keep several confusing nodes split while ANGE correctly merged them. Further studies should accommodate several manual annotators and report on inter-annotator agreement.

               Narrator detection             Narrator boundary
               recall  precision  F-score     recall  precision  F-score
No-Cross-doc   0.94    0.81       0.87        0.41    0.95       0.57
Cross-doc      0.65    0.91       0.76        0.93    0.96       0.94

Table 2: Narrator detection in biographies.

Table 2 shows the accuracy results for narrator detection in biography books computed via manually inspecting a selection of 1,950 narrator names. Without the use of the narrator graph extracted from the hadith documents, ANGE detected 94% of the narrators mentioned in the biography books with 81% precision, and detected 40% of the boundaries of the extracted narrators with 94.5% precision.

With the use of the narrator graph, ANGE coreferenced 65% of the narrator names in the biography books with 91% precision. The rest of the biography narrator names are not found in the hadith books that we used to generate the narrator graph and are therefore not interesting to the narrator graph annotation task. ANGE improved the recall of the boundaries of the detected narrators from 40% to 93% and also improved the precision from 94.5% to 96%. This shows the significant impact of cross-document reconciliation on the accuracy of named entity and relation extraction.

               Biography detection            Biography boundary
               recall  precision  F-score     recall  precision  F-score
all            0.13    0.72       0.21        0.09    0.79       0.16
interesting    0.71    0.84       0.77        0.47    0.87       0.61
in-graph       0.80    0.89       0.84        NA      NA         NA

Table 3: Biography detection results.

We consider a biography interesting when it contains at least two student/professor names. We computed the results in Table 3 via inspecting a selection of 150 biographies. The recall figures are low when inspecting all biographies. The recall for biography detection improves to 71% from 13% when inspecting the interesting biographies. The recall for biography boundary detection improves to 47% from 9%. Precision also improves significantly. The results should improve when we include more hadith books and build more complete narrator graphs. This is evident when we inspect interesting biographies whose subject narrators exist in the narrator graph extracted from hadith books.

We considered sets of at most ten hadith documents related by content and extracted their narrator graphs using ANGE. We then used the partial narrator graph annotation approach to annotate the resulting graphs. ANGE annotated the nodes of the partial graphs with 80% recall and 89% precision. This is again evidence that cross-document NLP improves entity and relation extraction results significantly. The accuracy of biography boundary detection is not well defined in this task since the partial graph annotation method reports several biographies ranked with a similarity metric.

Acknowledgements. We would like to thank Marwan Zeineddine, Hassen Al-Sadr, and Hussein El-Asadi from the Hadithopedia team for technical discussions, and discussions on the hadith application.

7 Conclusion and future work

We presented ANGE, an annotated narrator graph extraction technique from hadith and biography books that uses morphological features, finite state machines, graph algorithms and cross-document reconciliation and integration. The use of cross-document reconciliation significantly improved accuracy results on named entity and relation extraction tasks. To our knowledge, ANGE is the first tool that uses cross-document NLP to improve accuracy results in Arabic NLP tasks. We plan to use cross-document NLP to learn feature vectors that characterize relations, such as studentship, that were hard to detect in the biography documents without cross-document NLP. We also plan to extend our cross-document approach to other applications.

References

Abuleil, S. 2004. Extracting names from Arabic text for question-answering systems. In Recherche d'Information et ses Applications (RIAO), 638–647.
Al-Jumaily, H.; Martínez, P.; Martínez-Fernández, J.; and Van der Goot, E. 2011. A real time named entity recognition system for Arabic text mining. Language Resources and Evaluation 1–21.
Al-Kassem Al-Khoei, A. Moajam Rijal Al hadith (Encyclopedia of hadith narrator biographies, in Arabic).
Al Kulayni, M. i. Y. 1996. Kitab al-Kafi. Taaruf.
Al-Sughaiyer, I. A., and Al-Kharashi, I. A. 2004. Arabic morphological analysis techniques: a comprehensive survey. American Society for Information Science and Technology 55(3):189–213.
Al Tousi, M. b. H. 1995. Al Istibsar. Taaruf.
Al-Waleed Al-Baji, A. Al-taadeel wa al-tajreeh liman akhraj 'nh al-Bokhari (The credibility and the scrutinized credibility for the men of Bokhari, in Arabic).
Askari, M. A. 1984. Maalim al Madrasatayn, volume 2. Bitha.
Azami, M. M. A. 1991. A note on work in progress on computerization of hadith. Journal of Islamic Studies 2(1).
Azmi, A., and Bin Badia, N. 2010. iTree - automating the construction of the narration tree of hadiths. In Natural Language Processing and Knowledge Engineering.
Benajiba, Y.; Zitouni, I.; Diab, M. T.; and Rosso, P. 2010. Arabic named entity recognition: Using features extracted from noisy data. In ACL (Short Papers), 281–285.
Benajiba, Y.; Diab, M.; and Rosso, P. 2008. Arabic named entity recognition using optimized feature sets. In Empirical Methods in Natural Language Processing, 284–293.
Benajiba, Y.; Rosso, P.; and Benedí Ruiz, J. 2007. ANERsys: An Arabic named entity recognition system based on maximum entropy. 143–153.
Bhole, A.; Fortuna, B.; Grobelnik, M.; and Mladenic, D. 2007. Extracting named entities and relating them over time based on Wikipedia. Informatica (Slovenia) 31(4).
Buckwalter, T. 2002. Buckwalter Arabic morphological analyzer version 1.0. Technical report, LDC catalog number LDC2002L49.
Finin, T.; Syed, Z.; Mayfield, J.; McNamee, P.; and Piatko, C. 2009. Using Wikitology for Cross-Document Entity Coreference Resolution. In AAAI Symposium on Learning by Reading and Learning to Read. AAAI Press.
Ibn Hanbal, A. .-. B. 2005. Musnad. Noor Foundation.
Maloney, J., and Niv, M. 1998. TAGARAB: A fast accurate Arabic name recognizer using high-precision morphological analysis. In Workshop on Computational Approaches to Semitic Languages.
Ravin, Y., and Kazi, Z. 1999. Is Hillary Rodham Clinton the president? Disambiguating names across documents. In Proceedings of the ACL '99 Workshop on Coreference and its Applications, 9–16.
Shaalan, K. F., and Raza, H. 2009. NERA: Named entity recognition for Arabic. JASIST 60(8).
Shaalan, K. F.; Magdy, M.; and Fahmy, A. 2010. Morphological analysis of ill-formed Arabic verbs in intelligent language tutoring framework. In Applied Natural Language Processing, Florida Artificial Intelligence Research Society Conference. AAAI Press.
Sipos, R.; Bhole, A.; Fortuna, B.; Grobelnik, M.; and Mladenic, D. 2009. Historyviz - visualizing events and relations extracted from Wikipedia. In European Semantic Web Conference, volume 5554 of Lecture Notes in Computer Science.
Stephanie Strassel, Mark Przybocki, K. P. Z. S., and Maeda, K. 2008. Linguistic resources and evaluation techniques for evaluation of cross-document automatic content extraction. In International Conference on Language Resources and Evaluation.
Traboulsi, H. 2009. Arabic named entity extraction: A local grammar-based approach. In International Multiconference on Computer Science and Information Technology.
Zaghouani, W.; Pouliquen, B.; Ebrahim, M.; and Steinberger, R. 2010. Adapting a resource-light highly multilingual named entity recognition system to Arabic. In Language Resources and Evaluation Conference.
Zhang, Z.; Otterbacher, J.; and Radev, D. R. 2003. Learning cross-document structural relationships using boosting. In ACM International Conference on Information and Knowledge Management, 124–130.