Received on March 7, 2003; revised on June 5, 2003; accepted on June 17, 2003
120 Bioinformatics 20(1) © Oxford University Press 2004; all rights reserved.
GIS: a biomedical text mining system
ncku.edu.tw/~yuhc/gis/figure.htm]. The relations here are classified into three categories, positive, cooperative and negative, according to gene function. For example, gene A can activate gene B’s function or expression (a positive relation); gene A and gene B can bind to a complex or cooperate with each other (a cooperative relation; example keywords are ‘associate with’ and ‘is conjugated to’); or gene A can suppress gene B’s function or expression (a negative relation).

The function of this module is achieved through four agents: document retrieval, data preprocessing, learning process and relation prediction. Here we introduce only the kernel agents, namely the learning process and relation prediction agents. The learning process agent is responsible for generating sentence expression patterns from training samples consisting of sentences that describe gene–gene relations. Sentence expression patterns capture the patterns of wording and term distribution used to describe relations, and they are represented as a variant of a decision tree. This agent runs offline, in advance. The relation prediction agent judges the relations described in sentences according to sentence expression patterns and

related genes, instead of showing only a list of titles. (3) Using colors to mark important keywords, to help the user discover important information easily and to facilitate understanding. (4) Editing the domain-specific lexicon of biological functions, diseases and genes to meet the user’s needs, and thus to expand the lexicon. (5) Filtering the content of the abstracts and retaining only the conclusions or last three sentences of the abstract.

We have presented here a biomedical text mining system that screens gene information and extracts gene–gene relations described in the text. We extract the information about biological functions, associated diseases and related genes through a domain-specific lexicon. We use three kinds of relations (positive, cooperative and negative) to represent the relation between a pair of genes. Extraction precision is 0.840 and recall is 0.767. In brief, we have designed and implemented a new system architecture for the discovery of gene information in biomedical texts.

REFERENCES