Sidi Mohammed Ben Abdellah University, Faculty of Science Dhar El Mahraz
Mourad SARROUTI and Said OUATIK EL ALAOUI
Laboratory of Computer Science and Modeling, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco.
mourad.sarrouti@usmba.ac.ma, s_ouatik@yahoo.com
Abstract
Biomedical document retrieval systems play a vital role in biomedical question answering systems. The performance of the latter depends directly on the performance of its document retrieval component. Indeed, the main goal of biomedical document retrieval is to find a set of citations that have a high probability of containing the answers. In this paper, we propose a biomedical document retrieval method to retrieve relevant documents for users' biomedical questions (queries). In our framework, we first use the GoPubMed search engine to find the top-K results. Then, we re-rank the top-K results by computing the semantic similarity between the question and the title of each document using UMLS similarity. Our proposed method is evaluated on the BioASQ 2014 task datasets. The experimental results show that our proposed method has the best performance (MAP@100) compared to the existing state-of-the-art related document retrieval systems.

Introduction
With the rapid growth of knowledge in the biomedical domain, it has become very difficult even for experts to absorb all the relevant information in their field of interest. Information Retrieval (IR) systems present a list of documents that might have the associated information, but the majority of them leave it to the user to find and extract the required information. Unlike IR systems, Question Answering (QA) systems aim to provide inquirers with direct and precise answers to their questions by employing Information Extraction (IE) and Natural Language Processing (NLP) methods [1]. Typically, an automated QA system consists of three main elements, which can be studied and developed independently [1, 2]: Question Processing, Document Processing and Answer Processing. Figure 1 illustrates the generic architecture of a biomedical QA system.

[Figure 1: Generic Architecture of Biomedical Question Answering Systems. Phase 1, Question Processing (question analysis and classification, query formulation): a natural language question from the user (e.g. “Is CHEK2 involved in cell cycle control?”) is turned into a query; question types are yes/no, factoid, list or summary. Phase 2, Document Processing (documents and passages retrieval): relevant documents are retrieved from PubMed. Phase 3, Answer Processing: candidate answers are processed into answers (e.g. “Yes”).]

In this work, we address the problem of document retrieval, which is an important component of biomedical QA systems. The aim of the biomedical document retrieval task is to find a list of relevant documents that are likely to contain the answer.

Method
The proposed method consists of three main steps: (1) query reformulation, (2) PubMed document retrieval using GoPubMed, and (3) biomedical document re-ranking.
1. Query Reformulation: in this step, we process the biomedical question, written in natural language, to make it efficient and optimized for searching. We use MetaMap [2] to map terms in the question to the Unified Medical Language System (UMLS) in order to extract the Biomedical Entity Names (BENs), and we connect them with the “AND” operator.
2. PubMed Document Retrieval Using GoPubMed: the query generated in the query reformulation phase is submitted to the GoPubMed semantic search engine [3] in order to find the top 200 documents.
3. Biomedical Document Re-Ranking: document re-ranking is the main and most important step in the proposed method. Indeed, we do not depend completely on the GoPubMed ranking of documents. Instead, we re-rank the 200 retrieved documents by computing the similarity between a given question and the title of each document. We use UMLS similarity [4] to obtain the similarity between the biomedical concepts of a question and the concepts of a document title. Specifically, we use path length as the similarity measure, where the similarity score is inversely proportional to the number of nodes along the shortest path between the concepts.

Experimental Results and Discussion

Dataset
As a test collection, we used the publicly available benchmark datasets provided by the BioASQ challenge [5]. For its 2014 edition, the challenge released five batches of test data, which are used as test sets to evaluate the systems participating in Task B. Each batch contains 100 biomedical questions. The questions are created by a group of biomedical experts and provided by the BioASQ organizers.

Evaluation Metrics
We evaluated the performance of the proposed biomedical document retrieval method using four metrics, namely mean precision, mean recall, mean F1-measure and mean average precision (MAP) [5]. In the BioASQ challenge, the MAP measure is used to rank and compare the participating systems. Moreover, in BioASQ 2014, only the first 100 documents of the result list may be submitted.

Results and Discussion
Table 1 presents the comparison between our proposed method and the current state-of-the-art methods on batch 1 of the BioASQ 2014 test datasets.

Table 1 Comparison in terms of MAP of the proposed biomedical document retrieval method with the current state-of-the-art methods.

Systems            Mean precision   Mean recall   Mean F-measure   MAP
SNUMedinfo1        0.04             0.59          0.08             0.26
Top 100 baseline   0.22             0.43          0.22             0.19
Proposed System    0.23             0.36          0.22             0.27

Overall, Table 1 shows that the proposed method is competitive with the current state-of-the-art methods in terms of MAP: our system achieves a MAP of 0.27. Moreover, our proposed method outperforms the baseline system, i.e., Top 100 baseline, by a wide margin in terms of mean average precision (a gain of 0.0847 MAP).

Conclusion and Future Work
In this paper, we have presented a biomedical document retrieval method. First, we use MetaMap to extract biomedical named entities and connect them in order to generate queries. Then, the top 200 relevant documents are retrieved by the GoPubMed search engine. Next, we keep only the top 100 documents after re-ranking the top 200 documents by computing the semantic similarity between the question and each document title. Finally, the experiments on the BioASQ 2014/2015 document retrieval task have shown that our proposed framework is effective and competitive for biomedical document retrieval compared to several state-of-the-art systems. In future work, we will focus on integrating our biomedical document retrieval framework into a biomedical QA system.

References
[1] Athenikos SJ, Han H (2010) Biomedical question answering: A survey. Computer Methods and Programs in Biomedicine 99(1):1–24. DOI 10.1016/j.cmpb.2009.10.003
[2] Abacha AB, Zweigenbaum P (2015) MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies. Information Processing & Management 51(5):570–594.
[3] Doms A, Schroeder M (2005) GoPubMed: exploring PubMed with the Gene Ontology. Nucl. Acids Res. 33(suppl 2):W783–W786.
[4] McInnes BT, Pedersen T, Pakhomov SV (2009) UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. In: AMIA Annual Symposium Proceedings, vol. 2009, p. 431. American Medical Informatics Association.
[5] Tsatsaronis G, Balikas G, Malakasiotis P, Partalas I, Zschunke M, Alvers MR, Weissenborn D, Krithara A, Petridis S, Polychronopoulos D, Almirantis Y, Pavlopoulos J, Baskiotis N, Gallinari P, Artiéres T, Ngomo A-CN, Heino N, Gaussier E, Barrio-Alvers L, Schroeder M, Androutsopoulos I, Paliouras G (2015) An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16:1–28. DOI 10.1186/s12859-015-0564-6
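
As an illustration of steps 1 and 3 of the Method (query reformulation and re-ranking), the following is a minimal Python sketch, not the authors' implementation. It assumes the biomedical entities have already been extracted (in the poster this is done with MetaMap), and it replaces UMLS with a small hypothetical is-a hierarchy, `TOY_PARENTS`. The path measure follows the description in the text: one divided by the number of nodes on the shortest path between two concepts. How concept-pair scores are combined into a question-title score is not specified in the poster; averaging the best match per question concept is an assumption, as are all function names.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent); this is NOT real UMLS data.
TOY_PARENTS = {
    "CHEK2": "kinase",
    "ATM": "kinase",
    "kinase": "protein",
    "protein": "entity",
    "cell cycle control": "biological process",
    "apoptosis": "biological process",
    "biological process": "entity",
}


def build_query(entities):
    """Step 1: connect the extracted entity names with the AND operator."""
    return " AND ".join(entities)


def build_graph(parents):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in parents.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_similarity(graph, a, b):
    """Return 1 / (number of nodes on the shortest path from a to b).

    Returns 0.0 when either concept is unknown or no path exists.
    """
    if a not in graph or b not in graph:
        return 0.0
    seen = {a}
    queue = deque([(a, 1)])  # a path consisting of `a` alone has one node
    while queue:
        node, n_nodes = queue.popleft()
        if node == b:
            return 1.0 / n_nodes
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, n_nodes + 1))
    return 0.0


def title_score(graph, question_concepts, title_concepts):
    """Assumed aggregation: mean over question concepts of the best title match."""
    if not question_concepts or not title_concepts:
        return 0.0
    best = [
        max(path_similarity(graph, q, t) for t in title_concepts)
        for q in question_concepts
    ]
    return sum(best) / len(best)


def rerank(graph, question_concepts, docs, keep=100):
    """Step 3: sort (doc_id, title_concepts) pairs by score and keep the top `keep`.

    The sort is stable, so tied documents keep their original search-engine order.
    """
    ranked = sorted(
        docs,
        key=lambda d: title_score(graph, question_concepts, d[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:keep]]


graph = build_graph(TOY_PARENTS)
question = ["CHEK2", "cell cycle control"]
docs = [
    ("d1", ["apoptosis"]),
    ("d2", ["CHEK2", "cell cycle control"]),
    ("d3", ["ATM"]),
]
query = build_query(question)
top = rerank(graph, question, docs, keep=100)
```

In the poster's setting, `docs` would hold the 200 documents returned by GoPubMed for `query`, and `keep=100` matches the BioASQ 2014 submission limit.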
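
The four measures named under Evaluation Metrics can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the official BioASQ evaluation code: average precision is normalised by the number of relevant (gold) documents, which is a common definition, and the exact normalisation used by the challenge may differ. The function names and the `cutoff` parameter are illustrative, and each retrieved list is assumed to contain no duplicate document ids.

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F1 of one retrieved list against the gold set."""
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(retrieved, relevant, cutoff=100):
    """Average precision over the first `cutoff` ranks.

    Assumption: the sum of precision values at relevant ranks is divided
    by the number of gold documents.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(retrieved[:cutoff], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs, cutoff=100):
    """MAP over a list of (retrieved, relevant) pairs, one pair per question."""
    if not runs:
        return 0.0
    return sum(average_precision(r, g, cutoff) for r, g in runs) / len(runs)


# Toy example with made-up document ids.
retrieved = ["a", "b", "c", "d"]
gold = {"a", "c", "x"}
p, r, f1 = precision_recall_f1(retrieved, gold)
ap = average_precision(retrieved, gold)
map_score = mean_average_precision([(retrieved, gold), (["x"], {"x"})])
```

The default `cutoff=100` mirrors the rule that only the first 100 documents of a result list may be submitted.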