Machine Learning in Bio-Signal Analysis and Diagnostic Imaging

Ebook · 657 pages · 15 hours

About this ebook

Machine Learning in Bio-Signal Analysis and Diagnostic Imaging presents original research on advanced analysis and classification techniques for biomedical signals and images, covering supervised and unsupervised machine learning models, standards, algorithms, and their applications, along with the difficulties and challenges faced by healthcare professionals in analyzing biomedical signals and diagnostic images. These intelligent recommender systems are designed based on machine learning, soft computing, computer vision, artificial intelligence, and data mining techniques. Classification and clustering techniques such as PCA, SVM, Naive Bayes, neural networks, decision trees, and association rule mining are among the approaches presented.

The design of high-accuracy decision support systems assists and eases the work of healthcare practitioners and suits a variety of applications. Integrating machine learning (ML) technology with human visual psychometrics helps meet the demands of radiologists by improving the efficiency and quality of diagnosis of unique and complex diseases in real time, reducing human errors and allowing fast and rigorous analysis. The book's target audience includes professors and students in biomedical engineering and medical schools, as well as researchers and engineers.

  • Examines a variety of machine learning techniques applied to bio-signal analysis and diagnostic imaging
  • Discusses various methods of using intelligent systems based on machine learning, soft computing, computer vision, artificial intelligence and data mining
  • Covers the most recent research on machine learning in imaging analysis and includes applications to a number of domains
Language: English
Release date: Nov 30, 2018
ISBN: 9780128160879

    Book preview

    Machine Learning in Bio-Signal Analysis and Diagnostic Imaging - Nilanjan Dey


    Preface

    Nilanjan Dey, Techno India College of Technology, Kolkata, India

    Surekha Borra, KS Institute of Technology, Bangalore, India

    Amira S. Ashour, Tanta University, Tanta, Egypt

    Fuqian Shi, Wenzhou Medical University, Wenzhou, People’s Republic of China

    Innovations such as telemedicine and recommender systems employ intelligent systems that can automatically predict diseases and suggest recommendations to patients based on electronic patient records (EPR) such as ECG signals and MRI, CT, X-ray, ultrasound, and PET images. Recent advances in data mining have led to the development of automatic analysis tools that help in the early detection and accurate prediction of diseases such as breast cancer, lung cancer, diabetes, heart disease, and acute kidney injury. These intelligent recommender systems are designed based on machine learning, soft computing, computer vision, artificial intelligence, and data mining techniques. Classification and clustering techniques such as PCA, SVM, Naive Bayes, neural networks, decision trees, association rule mining, random forests, and convolutional neural networks are a few of the many approaches available.

    The design of high-accuracy decision support systems assists and eases the work of healthcare practitioners and suits a variety of applications in the health sector. Integrating machine learning (ML) technology with human visual psychometrics helps meet the demands of radiologists by improving the efficiency and quality of diagnosis of unique and complex diseases in real time, reducing human errors and allowing fast and rigorous analysis.

    This book, Machine Learning in Bio-Signal Analysis and Diagnostic Imaging, presents original and valuable research on advanced analysis and classification techniques for biomedical signals and images. It covers the introduction, design, and optimization of techniques in both supervised and unsupervised machine learning models, standards, algorithms, and applications, as well as the difficulties and challenges faced by healthcare professionals in analyzing biomedical signals and diagnostic images.

    Chapter 1 presents an ontology-based medical report mapping process (OMRMP) to represent the contents of 3654 unstructured upper gastrointestinal endoscopy reports, written in Brazilian Portuguese, in a database format. This is achieved by feeding OMRMP with reports containing information from bio-signals, images, and videos collected during medical procedures, and passing the result to machine learning models. A satisfactory mapping performance is reported after comparing the results with previous ones obtained on smaller and simpler sets of reports, as well as on sets of different sizes.

    Chapter 2 presents a computer-aided diagnosis system for detecting multiple ocular diseases in color retinal fundus images using the problem transformation method, multiclass classifiers, and a multilevel support vector machine (MLSVM). The pipeline comprises image acquisition, preprocessing, enhancement, segmentation of the regions of interest, feature extraction and selection, classification, and evaluation. The system achieved 96.1% accuracy with modest training time on the DIARET dataset.

    Chapter 3 presents a DEFS-based system for the differential diagnosis of severe fatty liver and cirrhotic liver, investigating the potential of the conventional gray-scale B-mode ultrasound imaging modality. Results indicated that the optimal subset of the FOS + Laws’ feature set yielded by the kNN-DEFS wrapper-based feature selection algorithm performed best, with an average accuracy (standard deviation) of 99.5 (0.8). From the exhaustive experiments carried out, it is concluded that selecting a suitable classifier in a wrapper-based feature selection algorithm can enhance the performance of the CAD system design.

    Chapter 4 presents a comprehensive review of the soft computing techniques and the theory behind infrared thermography applied to medical image analysis, the focus being assessment of diabetic foot complications. A hybrid soft computing paradigm using fuzzy logic and artificial neural networks with a deep learning architecture is also discussed. The issues and challenges to be addressed in using infrared thermography for diagnostic purposes and the hardware/software/technology considerations in perspective are also presented.

    In Chapter 5, the heart rate variability (HRV) of normal (NOR) subjects, hypertension (HTN) patients, and coronary artery disease (CAD) patients is compared using linear and nonlinear features with different classifiers, considering 5-min electrocardiogram (ECG) recordings for processing the consecutive heartbeat (RR) interval tachogram. The results obtained with the proposed methodology indicate that this computer-aided classification system can be used as an additional diagnostic tool to effectively differentiate normal subjects from HTN- and CAD-affected patients.

    Chapter 6 presents exhaustive experiments performed to select the optimum ROI size for a computer-assisted framework for breast tissue pattern characterization using digitized screen-film mammograms. The results demonstrate that ROIs of 128 × 128 pixels manually extracted from the core region of the breast provide significant information for discriminating among breast tissue pattern classes. It is also noted that the 2-class breast density classification module achieves a higher accuracy of 91.2%, compared with 79.5% for the 4-class breast density classification framework.

    Chapter 7 reviews ANN architectures from the perceptron model to the updated FNN model, along with their components: learning, error calculation, and weight updating, focusing on optimizing FNN architectures with nature-inspired algorithms (NIAs). The study found that architectures and weights are optimized in parallel, and that the FNNs considered for optimization mostly had a single hidden layer. The study also suggests that NIAs (e.g., GA, PSO, and ABC) cannot update every component of an FNN with a single method, and that hybrid strategies are therefore required for constructing optimized FNNs.

    Chapter 8 presents a comparative study of ensemble learning techniques such as bagging and boosting (adaptive and logistic boosting) and infers a suitable composition for motor-imagery EEG signal classification. The chapter concludes that wavelet-based Engent is a reliable feature extraction technique under the Exp-IV configurations, and that the Type-III architecture of experiment IV, built on a kNN base classifier with K = 7, is the best-performing ensemble composition. Using multiple types of base classifiers brings higher diversity, in terms of decision boundary, than the diversity drawn from varying the hyperparameters of a single type of base classifier.

    Chapter 9 introduces significant topics in multilabel classification for medical image analysis. A detailed analysis and discussion of the literature findings is presented. The performance of the methods is compared on five publicly available multilabel classification data sets (Yeast, Scene, Genbase, Corel5k, and BibTex) using well-known measures. Further, a CAD system framework for existing multilabel classification research is presented.

    Chapter 10 presents an overview of the techniques, tools, and challenges involved in multimodal figure search in the biomedical literature to support doctors and scientists. A survey of the techniques and methods involved in building such a system is presented, covering figure extraction, figure classification, figure indexing, search engines, and web applications.

    Chapter 11 reviews the methods and challenges of applying various machine learning (ML) algorithms for image classification and security.

    Chapter 12 reviews the current state of Internet of Medical Things (IoMT)-related services and technologies in healthcare and monitoring, with an emphasis on medical robotics. Also presented is an overview of robotics in health care, health services, medical emergencies, and robotic surgery, the long-term benefits of robotics with IoMT, and the associated challenges. The need to design and develop hardware and software modules for better transmission of large and critical data over advanced communication networks is highlighted.

    This book is useful to researchers, practitioners, manufacturers, professionals, and engineers in the field of biomedical systems engineering, and may serve as advanced reference material for students.

    We would like to express gratitude to the authors for their contributions. Our gratitude is extended to the reviewers for their diligence in reviewing the chapters. Special thanks to our publisher, Elsevier Ltd.

    As editors, we hope this book will stimulate further research in developing algorithms and optimization approaches related to machine learning in bio-signal analysis and diagnostic imaging.

    Editors

    Chapter 1

    Ontology-Based Process for Unstructured Medical Report Mapping

    Jefferson Tales Oliva⁎,†; Huei Diana Lee†; Newton Spolaôr†; Claudio Saddy Rodrigues Coy‡; João José Fagundes‡; Maria de Lourdes Setsuko Ayrizono‡; Feng Chung Wu†,‡    ⁎ Bioinspired Computing Laboratory, Graduate Program in Computer Science and Computational Mathematics, University of São Paulo, São Carlos, Brazil

    † Laboratory of Bioinformatics, Graduate Program in Electrical Engineering and Computer Science, Western Paraná State University, Foz do Iguaçu, Brazil

    ‡ Service of Coloproctology, Graduate Program in Medical Sciences, University of Campinas, Campinas, Brazil

    Abstract

    Hospitals and clinics store an increasing amount of clinical data, such as medical reports. These reports often describe, in natural language, findings from bio-signals, images, and videos collected during a medical procedure. Data mining can explore report data to find patterns useful for assisting experts’ decision-making processes and medical procedure development. However, the content of medical reports is rarely organized in an appropriate format. To tackle this issue, we developed the ontology-based Medical Report Mapping Process to represent the content of unstructured reports in a database format. This chapter applies the ontology-based process to map 3654 unstructured upper gastrointestinal endoscopy reports written in Brazilian Portuguese. As a result, a satisfactory mapping performance was achieved. By comparing this result with previous ones on smaller and simpler sets of reports, this chapter suggests that the ontology-based process performs well on sets of different sizes.

    Keywords

    Text mining; Natural language processing; Data mining; Ontologies; Medical reports

    1 Introduction

    The human digestive system consists of several anatomical portions in which different abnormalities can occur [1, 2]. In particular, the upper alimentary tract, comprising the esophagus, stomach, and duodenum, is susceptible to common diseases and conditions, such as cancer. In Brazil, for example, 7600 and 12,920 new cases of stomach cancer were estimated for the year 2016 in women and men, respectively, making early and accurate diagnosis imperative in the control and treatment of these diseases [3].

    In this context, upper gastrointestinal endoscopy (UGIE) is an important tool for the diagnosis and treatment of lesions in these anatomical regions. It allows experts to capture video and images of the alimentary tract during the examination. After this medical procedure, the experts usually create textual reports to record findings and information regarding the examination, supplementing the acquired media [4]. In the end, all the medical data regarding UGIE is stored in computers and other equipment and is used, for example, to diagnose gastrointestinal disorders.

    Data storage capacity is increasing; however, the larger the amount of stored data, the harder its analysis becomes. This is also the case for medical data in many hospitals and clinics. Human analysis of this data requires valuable time from experts and is susceptible to subjectivity. In this scenario, computational methods have been applied to support the analysis of medical video, image, bio-signal, and text data [5–9].

    Focusing on text, one can note that this media type is essential, for example, in medical reports and records regarding UGIE and other medical procedures. To analyze this data with the assistance of computers, text mining methods are a relevant alternative. The idea is to identify, retrieve, and analyze patterns in unstructured text written in natural language [10]. As a result, medical experts can obtain relevant content from a large number of reports and textual records.

    Although text mining methods are powerful tools for finding data patterns, they are susceptible to different textual issues, such as mistypings, synonym variability, irrelevant words, and missing information [11]. In this scenario, we developed, in collaboration with medical and computer experts, the ontology-based medical report mapping process (OMRMP) and its implementation as a computational tool [12, 13]. This proposal stands out due to its ability to transform content from unstructured medical reports into a format similar to a database (DB) table. To do so, a domain-specific ontology is used to represent mapping rules that transform sets of relevant terms into values for database attributes. In particular, the ontology (a structure with classes, instances, and relations [14]) integrates experts’ knowledge and links report words with database table values. OMRMP yields an attribute-value table that is useful as an input for experts’ studies, analyses, and decision-making processes. The table is also compatible with data mining, machine learning, and other computational intelligence approaches that extract knowledge from structured data [15]. Another difference from conventional text mining methods is that the OMRMP ontology and auxiliary structures handle textual issues common in reports from the medical domain.

    This chapter applies the OMRMP method and tool to map a large set of 3654 artificial reports containing valid terms usual in real UGIE medical reports. As a result, unstructured information from these reports, written in natural language, was successfully structured. Moreover, satisfactory results were achieved in terms of the reduction of phrases and words and the percentage of report terms mapped. These achievements are competitive with those from previous work on smaller sets of UGIE reports with fewer textual patterns to map.

    This chapter is organized as follows: Section 2 presents related work. Section 3 describes the OMRMP method and its computational implementation. Section 4 details the experimental setup, and Section 5 reports and discusses the results obtained by applying OMRMP to 3654 textual reports. Section 6 concludes the chapter with final highlights.

    2 Related Work

    Some concepts used in related work are named-entity recognition (NER) and the Unified Medical Language System (UMLS) [16], which are employed in applications related to natural language processing (NLP) and text mining. The former is an alternative for extracting meaningful terms, such as the names of diseases and genes, from unstructured biomedical text [17, 18]. UMLS, in turn, consists of a collection of ontologies and vocabularies from distinct domains that can be used to support text processing methods [19].

    Lee et al. [20] developed a method that illustrates the application of NER to process a corpus composed of MEDLINE abstracts [21]. After finding terms that delimit named entities, the proposal categorizes them into semantic classes described in an ontology. An important component of this method, a machine learning algorithm named Support Vector Machines [15], is used to identify entity boundaries and to classify the entities found.

    Another piece of related work performs two additional steps before applying NER [22]. In particular, the first step uses natural language processing techniques to split pieces of text into sentences before tagging their words with appropriate parts of speech. Nouns are then submitted to the second step to create groups of words representing noun phrases. Afterwards, NER matches the obtained phrases with concepts from an ontology to yield named entities.

    NER and UMLS were combined in Khordad et al. [23]. In particular, a system was proposed to identify a type of named entity—phenotype names—and related information in biomedical text. To this end, a computational tool titled MetaMap [24] is used to associate input text with components from a phenotype ontology and the UMLS metathesaurus—a large dictionary linking synonyms from different vocabularies.

    The SemPathFinder system was proposed in Song et al. [25] to discover relations considered unknown in biomedical texts. For this, text mining methods and UMLS are used to extract, from each text sentence, entities and their relations.

    Scuba et al. [26] presented a web-based tool called Knowledge Author, which aims to facilitate the representation of domain content in an ontology for information extraction from clinical texts. This tool supports searching for terms in the UMLS Metathesaurus. Knowledge Author was applied to 34 clinical free-text radiology reports to extract concepts related to carotid stenosis.

    A more recent piece of work applies recurrent neural networks, a deep learning representative, to recognize named entities in text [27]. By doing so, the dependence on an appropriate feature set to feed machine learning algorithms is reduced. Experimental evaluations on two corpora showed competitive results.

    Becker et al. [28] developed a web system that helps transform concepts from free-text case definitions into codes inherent to UMLS. To ameliorate the complexity of free-text processing, the user can revise the extracted concepts, adding, removing, or expanding them to related concepts.

    A recommendation system for biomedical ontologies (Ontology Recommender 2.0) was developed at the National Center for Biomedical Ontology [29]. This system processes a text corpus or a keyword list to suggest appropriate ontologies for terms present in the processed text.

    In other work [30], a system called Casama (Contextualized Semantic Maps) [31], used for the representation and summarization of biomedical literature, was applied to a collection of articles about lung cancer to summarize experimental conditions (study design and outcome measures) and population-level properties (related to the population considered in the study). To do so, Casama uses semantic maps, which are composed of the sets of relations that describe the main findings of clinical studies.

    Besides NER and UMLS, other ideas can be used by computational approaches to extract information from unstructured biomedical text. An example from the literature is based on a domain-specific knowledge dictionary [32]. Such a dictionary contains rules that make it possible to map medical report content into structured databases. Despite its ability to process biomedical text, this approach uses a dictionary with little flexibility for representing domain knowledge. To allow users to represent more sophisticated and complete rules, as well as to explore object-oriented programming concepts to link report terms and database attribute values, the dictionary was replaced with an ontology in OMRMP [12, 13]. The proposed method also includes support for more text mining procedures, such as lemmatization.

    In what follows, OMRMP and the associated computational tool are described, due to their relevance in the experimental evaluation conducted in this chapter.

    3 Ontology-Based Medical Report Mapping Process

    The OMRMP is applied in two phases. In the first phase, text-processing techniques are applied to the textual reports to find relevant patterns, which are used to build the structures required to standardize the reports and map their content into a DB; these tasks are performed in the second OMRMP phase [33].

    3.1 First OMRMP Phase

    The following methods are applied in the first OMRMP phase (Fig. 1):

    Unique phrase identification: the content of all reports is concatenated into a textual file. Subsequently, the phrases in this file are alphabetically sorted and repeated sentences are removed. Thus, one entry for each phrase is kept, resulting in a structure called the unique phrase set (UPS) [12]. This approach makes it possible to view all the different phrases of a report set in a single, simple structure, easing the identification of relevant patterns. Other text processing methods are then applied to the UPS to reduce phrase variability and to facilitate building the structures required by the next OMRMP phase. It is important to emphasize that, after the application of each text processing technique to a UPS, unique phrase identification is reapplied to this structure.
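    The unique phrase identification step can be sketched as follows. This is a minimal illustration, not the chapter's actual implementation: the function name, the period-based phrase splitting, and the sample report texts are all assumptions made for the example.

```python
def build_ups(reports):
    """Concatenate report contents and keep one sorted entry per phrase.

    Splitting on periods is a simplification; the real delimiter rules
    for UGIE reports may differ.
    """
    phrases = []
    for text in reports:
        phrases.extend(p.strip() for p in text.split(".") if p.strip())
    # Sorting plus set-based deduplication keeps a single entry per phrase,
    # yielding the unique phrase set (UPS).
    return sorted(set(phrases))

ups = build_ups([
    "Middle esophagus with 5mm ulcer. Stomach without abnormalities.",
    "Stomach without abnormalities. Normal duodenum.",
])
# The repeated phrase appears only once in the resulting UPS.
```

    Reapplying `build_ups` after each later preprocessing step mirrors the reapplication of unique phrase identification described above.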

    UPS normalization: the content of UPS1 (built by the previous method) is normalized to replace uppercase and/or accented characters with equivalent lowercase and nonaccented characters, generating UPS2 [34]. This method is useful mainly for text written in languages (e.g., Portuguese) that use accents in words. Moreover, reports can contain sentences that are considered different by computational processes only because some character is uppercase. For example, the character a has variations such as A, á, Á, à, À, ã, Ã, â, and Â.
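    The normalization described above (lowercasing plus accent stripping) can be sketched with the standard library; the function name is an assumption for illustration.

```python
import unicodedata

def normalize_phrase(phrase):
    """Lowercase a phrase and strip accents (e.g., Portuguese diacritics)."""
    lowered = phrase.lower()
    # NFD decomposition separates base characters from combining accents;
    # the combining marks (Unicode category 'Mn') are then dropped.
    decomposed = unicodedata.normalize("NFD", lowered)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
```

    With this, all the variations of a listed above (A, á, Á, à, À, ã, Ã, â, Â) collapse to the single character a.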

    Stopword removal: the previous UPS is used to identify terms that are considered irrelevant (stopwords) [35]. Disposable words can be prepositions, adverbs, special characters, and other particular words. Each irrelevant term is added to a list named the stoplist, which is used by the stopword removal technique to generate UPS3 and to preprocess the textual reports. The stoplist is represented in XML (extensible markup language). Fig. 2 presents a stoplist example represented in XML, where the number attribute is the number of terms considered irrelevant that were added to the stopword list.

    Fig. 2 Stoplist structure example.
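    A sketch of stoplist loading and stopword removal is shown below. The exact tag names of the stoplist structure in Fig. 2 are not reproduced here; this example assumes a hypothetical root element with a number attribute and one child element per stopword.

```python
import xml.etree.ElementTree as ET

# Hypothetical stoplist layout: a <stoplist> root with <word> children.
STOPLIST_XML = """
<stoplist number="3">
  <word>with</word>
  <word>of</word>
  <word>the</word>
</stoplist>
"""

def load_stoplist(xml_text):
    """Parse the XML stoplist into a set of lowercase stopwords."""
    root = ET.fromstring(xml_text)
    return {w.text.strip().lower() for w in root.findall("word")}

def remove_stopwords(phrase, stoplist):
    """Drop every token of the phrase that appears in the stoplist."""
    return " ".join(t for t in phrase.split() if t.lower() not in stoplist)

stoplist = load_stoplist(STOPLIST_XML)
```

    Keeping the stoplist in XML, as the chapter does, lets domain experts extend it without touching the processing code.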

    Lemmatization: each word is morphologically reduced to its canonical form (lemma) [36]. For example, verbs, plural nouns, and feminine terms are transformed into their infinitive, singular, and masculine counterparts. This method generates UPS4.
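    A minimal dictionary-based sketch of this step is given below. Real lemmatization relies on morphological resources for the target language (the chapter cites [36]); the lemma table here is purely illustrative.

```python
# Illustrative lemma table: inflected form -> canonical form (lemma).
LEMMAS = {
    "ulcers": "ulcer",      # plural noun -> singular
    "observed": "observe",  # inflected verb -> infinitive
}

def lemmatize(phrase):
    """Replace each known inflected token with its lemma."""
    return " ".join(LEMMAS.get(tok, tok) for tok in phrase.split())
```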

    Standardization: a standardization file (SF) is built, together with domain experts, to standardize the textual reports in OMRMP Phase 2. In this file, standardization rules (SR) are defined to replace synonyms with a single word or phrase, keeping the same meaning [37]. After the SF is built, it is applied to generate UPS5. The SF can also be expanded with more SRs in other experiments, as complexity increases when more or other reports from the same domain are used in the OMRMP process [34]. Like the stoplist, the SF is represented in XML. Fig. 3 presents an example SF structure, where the number attribute is the number of SRs, the synonym tag represents an SR, the old tag describes a sentence/term to be replaced, the new tag corresponds to the new sentence/term that will replace the one represented by the old tag, and the n attribute is the number of new sentences/terms that will replace at once the content presented in the old tag.

    Fig. 3 SF structure example.
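    Applying the SF can be sketched as below, following the structure just described (number attribute, synonym/old/new tags, n attribute). The exact nesting of the tags in Fig. 3, the root element name, and the sample synonyms are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SF layout: <synonym> rules with <old>/<new> children.
SF_XML = """
<standardization number="2">
  <synonym n="1"><old>gastric ulceration</old><new>ulcer</new></synonym>
  <synonym n="1"><old>esophagus middle third</old><new>middle esophagus</new></synonym>
</standardization>
"""

def load_rules(xml_text):
    """Parse the SF into (old, new) replacement pairs."""
    root = ET.fromstring(xml_text)
    return [(s.findtext("old"), s.findtext("new")) for s in root.findall("synonym")]

def standardize(phrase, rules):
    """Replace each synonym with its standard term, keeping the meaning."""
    for old, new in rules:
        phrase = phrase.replace(old, new)
    return phrase

rules = load_rules(SF_XML)
```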

    Fig. 1 Schematic representation of the first OMRMP phase. Modified from J.T. Oliva, Automation of the Medical Report Mapping Process for a Structured Representation (Masters dissertation), State University of West Paraná, Foz do Iguaçu, Brazil (in Portuguese).

    After applying the previous text processing techniques to generate the UPSs, the last structure (e.g., UPS5) is analyzed, together with domain experts, to build an ontology, which is used to generate a DB and to map relevant patterns found in the report content into this database. To do so, attributes and mapping rules (MR) are defined to represent the relevant information that can be found in medical reports [33]. In the ontology, each MR is associated with an attribute, and their combinations determine how the database is filled [38].

    An MR represents a phrase whose content may be in one of the following formats: location characteristic or location characteristic subcharacteristic. Location is a term that describes a human body part. Characteristic is information regarding the condition of the location, such as an abnormality or other relevant finding. Subcharacteristics complement particular characteristics (e.g., the measure of an injury). For example, the phrase middle esophagus with 5mm ulcer contains the following MR components: location (middle esophagus), characteristic (ulcer), and subcharacteristic (5 mm) [39].
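    The MR decomposition of the example phrase can be sketched as follows. The term sets below stand in for the ontology classes described next (Region, Characteristic, Subcharacteristic); in OMRMP these come from the OWL ontology built with domain experts, so the sets, function name, and measurement pattern here are assumptions for illustration.

```python
import re

# Illustrative term sets standing in for the ontology's Region and
# Characteristic classes.
LOCATIONS = {"middle esophagus", "stomach"}
CHARACTERISTICS = {"ulcer", "erosion"}

def map_phrase(phrase):
    """Map a phrase into (location, characteristic, subcharacteristic)."""
    location = next((l for l in LOCATIONS if l in phrase), None)
    characteristic = next((c for c in CHARACTERISTICS if c in phrase), None)
    # A measurement such as "5mm" or "5 mm" is treated as a subcharacteristic.
    m = re.search(r"\d+\s?mm", phrase)
    sub = m.group(0) if m else None
    return location, characteristic, sub
```

    The resulting triple corresponds to one row of attribute values in the generated database table.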

    The OMRMP ontology is structured in the Web Ontology Language (OWL), which is used for knowledge representation through the definition of taxonomies and classification networks [40]. This language was chosen because its resources increase the flexibility of ontology development, expanding the potential for representing relevant information in these structures [39]. The classes considered in our ontology are the following:

    Thing: the main class in the OWL language. All components of an OWL ontology are connected to this class.

    Attribute: this class contains information about the attributes that compose the DB. The Attribute class is composed of two subclasses, which contain relevant information for each attribute and its MRs:

    Attribute name: label assigned to the attribute.

    Attribute type: possible values for each characteristic or subcharacteristic of a MR.

    Term: possible vocables that can be found in the phrases of medical reports. This class represents MRs. The following subclasses describe the terms that compose an MR:

    Region: terms that describe a human body part (location).

    Observations: terms that describe the condition of the location. The following subclasses describe the other two MR components:

    Characteristic.

    Subcharacteristic.

    Fig. 4 shows an example ontology structure considered in the OMRMP. In this figure, rectangles represent classes and lines outline the hierarchical relationships among these classes.

    Fig. 4 Ontology structure example.

    It should be emphasized that once the first OMRMP phase is applied to textual reports of a particular domain, the stoplist, SF, and ontology can be reused by this process in other experimental evaluations in the same domain.
