
A Novel Metric for Bone Marrow Cells Chromosome Pairing

ABSTRACT
In this project we present a new metric and algorithm for comparing chromosomes and pairing them automatically for leukemia diagnosis. Karyotyping is a set of procedures, applied during the metaphase stage of cell division, that produces a visual representation of the 46 chromosomes, paired and arranged in decreasing order of size. The task is difficult because the chromosomes appear distorted and overlapped, and their images are usually blurred with undefined edges. A new mutual information method is proposed to increase the discriminative power of the G-banding pattern dissimilarity between chromosomes and to improve the performance of the classifier. The pairing algorithm is formulated as a combinatorial optimization problem in which the distances between homologous chromosomes are minimized and the distances between nonhomologous ones are maximized. A new bone marrow chromosome dataset, Lisbon-K1 (LK1), with 9200 chromosomes, was used in this project.

INTRODUCTION
The study of chromosome morphology and its relation with some genetic diseases is the main goal of cytogenetics. Normal human cells have 23 classes of large linear nuclear chromosomes, for a total of 46 chromosomes per cell. The chromosomes contain approximately 30 000 genes (genotype) and large tracts of noncoding sequences. The analysis of genetic material can involve the examination of specific chromosomal regions using DNA probes, e.g., fluorescent in situ hybridization (FISH), called molecular cytogenetics, or comparative genomic hybridization (CGH), or the morphological and pattern analysis of entire chromosomes, called conventional cytogenetics, which is the focus of this paper. These cytogenetic studies are very important in the detection of acquired chromosomal abnormalities, such as translocations, duplications, inversions, deletions, monosomies, or trisomies. These techniques are particularly useful in the diagnosis of cancerous diseases and are the preferred ones in the characterization of the different types of leukemia, which is the motivation of this paper.

The pairing of chromosomes is one of the main steps in conventional cytogenetic analysis, where a correctly ordered karyogram is produced for the diagnosis of genetic diseases based on the patient's karyotype. The karyogram is an image representation of the stained human chromosomes, obtained with the widely used Giemsa stain metaphase spread (G-banding), where the chromosomes are arranged in 22 pairs of somatic homologous elements plus two sex-determinative chromosomes (XX for the female or XY for the male), displayed in decreasing order of size. A karyotype is the set of characteristics extracted from the karyogram that may be used to detect chromosomal abnormalities.

Metaphase is the step of the cellular division process in which the chromosomes are in their most condensed state. This is the most appropriate moment for their visualization and for abnormality recognition because the chromosomes appear well defined and clear. The pairing and karyotyping procedure, usually done manually by visual inspection, is time consuming and technically demanding. The application of the G-banding procedure to the chromosomes generates a distinct transverse banding pattern characteristic of each class, which is the most important feature for chromosome classification and pairing. The International System for Cytogenetic Nomenclature (ISCN) provides standard diagrams/ideograms of band profiles for all the chromosomes of a normal human, and the clinical staff is trained to pair and interpret each specific karyogram according to the ISCN information. Other features, related to the chromosome dimensions and shape, are also used to increase the discriminative power of the manual or automatic classifiers.

Fig 2 Normal male karyotype

Fig 3 Difference in chromosome quality between the Edinburgh, Copenhagen, and Philadelphia datasets

THE HUMAN KARYOTYPE

Fig 1 Metaphase plate of a normal male

Fig 4 Human karyotype

Most (but not all) species have a standard karyotype. The normal human karyotype contains 22 pairs of autosomal chromosomes and one pair of sex chromosomes. Normal karyotypes for females contain two X chromosomes and are denoted 46, XX; males have both an X and a Y chromosome, denoted 46, XY. Any variation from the standard karyotype may lead to developmental abnormalities.

SLIDE PREPARATION
Cells from bone marrow, blood, amniotic fluid, cord blood, tumor, and tissues (including skin, umbilical cord, chorionic villi, liver, and many other organs) can be cultured using standard cell culture techniques in order to increase their number. A mitotic inhibitor (colchicine, colcemid) is then added to the culture. This stops cell division at mitosis, which allows an increased yield of mitotic cells for analysis. The cells are then centrifuged, and the media and mitotic inhibitor are removed and replaced with a hypotonic solution. This causes the white blood cells or fibroblasts to swell, so that the chromosomes will spread when added to a slide, and it also lyses the red blood cells. After the cells have been allowed to sit in the hypotonic solution, Carnoy's fixative (3:1 methanol to glacial acetic acid) is added. This kills the cells and hardens the nuclei of the remaining white blood cells. The cells are generally fixed repeatedly to remove any debris or remaining red blood cells. The cell suspension is then dropped onto specimen slides. After aging the slides in an oven or waiting a few days, they are ready for banding and analysis.

ANALYSIS
Analysis of banded chromosomes is done at a microscope by a clinical laboratory specialist in cytogenetics (CLSp(CG)). Generally 20 cells are analyzed, which is enough to rule out mosaicism to an acceptable level. The results are summarized and given to a board-certified cytogeneticist, who reviews them and writes an interpretation that takes into account the patient's previous history and other clinical findings. The results are then reported using the International System for Human Cytogenetic Nomenclature 2009 (ISCN 2009).

ALGORITHM DESCRIPTION
The algorithm described in this paper is composed of the following three sequential steps.
1) Image processing: In this step, the chromosome images, extracted from the unordered karyogram, are processed by histogram equalization, geometric distortion compensation, and dimensional scaling normalization (see Section II-A).
2) Feature extraction: In this step, discriminative features are extracted from the processed images, e.g., dimensions, normalized area, G-banding profile, and mutual information (MI) [26] between each pair of chromosomes in the karyogram (see Section II-B). The features extracted in this step are organized in a distance matrix containing the distances (using a given metric described later) between every two chromosomes in the karyogram.
3) Pairing: In this step, a combinatorial optimization problem is solved in order to obtain a permutation matrix that establishes the right correspondence between the chromosomes of each pair.
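The mutual information feature mentioned in step 2 can be illustrated with a small sketch. The code below is not the authors' implementation: it estimates MI from a joint intensity histogram of two equally sized grayscale images, and the function name `mutual_information`, the list-of-rows image representation, and the default of 8 bins are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, max_level=255):
    """Mutual information (in bits) between two equal-size grayscale images.

    Images are lists of rows of integer intensities in [0, max_level].
    The bin count is an illustrative assumption, not a value from the paper.
    """
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and have the same size")

    def quantize(p):
        # Map an intensity to one of `bins` equally wide intervals.
        return min(p * bins // (max_level + 1), bins - 1)

    qa = [quantize(p) for p in flat_a]
    qb = [quantize(p) for p in flat_b]
    n = len(qa)

    joint = Counter(zip(qa, qb))   # joint histogram
    marg_a = Counter(qa)           # marginal histogram of image A
    marg_b = Counter(qb)           # marginal histogram of image B

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        ratio = count * n / (marg_a[a] * marg_b[b])
        mi += p_ab * math.log2(ratio)
    return mi
```

A higher MI means the band patterns of the two images are more predictable from one another. How such a score is turned into an entry of the distance matrix is left to the metric described later in the paper.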

The images were acquired with a Leica Optical Microscope DM 2500. Some image preprocessing tasks, namely, noise reduction and chromosome segmentation, were manually performed with the Leica continuous wave (CW) 4000 Karyo software used by the clinical staff. The pairing ground truth was obtained manually by the technical staff of the Institute of Molecular Medicine, Lisbon, and used to assess the accuracy of the proposed pairing algorithms.

MATERIALS AND METHODS

Slide preparation. The slide is aged using a salt solution, usually 2X SSC (saline-sodium citrate). The slides are then dehydrated in ethanol, and the probe mixture is added. The sample DNA and the probe DNA are then co-denatured using a heated plate and allowed to re-anneal for at least 4 hours. The slides are then washed to remove excess unbound probe and counterstained with 4',6-diamidino-2-phenylindole (DAPI) or propidium iodide.

Analysis. Analysis of FISH specimens is done by fluorescence microscopy by a clinical laboratory specialist in cytogenetics. For oncology, a large number of interphase cells are generally scored in order to rule out low-level residual disease; generally between 200 and 1000 cells are counted and scored. For congenital problems, usually 20 metaphase cells are scored.

Future of cytogenetics. Advances now focus on molecular cytogenetics, including automated systems for counting the results of standard FISH preparations and techniques for virtual karyotyping, such as comparative genomic hybridization (CGH) arrays and single nucleotide polymorphism (SNP) arrays.

Fig 4 (a) Two different metaphase plates containing bone marrow chromosomes. (b) Chromosomes from the Copenhagen dataset

A new chromosome dataset, LK1 [29], was created in collaboration with the Institute of Molecular Medicine, Lisbon, to test the classification and pairing algorithms on this type of low-quality chromosomes for leukemia diagnosis purposes. The bone marrow cell chromosomes in this new dataset were manually segmented, correctly oriented, ordered, and annotated by the clinical staff to be used as ground truth data in the conducted tests. To further validate the proposed algorithm, experiments were made using the Grisan et al. dataset [30]. This dataset is of the same nature and quality as the Philadelphia, Edinburgh, and Copenhagen datasets because the images are based on cells extracted from the amniotic fluid and chorionic villi (prenatal cytogenetics).

Fig 5 Very low quality karyogram

Fig 6 Geometrical compensation. (a) Original image. (b) Chromosome and medial axis segmentation. (c) Axis smoothing. (d) and (e) Interpolation along orthogonal lines to the smoothed medial axis. (f) Border regularization

The automatic pairing algorithm is composed of four main steps: 1) chromosome image extraction from the unordered karyogram and image processing; 2) feature extraction; 3) classifier training; and 4) pairing. In the next sections, these components are described in detail.

IMAGE PROCESSING
The image processing step aims at image contrast enhancement and compensation of the geometric distortions observed in each chromosome that are not related to its intrinsic shape or size. The image brightness and contrast depend on the specific tuning of the microscope, and the particular geometric shape of each chromosome depends on the specific metaphase plate from which the chromosomes were extracted. These effects must be compensated to improve the results of the pairing algorithm. The image processing step is composed of the following operations.
1) Chromosome extraction: Each chromosome is isolated from the unordered karyogram.
2) Geometrical compensation: The geometric compensation is needed to obtain chromosomes with a vertical medial axis. The compensation algorithm is composed of the following main steps: a) chromosome and medial axis segmentation; b) axis smoothing; c) interpolation along orthogonal lines to the smoothed medial axis; d) border regularization.
3) Shape normalization: The features used in the comparison of chromosomes are grouped into two classes: 1) geometric based and 2) pattern based (G-banding). To compare chromosomes from a band pattern point of view, geometrical and dimensional differences must be removed, or at least attenuated. Therefore, a dimensional scaling is performed before the pattern features are extracted, so that all the chromosomes have the same size and aspect ratio; this is done by interpolating the original images.
4) Intensity compensation: The metaphase plate from which the chromosomes are extracted does not present uniform brightness and contrast. To compensate for this inhomogeneity, the spatially scaled images are histogram equalized.

Fig 7 Dimension and shape normalization and intensity equalization. (a) Geometrically compensated image. (b) Spatial normalization. (c) Histogram equalization. (d) Band profile
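The histogram equalization used for intensity compensation can be sketched in a few lines. This is a minimal textbook version under stated assumptions, not the authors' code: the image is a list of rows of integer intensities, 256 gray levels are assumed, and the function name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(img, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints.

    Each intensity is remapped through the normalized cumulative histogram,
    which spreads the occupied gray levels over the full [0, levels - 1] range.
    """
    flat = [p for row in img for p in row]
    if not flat:
        return []

    # Histogram of intensities.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution (as counts).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:
        # Constant image: nothing to spread, return a copy.
        return [list(row) for row in img]

    # Lookup table mapping each old level to its equalized level.
    scale = (levels - 1) / (n - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in img]
```

After this mapping the darkest occupied level becomes 0 and the brightest becomes 255, so two chromosomes imaged under different brightness and contrast settings become directly comparable in intensity.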

CONCEPTS USED IN THIS PHASE
1) Image conversion
2) Denoising
3) Edge detection
4) Two-dimensional convolution

Image conversion. The toolbox includes many functions that you can use to convert an image from one type to another. For example, if you want to filter a color image that is stored as an indexed image, you must first convert it to true color format. When you apply the filter to the true color image, MATLAB filters the intensity values in the image, as is appropriate. If you attempt to filter the indexed image, MATLAB simply applies the filter to the indices in the indexed image matrix, and the results might not be meaningful. You can perform certain conversions just using MATLAB syntax. For example, you can convert a grayscale image to true color format by concatenating three copies of the original matrix along the third dimension: RGB = cat(3,I,I,I); The resulting true color image has identical matrices for the red, green, and blue planes, so the image displays as shades of gray. In addition to these image type conversion functions, there are other functions that return a different image type as part of the operation they perform. For example, the region of interest functions return a binary image that you can use to mask an image for filtering or for other operations.

Denoising. We may define noise to be any degradation in the image signal caused by external disturbance. If an image is being sent electronically from one place to another, via satellite or wireless transmission, or through networked cable, we may expect errors to occur in the image signal. These errors will appear on the image output in different ways depending on the type of disturbance in the signal. Usually we know what type of errors to expect, and hence the type of noise on the image, so we can choose the most appropriate method for reducing the effects. Cleaning an image corrupted by noise is thus an important area of image restoration.

Edge detection. Edges contain some of the most useful information in an image. We may use edges to measure the size of objects in an image, to isolate particular objects from their background, and to recognize or classify objects. There is a large number of edge-finding algorithms in existence, and we shall look at some of the more straightforward of them. The general MATLAB command for finding edges is edge(image,'method',parameters...), where the parameters available depend on the method used.

Two-dimensional convolution. C = conv2(A,B) computes the two-dimensional convolution of matrices A and B. If one of these matrices describes a two-dimensional finite impulse response (FIR) filter, the other matrix is filtered in two dimensions. The size of C in each dimension is equal to the sum of the corresponding dimensions of the input matrices, minus one. That is, if the size of A is [ma,na] and the size of B is [mb,nb], then the size of C is [ma+mb-1,na+nb-1]. The indices of the center element of B are defined as floor(([mb nb]+1)/2). C = conv2(hcol,hrow,A) convolves A first with the vector hcol along the rows and then with the vector hrow along the columns. If hcol is a column vector and hrow is a row vector, this case is the same as C = conv2(hcol*hrow,A). C = conv2(...,'shape') returns a subsection of the two-dimensional convolution, as specified by the shape parameter
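The size rule for the full two-dimensional convolution can be checked with a short sketch. The code below is a plain Python illustration of the convolution sum, not a reimplementation of MATLAB's conv2; the function name `conv2_full` and the list-of-rows matrix representation are assumptions.

```python
def conv2_full(a, b):
    """Full 2-D convolution of two matrices given as lists of rows.

    The output has size (ma + mb - 1) x (na + nb - 1), matching the size
    rule stated in the text for the full convolution.
    """
    ma, na = len(a), len(a[0])
    mb, nb = len(b), len(b[0])
    out = [[0] * (na + nb - 1) for _ in range(ma + mb - 1)]
    for i in range(ma):
        for j in range(na):
            for k in range(mb):
                for m in range(nb):
                    # Each product a[i][j] * b[k][m] lands at offset (i+k, j+m).
                    out[i + k][j + m] += a[i][j] * b[k][m]
    return out
```

For a 2x2 matrix convolved with a 2x2 kernel the result is 3x3, i.e., [2+2-1, 2+2-1]. The four nested loops make the cost explicit, which is why separable kernels (the hcol, hrow form above) are attractive for larger filters.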

CLASSIFIER
The support vector machine (SVM) is a classifier with good overall performance: it maps inputs into a higher dimensional space in which an optimized linear division with the fewest errors and the maximal margin is sought. Its training process guarantees a globally optimized solution, avoidance of over-fitting, and insensitivity to feature dimensionality. A complete description of the theory of SVMs for pattern recognition is given elsewhere. In contrast to conventional off-line learning algorithms for classification, on-line adaptivity is incorporated into the algorithmic design to accommodate the ever-changing experimental conditions. An Online Support Vector Classifier (OSVC), which keeps removing support vectors from the old model and assigning new training examples weighted according to their importance, is thus proposed.

The pairing process is a computationally hard problem because the optimal pairing must minimize the overall distance, i.e., the solution is the global minimum of the cost function. This problem can be stated as a combinatorial optimization problem. Moreover, it can be formulated as an integer programming problem, thus allowing for very efficient optimization methods. To do so, the cost function, as well as the constraints, have to be expressed by linear functions of the variables.

Considering n chromosomes (for n even), a pairing assignment P is defined as a set of ordered pairs (i, j) such that: 1) $i \neq j$ holds for any pair and 2) any given index i appears in no more than one pair of the set. A pairing assignment is said to be total if and only if, for any i = 1, . . . , n, there is exactly one pair (r, s) in the set such that either i = r or i = s. The sum of distances implied by a pairing P can be written as

$$C(P) = \sum_{(i,j) \in P} D(i,j) \qquad (11)$$

and the goal of the pairing process is to find a total pairing P that minimizes C(P). Note that the cost function (11) can be reformulated as a matrix inner product between the distance matrix D and a pairing matrix X = {x(i, j)}, where

$$x(i,j) = \begin{cases} 1, & \text{if } (i,j) \in P \text{ or } (j,i) \in P \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$

It can be rewritten as

$$C(P) = \tfrac{1}{2}\, D \cdot X \qquad (13)$$

where $\cdot$ denotes the usual matrix inner product, which is defined as follows:

$$A \cdot B = \sum_{i} \sum_{j} a(i,j)\, b(i,j)$$

The cost function then becomes linear in the pairing matrix X. The entries of this matrix are the parameters with respect to which (13) is to be minimized. In order for the matrix X to represent a valid total pairing, it has to satisfy constraints 1) and 2) mentioned earlier, which can be expressed in linear form as follows: constraint 1) is equivalent to stating that the main diagonal of X is all zeros, and constraint 2) corresponds to having one and only one entry equal to 1 in each row, as well as in each column. Constraining the domain of the matrix entries to be Boolean (i.e., $x(i,j) \in \{0, 1\}$), the latter is the same as saying that

$$\sum_{i} x(i,j) = 1 \;\; \forall j, \qquad \sum_{j} x(i,j) = 1 \;\; \forall i$$

The combinatorial optimization problem can then be restated as an integer programming problem, consisting of

$$\min_{X} \; D \cdot X \quad \text{subject to} \quad x(i,i) = 0, \;\; \sum_{i} x(i,j) = 1, \;\; \sum_{j} x(i,j) = 1, \;\; x(i,j) = x(j,i), \;\; x(i,j) \in \{0, 1\}$$

where the symmetry of X follows from definition (12).
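For a handful of chromosomes, the pairing problem can be solved exactly without an integer programming solver, which is a convenient way to check the formulation. The sketch below is an assumption-laden illustration, not the optimization method used in the paper: it finds the minimum-cost total pairing by memoized exhaustive search over bitmasks, builds the pairing matrix X, and evaluates the cost as half the inner product of D and X. The function names are hypothetical, and the search is only practical for small even n.

```python
from functools import lru_cache


def best_total_pairing(dist):
    """Return (cost, pairs) of the minimum-cost total pairing.

    `dist` is a symmetric n x n distance matrix (list of rows), n even.
    Exhaustive search with memoization; illustrative only.
    """
    n = len(dist)
    if n % 2:
        raise ValueError("a total pairing needs an even number of items")
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(used):
        if used == full:
            return 0.0, ()
        # Always pair the lowest unused index, so each pairing is visited once.
        i = next(k for k in range(n) if not used & (1 << k))
        best_cost, best_pairs = float("inf"), ()
        for j in range(i + 1, n):
            if used & (1 << j):
                continue
            cost, pairs = solve(used | (1 << i) | (1 << j))
            cost += dist[i][j]
            if cost < best_cost:
                best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    cost, pairs = solve(0)
    return cost, list(pairs)


def pairing_matrix(pairs, n):
    """Symmetric 0/1 matrix X with x(i, j) = x(j, i) = 1 for each pair."""
    x = [[0] * n for _ in range(n)]
    for i, j in pairs:
        x[i][j] = 1
        x[j][i] = 1
    return x


def pairing_cost(dist, x):
    """Cost of a pairing as half the matrix inner product of D and X."""
    n = len(dist)
    return 0.5 * sum(dist[i][j] * x[i][j] for i in range(n) for j in range(n))
```

The matrix returned by `pairing_matrix` has a zero diagonal and exactly one 1 in every row and column, so it satisfies the constraints listed above, and `pairing_cost` agrees with the cost returned by the search.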

CONCLUSION
An array of algorithms has been developed to solve the SVM quadratic programming (QP) problem. They work by continually searching from the current support vector and extending along the specified feasible direction u. The Sequential Minimal Optimization (SMO) algorithm chooses the direction with only two non-zero elements, which are determined by the so-called pair. However, data is supplied to all these algorithms in batches, and thus a large amount of computation is involved. Recently, various online SVM algorithms have been proposed to extend the SVM to the online setting. The standard online SVM algorithms are discussed in the binary classification setting, without addressing the issues that ensue when different classes are of different levels of importance. Due to the ever-changing experimental conditions, the model has to be updated periodically. Biologists hope that, after labelling only one or two movies of microscopy images, the updated classifiers will automatically classify new examples with better accuracy.

However, before applying online SVM to the task of cell phase identification, three difficulties have to be circumvented. First, the data sets are critically imbalanced, so the classification accuracy will be undesirably biased toward the classes with more samples. Second, the classes with fewer samples may sometimes be more important than other classes. For example, prophase plays an important role in identifying the starting point of the mitosis process, but there are only about 140 examples of prophase in a movie of 200 frames of microscope images. Last, the classification problem should be addressed in the multiclass setting.

REFERENCES
[1] C. M. Price, "Fluorescence in situ hybridization," Blood Rev., vol. 7, no. 2, pp. 127-134, 1993.
[2] J. C. Tan, J. J. Patel, A. Tan, C. J. Blain, T. J. Albert, N. F. Lobo, and M. T. Ferdig, "Optimizing comparative genomic hybridization probes for genotyping and SNP detection in Plasmodium falciparum," Genomics, vol. 93, no. 6, pp. 543-550, Jun. 2009.
[3] D. E. Rooney and B. H. Czepulkowski, Human Cytogenetics, A Practical Approach, vol. II, 2nd ed. Ithaca, NY: IRL Press, 1992.
[4] J. Swansbury, Cancer Cytogenetics: Methods and Protocols (Methods in Molecular Biology). Totowa, NJ: Humana Press, 2003.
[5] H. Lodish, A. Berk, S. L. Zipursky, P. Matsudaira, D. Baltimore, and J. E. Darnell, Molecular Cell Biology, 4th ed. San Francisco, CA: Freeman, 2004.
[6] B. Czepulkowski, Analyzing Chromosomes. Abingdon, U.K.: BIOS Scientific Publishing, 2001.
[7] L. G. Shaffer and N. Tommerup, Eds., An International System for Human Cytogenetic Nomenclature (ISCN), 2005. Basel, Switzerland: Karger and Cytogenetic and Genome Research, 2004, 130 pp. ISBN 3-8055-8019-3.
[8] J. Piper and E. Granum, "On fully automatic feature measurement for banded chromosome classification," Cytometry, vol. 10, pp. 242-255, 1989.
[9] J. R. Stanley, M. J. Keller, P. Gader, and W. C. Caldwell, "Data-driven homologue matching for chromosome identification," IEEE Trans. Med. Imag., vol. 17, no. 3, pp. 451-462, Jun. 1998.
[10] M. Zardoshti-Kermani and A. Afshordi, "Classification of chromosomes using higher-order neural networks," in Proc. IEEE Int. Conf. Neural Netw., Nov./Dec. 1995, vol. 5, pp. 2587-2591.
