HISTORY AND CURRENT STATUS OF CHEMOINFORMATICS

HISTORY OF CHEMOINFORMATICS
Cheminformatics (also known as chemoinformatics, chemioinformatics and
chemical informatics) is the use of computer and informational techniques
applied to a range of problems in the field of chemistry. Chem(o)informatics is a
generic term that encompasses the design, creation, organisation, storage,
management, retrieval, analysis, dissemination, visualisation and use of
chemical information, not only in its own right, but as a surrogate or index for
other data, information and knowledge.
The word chemoinformatics is rather new, but papers that fall under this field
date back to the mid-1960s, when structure-activity relationships (SAR) were
proposed based on the work of Hansch and Fujita (1964) and Fujita et al. (1964)
and the earlier work of Hammett and Taft. The first textbooks on chemoinformatics
were published only recently (Leach and Gillet, 2003; Gasteiger and Engel, 2003;
Gasteiger, 2003; Bajorath, 2004).
The related field of bioinformatics was fully established in the 1990s, and has
become an integrated activity in most major pharmaceutical companies. The basis
behind the success of bioinformatics was the access to a vast amount of
experimental data, together with the structured nature of genetic information.
The developments in information technology in the last decades of the 20th
century have fundamentally changed the way in which scientific information is
being communicated and used. A scientific discipline where the impact of these
changes has been particularly significant is (bio)chemistry. Until less than 25
years ago, molecular modeling was a barely existent niche of computational
chemistry, practiced only at the few institutes that could afford the very expensive
specialised hardware. In addition, rapid access not only to the primary literature but,
possibly even more importantly, to the factual primary data about millions of
chemical compounds, to reactions, structures, and spectra, and to the genomic
data of various organisms including humans, can only be provided by digital
storage and retrieval techniques.
The roots of what we now call cheminformatics go back very early in the history of
computing: the 1950s for statistical models and the 1960s for the first computer
representations, mainly by curious chemists. However, the term
"cheminformatics" was not adopted until the early 1990s (the spelling,
cheminformatics or chemoinformatics, is still in dispute). The bulk of the
foundational work was done in the 1970s and 1980s, and was strongly supported by
the pharmaceutical industry and the need for computational drug discovery
research.
The term chemoinformatics was defined by F.K. Brown in 1998. With the advent
of computers and the ability to store and retrieve chemical information, serious
efforts to compile relevant databases and construct information retrieval systems
began. One of the first efforts to have substantial long-term impact was Olga
Kennard's collection of crystal structure information for small molecules.
In Brown's definition, chemoinformatics is the mixing of those information
resources to transform data into information and information into knowledge for
the intended purpose of making better decisions faster in the area of drug lead
identification and optimization. Since then, both spellings have been used: some
groups have settled on Cheminformatics, while European academia settled in 2006
on Chemoinformatics.
The first, and still the core, journal for the subject, The Journal of Chemical
Documentation, started in 1961 (the name changed to The Journal of Chemical
Information and Computer Sciences in 1975).
The first book appeared in 1971 (Lynch, Harrison, Town and Ash, Computer
Handling of Chemical Structure Information).
The first international conference on the subject was held in 1973 at
Noordwijkerhout, and one has been held every three years since 1987.

HOW CAN CHEMOINFORMATICS HELP?


Cheminformatics can help chemists and other scientists produce and
manage information. In silico analysis using cheminformatics
techniques can reduce the risks of developing a drug. Techniques such
as virtual screening, library design, and docking figure into
the analysis. Physical properties that might affect whether
a substance could be developed as a drug are often
examined in cheminformatics as features that can be compared among
large numbers of substances. An example is clogP, a calculated measure of
lipophilicity (how "fatty" a compound is). Sometimes, inferences can be drawn
about a related set of properties, as when Chris Lipinski formulated his
now famous Rule of Five, which says that drug-like compounds
tend to have 5 or fewer hydrogen-bond donors, 10 or fewer hydrogen-bond
acceptors, a calculated logP less than or equal to 5, and a molecular
weight up to 500. Compounds that exceed these values
tend to have poor absorption or permeation.
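
The four thresholds of the Rule of Five can be written down as a simple filter. The following is a minimal sketch in Python, assuming the descriptor values have already been computed by some other tool; the class name Descriptors and the function name rule_of_five_violations are hypothetical names chosen for this illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one compound (supplied by the caller)."""
    h_bond_donors: int
    h_bond_acceptors: int
    clogp: float
    mol_weight: float


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of Five criteria that the compound exceeds."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    if d.clogp > 5:
        violations.append("clogp")
    if d.mol_weight > 500:
        violations.append("mol_weight")
    return violations


# Hypothetical descriptor values, made up purely for illustration.
small = Descriptors(h_bond_donors=2, h_bond_acceptors=4, clogp=1.2, mol_weight=180.0)
large = Descriptors(h_bond_donors=6, h_bond_acceptors=12, clogp=6.1, mol_weight=650.0)
print(rule_of_five_violations(small))
print(rule_of_five_violations(large))
```

Returning the list of exceeded criteria, rather than a single yes/no flag, keeps the sketch usable both as a strict filter and as a softer score (number of violations).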

CURRENT STATUS OF CHEMOINFORMATICS
In recent years, there has been an explosion in the availability of publicly
accessible chemical information, including chemical structures of small molecules,
structure-derived properties and associated biological activities in a variety of
assays. These data sources present us with a significant opportunity to develop
and apply computational tools to extract and understand the underlying
structure-activity relationships. Furthermore, by integrating chemical data
sources with biological information (protein structure, gene expression and so
on), we can attempt to build up a holistic view of the effects of small molecules in
biological systems. Equally important is the ability of non-experts to access and
utilize state-of-the-art cheminformatics methods and models.
The chemoinformatics field continues to evolve at the interface between
computer science and chemistry. Chemical information and computational
approaches in pharmaceutical research are major focal points of
chemoinformatics. However, the boundaries of this discipline are rather fluid and
the chemoinformatics spectrum is difficult to delineate.
In the area of methodology development, recent work has addressed the
characterization of structure-activity landscapes, the applicability domain of
Quantitative Structure-Activity Relationship (QSAR) models, and the use of
chemical similarity in text mining. In the area of infrastructure, a distributed
web services framework has been developed that allows easy deployment of, and
uniform access to, computational methods (statistics, cheminformatics and
computational chemistry), data and models. The development of PubChem-derived
databases and highlight techniques allows the infrastructure to scale to extremely
large compound collections, by use of distributed processing on Grids.

SCOPE OF CHEMOINFORMATICS

Representation and structure searching
Substructure searching
Similarity searching, clustering
Diversity analysis
Searching databases
Computer-aided structure elucidation
3-D substructure searching
QSAR and Docking

APPLICATIONS OF CHEMOINFORMATICS
Storage and retrieval
The primary application of cheminformatics is in the storage, indexing and search
of information relating to compounds. The efficient search of such stored
information draws on topics that computer science treats as data
mining, information retrieval, information extraction and machine learning.


Related research topics include:
Unstructured data: information retrieval, information extraction
Structured data mining: database mining, graph mining, molecule mining, sequence mining, tree mining
Digital libraries
File formats
The in silico representation of chemical structures uses specialized formats such
as the XML-based Chemical Markup Language or SMILES. These representations
are often used for storage in large chemical databases. While some formats are
suited for visual representations in 2 or 3 dimensions, others are more suited for
studying physical interactions, modeling and docking studies.
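
To make the idea of a line notation concrete, the sketch below splits a SMILES string into its symbols (atoms, bonds, branches and ring-closure digits). It is only an illustration under stated assumptions: the pattern covers a small subset of SMILES, the name tokenize_smiles is hypothetical, and the function does not check chemical validity (for example, it will not notice unbalanced parentheses).

```python
import re

# Token pattern for a small subset of SMILES. Order matters: bracket atoms
# and two-letter atoms must be tried before the one-letter atoms.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"        # bracket atom, e.g. [NH4+]
    r"|Br|Cl"            # two-letter organic-subset atoms
    r"|[BCNOPSFI]"       # one-letter aliphatic atoms
    r"|[bcnops]"         # aromatic atoms
    r"|[-=#:/\\.]"       # bond symbols and the disconnection dot
    r"|[()]"             # branch open / close
    r"|%\d{2}|\d"        # ring-closure labels
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; reject characters outside the subset."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so compare the rejoined
    # tokens with the input to detect anything that was dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)O"))   # acetic acid
print(tokenize_smiles("c1ccccc1"))  # benzene
```

In practice a dedicated cheminformatics toolkit would be used to parse SMILES into a molecular graph; the tokenizer only shows why such strings are compact enough to store and index in large databases.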
Virtual libraries
Chemical data can pertain to real or virtual molecules. Virtual libraries of
compounds may be generated in various ways to explore chemical space and
hypothesize novel compounds with desired properties.
Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented
synthetic products) were recently generated using the FOG (fragment
optimized growth) algorithm. [9] This was done by using cheminformatic tools to
train transition probabilities of a Markov chain on authentic classes of
compounds, and then using the Markov chain to generate novel compounds that
were similar to the training database.
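
The general idea of training a Markov chain and then sampling from it can be sketched as follows. This is a toy first-order chain over sequences of fragment labels, not the FOG algorithm itself; the function names, the START/END markers and the fragment labels in the usage lines are all assumptions made for this illustration.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def train_transitions(sequences):
    """Estimate first-order transition probabilities from fragment sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        path = [START, *seq, END]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    # Normalise the counts for each state so they sum to 1.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def generate(transitions, rng, max_len=20):
    """Sample one fragment sequence by walking the chain from START to END."""
    state, out = START, []
    while len(out) < max_len:
        followers = transitions[state]
        state = rng.choices(list(followers), weights=list(followers.values()))[0]
        if state == END:
            break
        out.append(state)
    return out


# Hypothetical fragment labels standing in for real molecular fragments.
training = [["ring", "linker", "amide"], ["ring", "amide"], ["linker", "amide"]]
model = train_transitions(training)
print(generate(model, random.Random(0)))
```

Because every sampled step follows a transition seen in the training set, the generated sequences resemble the training data while still allowing combinations that never occurred there as a whole.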
Virtual screening
In contrast to high-throughput screening, virtual screening involves
computationally screening in silico libraries of compounds, by means of various
methods such as docking, to identify members likely to possess desired properties
such as biological activity against a given target. In some cases, combinatorial
chemistry is used in the development of the library to increase the efficiency in
mining the chemical space. More commonly, a diverse library of small molecules
or natural products is screened.
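
As a simple illustration of computational screening, the sketch below ranks a small library by similarity to a query compound. Note that this is similarity-based ranking with the Tanimoto coefficient, a different and much cheaper technique than the docking mentioned above. It assumes each compound has already been reduced to a set of feature identifiers (for example, fingerprint bits); the integer features and compound names are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, features)) for name, features in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature sets; real ones would come from a fingerprinting tool.
query = frozenset({1, 2, 3, 4})
library = {
    "cand_a": frozenset({1, 2, 3, 4}),
    "cand_b": frozenset({1, 2, 5}),
    "cand_c": frozenset({7, 8}),
}
for name, score in rank_by_similarity(query, library):
    print(name, round(score, 2))
```

The top of the ranked list would then be passed on to more expensive methods or to experimental testing.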

Quantitative structure-activity relationship (QSAR)


This is the calculation of quantitative structure-activity relationship and
quantitative structure-property relationship values, used to predict the activity of
compounds from their structures. In this context there is also a strong
relationship to chemometrics. Chemical expert systems are also relevant, since
they represent parts of chemical knowledge as an in silico representation.
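
In its simplest form, a QSAR model is a regression of measured activity on one or more structure-derived descriptors. The sketch below fits a straight line by ordinary least squares to a single descriptor (for instance logP). The numbers are synthetic and chosen only so that the fit is easy to follow; the function names are hypothetical, and real QSAR models typically use many descriptors and careful validation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the activity for a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic training data: descriptor values and "activities" on a perfect line.
descriptor = [0.0, 1.0, 2.0, 3.0]
activity = [1.0, 3.0, 5.0, 7.0]
model = fit_line(descriptor, activity)
print(model)
print(predict(model, 4.0))
```

The fitted coefficients can then be used to estimate the activity of compounds that have not yet been synthesised or tested, provided they fall within the range of the training data.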

MORE ABOUT CHEMOINFORMATICS


Implementing, handling and searching chemical databases is a crucial aspect of
chemoinformatics. Chemical database techniques and data mining methods will
improve as this field evolves, partly through the wider implementation of new data
structures. Methods for full-text data mining are likely to become very
powerful in the years to come, and will presumably play a highly important role in
the general area of chemoinformatics.
Figure 1 shows the number of references to the words bioinformatics,
chemoinformatics, chemogenomics and metabonomics in PubMed from 1992
to 2004. The present trend for chemoinformatics resembles the
trend in bioinformatics five to ten years earlier. It should be mentioned that this
graph is based on one database only, PubMed, and is intended to give an idea
of the development in publishing frequency in these areas, not a
complete overview.

RECENT ADVANCES IN THE AREA OF DRUG DESIGN WITH THE HELP OF CHEMOINFORMATICS
It is clear that the drug discovery and optimization process is
undergoing very significant changes. Many more hits are found than
previously, especially owing to advances in combinatorial
chemistry and high-throughput screening (HTS). The approach used in
drug discovery has traditionally been linear with respect to the various
relevant properties, but more parallel approaches are evolving, in which not only
the potency (activity) and selectivity of the lead are examined at an early
stage, but also other key properties. Many of the compounds drawn
from combinatorial libraries may look promising at first, but fail
at later stages of the drug discovery process because of undesired
properties. A compound can, for example, look feasible on the basis of its
molecular structure and yet be useless as a drug because of aggregation,
limited solubility or limited uptake in the human organism. Many
pharmaceutical companies might even be repeating the same mistakes
because of these problems. Methods for assessing these properties at a
very early stage, both experimentally and computationally, are thus
highly desirable. They are expected to lower the cost of drug discovery
and optimization significantly, and hopefully to provide an increased
number of useful leads. In cases where a lead has sufficiently high
activity, but various properties need to be improved, chemoinformatics
methods could be used to modify substructures within the lead space
with minimal effect on the activity profile.

Likewise, various computational methods are evolving rapidly at present.
Computational techniques used to search through chemical libraries and
databases, so-called virtual screening methods, have become increasingly popular
in drug discovery. A whole range of computational techniques is used for
searching for molecular similarities and dissimilarities, for extracting information
about pharmacophores (structural models of targets or binding sites) from
compound libraries, for predicting properties, and for studying molecular
interactions at the atomic level, among other things. Chemoinformatics is strongly
linked to computational chemistry and molecular modeling. Molecular modeling
methods are particularly useful for conducting conformational analysis of
molecules, and for assessing the strength of intermolecular interactions.
Newly established fields like chemogenomics (or chemical genomics),
metabonomics and metabolomics also play increasingly important roles in
modern drug discovery and development. Chemogenomics (Browne et al., 2002)
deals with interactions between chemical compounds and living systems in terms
of induced genomic response. In metabonomics (Nicholson and Wilson, 2003),
relatively low-molecular-weight materials produced during genomic expression
within a cell are studied, normally by use of 1H-NMR spectroscopy and
multivariate data analysis (chemometrics) (Geladi and Kowalski, 1986a,b).
Metabonomics has been shown to be a useful tool for understanding drug efficacy
and toxicity. Metabolomics is similar to metabonomics, but where metabonomics
deals with integrated, multicellular, biological systems, metabolomics deals with
simple cell systems.

CONCLUSION
Chemoinformatics is a rapidly growing field, with a huge application potential.
Chemoinformatics concerns the gathering and systematic use of chemical
information, and the use of those data to predict the behavior of unknown
compounds in silico.
