Current Computer-Aided Drug Design
ISSN: 1573-4099
Volume 2, Number 2, June 2006
Contents
Structuring Chemical Information for Quicker and More
Reliable Drug Safety Assessment
Guest Editor: Romualdo Benigni

Editorial Pp. 93-94
In Silico Technology for Identification of Potentially
Toxic Compounds in Drug Discovery Pp. 95-103
R. Didziapetris, D. P. Reynolds, P. Japertas, D.
Zmuidinavicius and A. Petrauskas
Mini-Review on Chemical Similarity and Prediction
of Toxicity Pp. 105-122
A. Gallegos Saliner
Artificial Intelligence and Data Mining
for Toxicity Prediction Pp. 123-133
C. Helma and J. Kazius
The Art of Data Mining the Minefields of Toxicity
Databases to Link Chemistry to Biology Pp. 135-150
C. Yang, A. M. Richard and K. P. Cross
Computational Methods to Predict Drug Safety
Pp. 151-168
G. Patlewicz
Structural Alerts of Mutagens and Carcinogens
Pp. 169-176
R. Benigni and C. Bossa
In Silico Metabolism Studies in Drug Discovery: Prediction
of Metabolic Stability Pp. 177-188
V. K. Gombar, J. J. Alberts, K. C. Cassidy, B. E. Mattioni
and M. A. Mohutsky
A Signal Analysis Approach Applied to the Study of
Sequence, Structure and Function of the Proteins
Pp. 189-201
R. Benigni, A. Giuliani, J.P. Zbilut, S.W. Ellis and D.
Allorge
Abstracts

Editorial
The removal of a pharmaceutical drug from the market because
of unexpected adverse reactions is one of the most dramatic
events that can occur during the long process leading from
design to marketing. Several drugs have been withdrawn, or
their use has been seriously restricted, because of toxicity
problems (e.g., valvular heart disease, liver failure, ischemic
colitis, and torsades de pointes) that were not recognized
during pre-clinical and clinical experimentation. The scale
of the problem is striking: the toxic effects of marketed
drugs, even when used appropriately, are estimated to rank
among the top ten causes of death in the United States. For
this reason, methods that can predict toxic effects at the
early stages of drug development are urgently needed.
One approach is to exploit the wealth of chemical knowledge
to build structure-activity based predictive models for toxicity.
In fact, the contribution of the computerized treatment of
molecules is much wider than can be perceived by looking
only at classical structure-activity relationship models.
As a chemical identifier, the chemical structure has a universally
understood meaning and scientific relevance. Chemical structure
and chemical concepts (e.g., reactive functional groups, acidity,
hydrophobicity, electrophilic reactivity, and free radical
formation) provide a common language and framework for exploring
the chemical reactivity that underlies diverse toxicological
outcomes. Hence, chemical structure should be considered an
essential identifier and a scientifically useful metric for
chemical toxicity databases. Effective linkage of toxicity
data with chemical structure information can facilitate and
greatly enhance data gathering and hypothesis generation in
conjunction with (Q)SAR modeling efforts. This hot topic issue
aims to provide insight into the wider landscape of the
computerized treatment of molecules for toxicity prediction.
Underlying this hot topic issue is the strong belief that
one of the most serious problems hampering the progress of
science is the separation between different areas of knowledge
and different disciplines. Progress in one area often remains
unknown to scientists who address the same problem from within
another discipline. For example, parallel work has been done
to build models for predicting the toxicity of pharmaceutical
drugs and of environmental chemicals, with little mutual
benefit. Thus, in this issue a special effort has been made
to gather contributions from both fields, to bridge the gap
between the two disciplines, and to encourage cross-fertilization.
In the first mini-review, Remigius Didziapetris et al.
review the historical development and practical implications
of property-based drug design. The emergence of virtual screening
to remove undesirable compounds from consideration prior to
their synthesis or acquisition is outlined, and several in
silico tools are described. Critical issues in the use of
in silico approaches are discussed, and future developments
are suggested.
In her mini-review, Ana Gallegos presents both theoretical
insights and practical applications relating to one of the
basic concepts on which Structure-Activity Relationships rely,
namely chemical similarity. The paper shows how the heuristic
and subjective process of establishing similarities and
analogies, when applied to molecules, has given rise to very
diverse and sophisticated formal treatments. The paper emphasizes
that the use of quantitative similarity measures for toxicity
modeling and prediction is highly context-dependent, and must
account for each specific activity or toxicity.
The mini-review by Christoph Helma and Jeroen Kazius addresses
Artificial Intelligence in data mining and toxicity prediction.
It provides a conceptual description of the most important
data mining algorithms for the identification of chemical
features and the extraction of relationships between these
descriptors and toxic activities. Among other topics, the
paper critically discusses the rapidly expanding field of
chemical structure representation (including algorithms for
substructure searching). Special emphasis is given to the
validation procedures for (Q)SAR models.
Chihae Yang et al. review the twin concepts of databases
and data mining. The purposes of toxicity databases range
from a source of (Q)SAR datasets for modelers to a basis
for “read-across” for regulators. The tasks involved in the
use of databases are closely tied to data mining; thus,
databases and data mining form an essential technology pair.
This mini-review puts particular emphasis on the close
relationship and interdependence between the two concepts.
Whereas structure data mining is similar to that conventionally
employed for large chemical databases, data mining of toxicity
endpoints is still not well developed. The authors suggest
possible lines of development and provide practical examples.
They particularly stress the need for the databases to be
rigorously modeled using standards and controlled vocabularies.
Grace Patlewicz reviews computational methods to predict drug
safety. The focus is on several endpoints relevant to drug
toxicity, such as ADME properties as well as mutagenicity
and carcinogenicity. The modeling systems are put in a more
general context, and a strategy for the use of non-testing
approaches is outlined. This mini-review also discusses concepts
that have been developed in other toxicological areas but
may be of interest for drug safety (e.g., Chemical Categories
and the Threshold of Toxicological Concern).
The paper by Benigni and Bossa focuses specifically on the
field of chemical toxicity, dealing with the discovery of
Structural Alerts for chemical mutagens and carcinogens.
This is an area where mechanistic research and human ingenuity
have opened the way to a comprehensive, and still largely
valid, theory of chemical carcinogenesis. Recent work has
sought to encode this body of evidence in machine-readable
languages and to exploit these modern approaches to refine
and expand knowledge of chemical carcinogens.
Vijay K. Gombar et al. review metabolic fate and metabolic
stability prediction in the context of developing new drugs.
Background on drug metabolism studies and on experimental
techniques for studying metabolism is provided. Experimental
determination of ADME characteristics is not practical for
large numbers of compounds; therefore, the focus is on bringing
in silico approaches earlier in the discovery process to
assess metabolic fate and stability solely from molecular
structure. The paper reviews a number of in silico metabolism
tools and models that have potential applications in drug
discovery, and then describes a step-by-step process to
construct and deploy reliable in silico metabolic stability
and other ADME screens.
The last paper, by Benigni et al., is also related to the
modeling of metabolism. It describes a new approach to the
modeling of protein sequences (called Recurrence Quantification
Analysis) and critically evaluates its potential and pitfalls.
In particular, the paper includes a study of the polymorphisms
of cytochrome P450 2D6, an enzyme that plays a crucial role
in the metabolism of a wide range of pharmaceutical drugs.
In summary, this hot topic issue focuses on the problems
facing the pharmaceutical industry, given the need for faster,
higher-throughput toxicity testing, and on how emerging
technologies are being applied to address this need. We have
collected several papers on the available in silico systems
and on how these are being developed and refined in response
to the increasing need for computer-based screening. The role
of chemical structure, and of chemical structure codes, as a
practical and scientific identifier has been particularly
emphasized. It is hoped that this volume will provide insight
into in silico predictive toxicology approaches, and show
why cooperative and interdisciplinary work is necessary to
further advance this crucial area of research.
Romualdo Benigni
Istituto Superiore di Sanità,
Experimental and Computational Carcinogenesis,
Environment and Health Department,
Viale Regina Elena 299, 00161 Rome, Italy
E-mail: rbenigni@iss.it
In Silico Technology for Identification of Potentially
Toxic Compounds in Drug Discovery
R. Didziapetris, D. P. Reynolds, P. Japertas, D.
Zmuidinavicius and A. Petrauskas
This review provides background on the analysis of toxicity
data, the development of predictive algorithms, and the application
of these algorithms in lead selection and optimization. The
algorithms considered predict acute toxicity (mouse and rat LD50),
genotoxicity (Ames test), carcinogenicity, and organ-specific
health effects (based on diverse animal and human studies).
These tools can aid drug design in several ways. Lead selection
is often based on simple molecular properties
(logP, MW, H-bonding) that define either a
‘druglike’ or ‘leadlike’ chemical
space. These definitions need to be supplemented with substructure-specific
considerations that account for variable chemical reactivity,
ionization, and fuzzy-specific interactions with various biological
constituents. The available toxicity predictions can partly fill
these gaps by supplementing or replacing various pre-defined
filters of “alert substructures” that ignore the dependence
of chemical reactivity and toxicity on substituent effects and
whole-molecule ADME effects. In drug discovery these tools can
help to prioritize in vitro measurements and estimate animal
toxicity, although multiple data gaps in their training sets
restrict their usefulness. A partial solution to this problem
is the calculation of 95% confidence intervals (or continuous
probabilities) that indicate the toxicological similarity of
a given compound to the training set. If a compound is not too
dissimilar from the training set, “hazard substructures”
can be generated automatically, suggesting possible mechanistic
explanations and structural modifications of the lead compound.
The best solution, however, is to develop new predictive algorithms
based on company-specific data, and analytical and development
software tools are available to help with this. It is also
necessary to continuously improve the existing organ-specific
health effect predictions by adding new data (for existing
and new endpoints) and improving the overall methodology used
in data analysis.
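The property-based ‘druglike’ filters mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming Lipinski's rule of five (MW at most 500, logP at most 5, at most 5 H-bond donors and 10 H-bond acceptors, with one violation tolerated); the abstract does not say which definition the authors use, property values are taken as precomputed inputs, and the compound names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float  # molecular weight (Da), assumed precomputed
    logp: float        # calculated octanol/water logP, assumed precomputed
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def is_druglike(c: Compound, max_violations: int = 1) -> bool:
    """Pass compounds with at most `max_violations` violations."""
    return rule_of_five_violations(c) <= max_violations


# Hypothetical compounds with made-up property values
library = [
    Compound("cmpd_A", 350.4, 2.1, 2, 5),
    Compound("cmpd_B", 720.9, 6.3, 3, 12),
]
for c in library:
    print(c.name, rule_of_five_violations(c), is_druglike(c))
```

A filter of this kind knows nothing about reactive substructures or substituent effects, which is the gap the abstract says toxicity predictions should help to fill.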
Mini-Review on Chemical Similarity and Prediction
of Toxicity
A. Gallegos Saliner
The notion of similarity relates to a relative comparison
between different systems. The process by which humans establish
similarities and analogies is heuristic and subjective. Similarity
is a context-dependent, relative measure: it is only
meaningful to say that x is similar to
y with respect to z. In toxicology
and drug design, it is important to have an objective measure
of similarity to compare two or more chemicals with respect
to their activity or toxicity. Similarity assessment based
on structure is a convenient and popular means of comparison,
but it needs to account for each specific activity or toxicity.
This mini-review starts with an overview of the history
and philosophy of similarity in general. It describes the
different means of quantifying chemicals and how these numerical
descriptors can be applied in so-called similarity indices
to compare chemicals with respect to their activity or toxicity.
Throughout, a varied set of similarity indices is applied to
the same case study, and the results are analyzed and compared.
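To make “similarity indices” concrete, the sketch below computes three common indices (Tanimoto, Dice, and cosine) for the same pair of binary structural fingerprints. It is a minimal illustration, not the set of indices compared in the review; each fingerprint is assumed to be represented as the set of its “on” bit positions, and the bit positions used are hypothetical rather than derived from real molecules.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) index: shared bits / bits set in either."""
    c = len(a & b)
    union = len(a) + len(b) - c
    return c / union if union else 1.0  # two empty fingerprints: identical


def dice(a: set, b: set) -> float:
    """Dice index: 2 * shared bits / total bits set in both."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def cosine(a: set, b: set) -> float:
    """Cosine (Ochiai) index for binary vectors: shared / sqrt(|a| * |b|)."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical fingerprints: sets of "on" bit positions
fp_x = {1, 4, 7, 9, 12}
fp_y = {1, 4, 9, 15}
for index in (tanimoto, dice, cosine):
    print(f"{index.__name__}: {index(fp_x, fp_y):.4f}")
```

For these two fingerprints, Tanimoto gives 0.5 while Dice and cosine give about 0.67, so the same pair looks more or less alike depending on the index chosen; for binary data, Tanimoto can never exceed Dice.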
Artificial Intelligence and Data Mining for Toxicity
Prediction
C. Helma and J. Kazius
Tools for artificial intelligence and data mining can derive
(Quantitative) Structure-Activity Relationships ((Q)SARs)
for toxicity in an objective and reproducible manner. This
review provides a conceptual description of the most important
data mining algorithms for the identification of chemical
features and the extraction of relationships between these
descriptors and toxic activities. We discuss the compliance
of these techniques with the OECD guidelines for (Q)SARs,
as well as their performance implications. Special emphasis
is given to validation procedures for (Q)SAR models.
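One widely used validation procedure for (Q)SAR models is k-fold cross-validation. The following is a minimal, library-free sketch assuming a 1-nearest-neighbour classifier with Tanimoto similarity on fingerprint bit sets; the classifier choice, the function names, and the toy dataset are illustrative assumptions, not the authors' method.

```python
import random


def tanimoto(a: set, b: set) -> float:
    c = len(a & b)
    union = len(a) + len(b) - c
    return c / union if union else 1.0


def k_fold_indices(n: int, k: int, seed: int = 0) -> list:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_1nn(X, y):
    # A 1-nearest-neighbour "model" simply stores the training data.
    return list(zip(X, y))


def predict_1nn(model, x):
    # Predict the label of the most similar training compound.
    return max(model, key=lambda pair: tanimoto(pair[0], x))[1]


def cross_validated_accuracy(X, y, k, seed=0):
    """Fraction of compounds predicted correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_1nn([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_1nn(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical toy data: fingerprint bit sets, toxic (1) / non-toxic (0)
X = [{1, 2, 3, 20}, {1, 2, 3, 21}, {1, 2, 3, 22}, {1, 2, 3, 23},
     {10, 11, 12, 30}, {10, 11, 12, 31}, {10, 11, 12, 32}, {10, 11, 12, 33}]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(cross_validated_accuracy(X, y, k=4))
```

Every compound is predicted exactly once by a model that never saw it; this estimates internal predictivity only and is usually complemented by validation on an external test set.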
The Art of Data Mining the Minefields of Toxicity
Databases to Link Chemistry to Biology
C. Yang, A. M. Richard and K. P. Cross
Toxicity databases have a special role in predictive toxicology,
providing ready access to historical information throughout
the discovery, development, and product safety workflows of
drug development, as well as in review by regulatory
agencies. To provide accurate information within a
hypothesis-building environment, the content of the databases
needs to be rigorously modeled using standards and controlled
vocabularies. The practical purposes of databases vary widely,
ranging from a source of (Q)SAR datasets for modelers to a basis
for “read-across” for regulators. Many tasks involved
in the use of databases are closely tied to data mining; hence,
databases and data mining form an essential technology pair. To
understand chemically induced toxicity, chemical structures
must be integrated into the toxicity databases. Data mining
these “structure-integrated toxicity databases”
requires techniques for handling both chemical structures
and textual toxicity information. Structure data mining is
similar, with some modifications, to that conventionally employed
for large chemical databases, while data mining of toxicity
endpoints is not well developed. This review presents a general
strategy for data mining structure-integrated toxicity databases
to link chemical structures to biological endpoints. Iterative
probing of the chemical domain with toxicity endpoint descriptors,
and of the biological domain with chemical descriptors, enables
the two domains to be linked. As an example, data mining steps
to elucidate the hidden relationships between target organs and
chemical classes are presented. Work is in progress in the public
domain toward linking chemistry to biology by providing databases
that can be mined.
Computational Methods to Predict Drug Safety
G. Patlewicz
This mini-review outlines some of the non-testing approaches
that are available for predicting and assuring drug safety.
The focus is on several endpoints of specific concern, such
as ADME properties as well as mutagenicity and carcinogenicity.
The Threshold of Toxicological Concern (TTC) and chemical
category approaches are presented as alternative strategies.
Overall, there is great potential to apply a battery of
different tools in drug discovery, from QSARs to TTC and
chemical categories. Greater awareness of other initiatives
(in parallel industries), coupled with more practical guidance
on how to exploit these tools, is still required before they
become embedded in routine use.
Structural Alerts of Mutagens and Carcinogens
R. Benigni and C. Bossa
This paper summarizes the evidence on the Structural Alerts
of mutagenicity and carcinogenicity. The Structural Alerts
are molecular substructures or reactive groups that are related
to the carcinogenic and mutagenic properties of the chemicals,
and represent a sort of “codification” of a long
series of studies aimed at highlighting the mechanisms of
action of the mutagenic and carcinogenic chemicals. The identification
of the Structural Alerts has been of great value both for
understanding mechanisms and for assessing the risk posed
by chemicals. This mini-review illustrates a number of case
studies where the Structural Alerts have played a fundamental
role in risk assessment, and describes recent work aimed at
expanding or refining the knowledge on the Structural Alerts
through the use of Artificial Intelligence and Data Mining
approaches.
In Silico Metabolism Studies in Drug Discovery:
Prediction of Metabolic Stability
V. K. Gombar, J. J. Alberts, K. C. Cassidy, B. E. Mattioni
and M. A. Mohutsky
The strategy of screening compounds solely for pharmacological
potency and selectivity in the early stages of drug discovery
confronted the pharmaceutical industry with the stark reality
of disproportionate attrition later in development
due to poor drug disposition characteristics. This attrition
contributed to the exorbitant costs of discovering and developing
drugs. Considering ADME (Absorption, Distribution, Metabolism,
and Excretion) characteristics of compounds early in the discovery
process can direct resources toward compounds that have
greater potential to survive the clinical trial stages of
drug development. However, experimental determination of ADME
characteristics is not practical for large numbers of compounds.
Therefore, the focus is on bringing in silico
approaches earlier in the discovery process to assess ADME
properties solely from molecular structure. Given that metabolism
is one of the most important ADME properties, in this
paper we review a number of in silico metabolism
tools and models that have potential applications in drug
discovery. We then describe a step-by-step process, as practiced
in our laboratories, to construct and deploy reliable in
silico metabolic stability and other ADME screens. Additionally,
we give examples of the application of our in silico metabolic
stability screens in scaffold selection, ADME space
enrichment, and rationalizing the synthesis and testing of compounds
in the drug discovery process. Agreement rates between experimental
and in silico metabolic stability values, ranging
from 84% to 100%, have convinced many discovery project teams
to use these in silico models routinely. Finally,
we present our ideas on implementing in silico models
and tools successfully so that they have a significant
impact on drug discovery and development.
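An agreement figure of this kind can be read as the share of compounds for which the in silico model assigns the same stability category as the experiment. The sketch below assumes a hypothetical two-class scheme with a cutoff of 50% parent compound remaining; the abstract does not state how agreement was defined in the paper, and the values used are made up.

```python
def stability_class(percent_remaining: float, cutoff: float = 50.0) -> str:
    # Hypothetical two-class scheme: "stable" if at least `cutoff` % of the
    # parent compound remains after incubation, otherwise "unstable".
    return "stable" if percent_remaining >= cutoff else "unstable"


def percent_agreement(experimental, predicted, cutoff: float = 50.0) -> float:
    """Percentage of compounds whose predicted class matches the measured class."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(
        stability_class(e, cutoff) == stability_class(p, cutoff)
        for e, p in zip(experimental, predicted)
    )
    return 100.0 * matches / len(experimental)


# Made-up values for five compounds (% parent remaining)
measured = [80.0, 20.0, 65.0, 10.0, 55.0]
in_silico = [70.0, 35.0, 40.0, 15.0, 60.0]
print(percent_agreement(measured, in_silico))
```

Here four of the five compounds fall in the same class, so agreement is 80%; a different cutoff or more classes would give a different figure for the same predictions.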
A Signal Analysis Approach Applied to the Study of
Sequence, Structure and Function of the Proteins
R. Benigni, A. Giuliani, J.P. Zbilut, S.W. Ellis and D.
Allorge
Computational chemistry is largely based on the use of
quantitative descriptors of organic molecules, allowing for
the analysis of large molecular data sets and for building
models that link the physicochemical and structural descriptions
of molecules to their biological activity or chemical reactivity.
In the case of proteins, this approach is severely hampered
by the need to take the actual sequence of the amino acid
residues into account in a meaningful way. From a purely
mathematical perspective, protein sequences can be viewed
as time series, where the role of time is played by the
order of the amino acid residues along the sequence. In turn,
each individual residue can be considered as a single organic
molecule that can be represented by the classical molecular
descriptors. Thus, in principle, order-dependent synthetic
descriptors generated through time series analysis can be
used to build QSAR-like models of proteins.
Indeed, Recurrence Quantification Analysis (RQA)
of hydrophobicity-coded protein sequences has already
been shown to be useful in protein science. In this
paper, we show the merits and pitfalls of RQA in different case
studies, ranging from the global description of a large set
of diverse proteins to the study of the effect of mutations
in the human cytochrome P450 system.
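The RQA idea described in this abstract can be sketched in a few steps: code each residue by a hydrophobicity value, embed the resulting series in m dimensions, mark pairs of embedded vectors closer than a radius as recurrent, and summarize the recurrence matrix. This minimal sketch assumes the Kyte-Doolittle hydropathy scale, embedding dimension 3 with delay 1, a Euclidean radius of 1.0, and a minimum diagonal line length of 2, and computes only two standard measures, percent recurrence (%REC) and percent determinism (%DET); the paper's actual scale and parameters are not given in the abstract.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def embed(series, m):
    """Time-delay embedding with delay 1: overlapping windows of length m."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]


def rqa(sequence, m=3, radius=1.0, lmin=2):
    """Return (%REC, %DET) for a one-letter amino acid sequence."""
    series = [KYTE_DOOLITTLE[aa] for aa in sequence]
    vecs = embed(series, m)
    n = len(vecs)
    if n < 2:
        return 0.0, 0.0
    # Recurrence matrix: True where two embedded vectors lie within `radius`
    R = [[math.dist(vecs[i], vecs[j]) <= radius for j in range(n)]
         for i in range(n)]
    # Recurrent points off the main diagonal (which is trivially recurrent)
    rec = sum(R[i][j] for i in range(n) for j in range(n) if i != j)
    # Points lying on diagonal lines of length >= lmin, upper triangle only
    det_upper = 0
    for k in range(1, n):
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
            else:
                if run >= lmin:
                    det_upper += run
                run = 0
        if run >= lmin:
            det_upper += run
    det = 2 * det_upper  # the matrix is symmetric
    pct_rec = 100.0 * rec / (n * (n - 1))
    pct_det = 100.0 * det / rec if rec else 0.0
    return pct_rec, pct_det


print(rqa("ILKILKILKILK"))
```

For the strictly periodic toy sequence ILKILKILKILK, each embedded vector recurs only with vectors a multiple of three positions away, giving %REC of about 26.7 and %DET of about 91.7; real protein sequences produce far less regular recurrence matrices.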