Current Computer-Aided Drug Design
ISSN: 1573-4099

Volume 1, Number 2, April 2005
Contents

Novel Computational Approaches in QSAR and Molecular Design
Based on GA, Multi-Way PLS and NN Pp. 129-145
Kiyoshi Hasegawa, Masamoto Arakawa and Kimito
Funatsu
Post-Genomic Design of Bioactive Molecules Pp. 147-162
Martin G. Grigorov
Variable Subset Selection in the Presence of Flagged
Observations and Multicollinear Descriptors in QSAR Pp. 163-177
Peter P. Mager and Luis Sanchez
Predicting ADMET Properties by Projecting onto Chemical
Space - Benefits and Pitfalls Pp. 179-193
Hongmao Sun
Assessing QSAR Limitations - A Regulatory Perspective
Pp. 195-205
Weida Tong, Huixiao Hong, Qian Xie, Leming Shi, Hong
Fang and Roger Perkins
Computer Design of Vaccines: Approaches, Software Tools
and Informational Resources Pp. 207-222
Boris N. Sobolev, Ludmila V. Olenina, Ekaterina F.
Kolesanova, Vladimir V. Poroikov and Alexander I. Archakov
Abstracts
Novel Computational Approaches in QSAR and Molecular Design
Based on GA, Multi-Way PLS and NN
Kiyoshi Hasegawa, Masamoto Arakawa and Kimito
Funatsu
QSAR and subsequent molecular design are very important steps
in drug discovery. Through QSAR, one derives a model that
relates a set of molecular descriptors to a biological activity.
The resulting model can be used to predict the activity values
of new compounds in molecular design. QSAR models range from
simple, parametric equations to complex, non-linear models.
Each of these models has specific advantages and shortcomings
that derive from its underlying algorithm. We have developed hybrid
approaches combining genetic algorithms (GA), multi-way partial
least squares (PLS) and neural networks (NN) that exploit the
advantages and compensate for the shortcomings of each method.
In this review article, we select five topics and outline each
with representative examples.
Post-Genomic Design of Bioactive Molecules
Martin G. Grigorov
This article presents the most recent trends in how the
bioactive molecules of the future will be designed. These
trends rely on recent discoveries about the global organization
of the cell, which provide evidence that biological networks
form small-world and scale-free structures. These networks are
composed of well-defined modules, with nodes connected by
relatively short paths that allow for fast signaling. The few
highly connected nodes, which are unlikely to be affected by
random alterations, support the proper functioning of the whole
system. The fact that in cells everything is connected to
everything explains why monogenic diseases, associated with the
alteration of individual genes, were found to be the exception
rather than the rule. Newly developed chemogenomic technology
offers an alternative to the traditional animal-centric
pharmacological approach for evaluating the efficacy of bioactive
molecules on intact biological systems, where the multiple
targets and pathways reside in their natural environment. The
existence of regulatory and interaction neural centers, or hubs,
in these networks opens new perspectives for target
identification and validation. With these new technologies at
hand, we are entering an exciting new era in which
pharmacological targets will shift from single proteins, to
functional protein complexes, to whole networks determining
precise cellular states, and in which new cures and foods will no
longer be based on single active ingredients but will represent
molecular cocktails or multiple ligands with components targeting
the neural centers of whole disease-associated molecular networks.
Variable Subset Selection in the Presence of Flagged Observations
and Multicollinear Descriptors in QSAR
Peter P. Mager and Luis Sanchez
A major problem in traditional quantitative structure-activity
relationship (QSAR) analysis is selecting suitable chemical
descriptors from a large pool of variables. Decisions against
or in favor of a particular descriptor depend entirely on
the results of statistically based hypothesis testing. Uncertain
results may be produced in the presence of multicollinear descriptors
and flagged observations (high-leverage points, outliers,
influential data). To satisfy the assumptions for hypothesis
testing, diagnostic statistics and subsequent design repair
are employed. Here we show an example with nonnucleoside HIV-1
reverse transcriptase inhibitors.
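
The abstract does not say which diagnostic statistics are used. As a minimal sketch under that caveat, two standard diagnostics can be computed directly: the leverage of each observation, and the variance inflation factor (VIF) of a descriptor pair. The descriptor names and values below are hypothetical, and the 2p/n leverage cutoff and the VIF threshold of 10 are common rules of thumb, not values taken from the article.

```python
from math import fsum


def mean(values):
    return fsum(values) / len(values)


def leverages(x):
    """Leverage h_i of each observation in a one-descriptor model with intercept."""
    x_bar = mean(x)
    sxx = fsum((v - x_bar) ** 2 for v in x)
    return [1.0 / len(x) + (v - x_bar) ** 2 / sxx for v in x]


def pearson_r(a, b):
    """Pearson correlation coefficient between two descriptor columns."""
    a_bar, b_bar = mean(a), mean(b)
    sab = fsum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    saa = fsum((u - a_bar) ** 2 for u in a)
    sbb = fsum((v - b_bar) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_descriptors(a, b):
    """VIF shared by the two descriptors of a two-descriptor model: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical descriptor values for six compounds (not from the article).
logp = [1.0, 1.2, 1.1, 0.9, 1.3, 4.0]        # last compound lies far from the rest
molar_refr = [2.1, 2.3, 2.3, 1.9, 2.6, 8.1]  # nearly proportional to logp

h = leverages(logp)
cutoff = 2 * 2 / len(logp)  # rule of thumb 2p/n with p = 2 fitted parameters
flagged = [i for i, h_i in enumerate(h) if h_i > cutoff]
vif = vif_two_descriptors(logp, molar_refr)

print("high-leverage compounds (0-based index):", flagged)
print("VIF of the descriptor pair:", round(vif, 1))
```

In this toy data set a single remote compound is flagged as high-leverage and also drives the apparent collinearity of the two descriptors, which is the kind of interaction between flagged observations and multicollinearity that the abstract refers to.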
Predicting ADMET Properties by Projecting onto Chemical
Space - Benefits and Pitfalls
Hongmao Sun
The mechanisms behind ADME (absorption, distribution, metabolism,
and excretion) related properties and toxicity endpoints are
usually complex, and many are not fully understood. As a result,
most ADMET predictive models are not based on theoretical
principles, but are derived from experimental data. ADMET
properties are best analyzed by projecting them onto the compounds
of the training set. There are multiple advantages to projecting
the ADMET properties from the problem domain to the chemical
domain. Projection simplifies the problem, and avoids the
entanglement of needing to invoke specific mechanisms. Projection
focuses on the most important, and most tractable, aspect
of the problem -- the related properties of the compounds
themselves. In this review article, the general requirements
of the chemical space to be projected are discussed, including
the size and diversity of the training set and the accuracy
of the biological measurements, and the process is illustrated
using an analogue of a real projection. Also, the successes
and pitfalls of the projection method in recent ADMET predictions
are reviewed.
Assessing QSAR Limitations - A Regulatory Perspective
Weida Tong, Huixiao Hong, Qian Xie, Leming Shi, Hong
Fang and Roger Perkins
Wider acceptance of QSARs would result in a constellation
of benefits and savings to both private and public sectors.
For this to occur, particularly in regulatory applications,
a model's limitations need to be identified. We define
a model's limitations as encompassing assessment of overall
prediction accuracy, applicability domain and chance correlation.
A general guideline is presented in this review for assessing
a model's limitations, with emphasis on, and examples of,
application with consensus modeling methods. More specifically,
we discuss the commonalities and differences between external
validation and cross-validation for assessing a model's
limitations. We illustrate two common ways of assessing overall
prediction accuracy, depending on whether or not the intended
application domain is predefined. Since even a high-quality
model will have different confidence in accuracy for predicting
different chemicals, we further demonstrate, using the novel
Decision Forest consensus modeling method, a means to determine
prediction confidence (i.e., certainty for an individual chemical's
prediction) and domain extrapolation (i.e., the prediction
accuracy for a chemical that is outside the chemistry space
defined by the training chemicals). We show that prediction
confidence and domain extrapolation are related measures that
together determine the applicability domain of a model, and
that prediction confidence is the more important measure.
Lastly, the importance of assessing chance correlation is
emphasized, and illustrated with several examples of models
having a high degree of chance correlations despite cross-validation
indicating high prediction accuracy. Generally, a dataset
with a skewed distribution, small data size and/or low signal/noise
ratio tends to produce a model with high chance correlation.
We conclude that it is imperative to assess all three aspects
(i.e., overall accuracy, applicability domain and chance correlation)
of a model for the regulatory acceptance of QSARs.
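
One widely used way to probe chance correlation is y-randomization (response scrambling); the abstract does not state which procedure the authors use, so the following is only an illustrative sketch. It assumes a deliberately simple "model" (the best single descriptor, scored by fitted squared correlation rather than by cross-validation) and purely random, hypothetical data with few compounds and many descriptors.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between one descriptor and the activity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_r2(descriptors, y):
    """Fit quality of the best single descriptor (a stand-in for variable selection)."""
    return max(r_squared(col, y) for col in descriptors)


def y_randomization(descriptors, y, n_perm=200, seed=0):
    """Compare the real fit with fits obtained after shuffling the activities."""
    rng = random.Random(seed)
    observed = best_r2(descriptors, y)
    scrambled = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        scrambled.append(best_r2(descriptors, y_perm))
    # Permutation p-value: share of scrambled runs fitting at least as well
    p_value = (1 + sum(s >= observed for s in scrambled)) / (n_perm + 1)
    return observed, scrambled, p_value


# Hypothetical data: 8 compounds, 50 random descriptors, random "activity".
data_rng = random.Random(1)
descriptors = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
activity = [data_rng.gauss(0, 1) for _ in range(8)]

observed, scrambled, p_value = y_randomization(descriptors, activity)
mean_scrambled = sum(scrambled) / len(scrambled)
print(f"best r2 = {observed:.2f}, mean scrambled r2 = {mean_scrambled:.2f}, "
      f"p = {p_value:.2f}")
```

If the fit on the real activities is not clearly better than the fits on the scrambled activities, the apparent accuracy is likely a chance correlation, which is the situation the abstract associates with small data sets and low signal-to-noise ratios.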
Computer Design of Vaccines: Approaches, Software Tools
and Informational Resources
Boris N. Sobolev, Ludmila V. Olenina, Ekaterina F.
Kolesanova, Vladimir V. Poroikov and Alexander I. Archakov
The development of computational methods in molecular biology and
the rapid growth of microbial genomics data have enabled a new
approach to designing vaccine constructs, based on the in silico
selection of antigenic components. Applying this technology is
expected to eliminate side effects of new vaccines and to reduce
development time and cost. Bioinformatics methods of sequence
analysis are used to identify the most promising proteins or
protein fragments of infectious agents as candidates for vaccine
design. Specialized molecular immunology databases are widely
used in these studies. The new approach ("reverse vaccinology")
could help in designing vaccines against diseases where traditional
methods are not successful, e.g., when the viral genome shows
extreme variability and continual changes in antigenic properties
that make it difficult to select molecular targets for drugs and
candidate vaccines. A number of information resources have already
been designed to collect and provide genomic data on certain
microbes or viruses. A distinctive feature of such resources is
that they present data characterizing the different genomic
variants of the same infectious agent. These structural data,
coupled with information on functional and immune features and
with software tools, should form the basis for constructing
a new generation of vaccines against "common" and
new infections such as AIDS, Hepatitis C, and SARS. Approaches
published in the literature, as well as the authors' original
results, are discussed.