Current Bioinformatics
ISSN: 1574-8936

Current Bioinformatics
Volume 1, Number 1, January 2006
Contents

Editorial Pp. 1
Computational Biology and Drug Discovery: From Single-Target
to Network Drugs Pp. 3-13
Alberto Ambesi-Impiombato and Diego di Bernardo
Computational Prediction of Functionally Important
Regions in Proteins Pp. 15-23
Florencio Pazos and Jung-Wook Bang
Theoretical Analysis and Computational Predictions
of Protein Thermostability Pp. 25-32
Angel Mozo-Villiarías and Enrique Querol
Plant Proteomics Databases: Their Status in 2005
Pp. 33-36
Setsuko Komatsu
Analysis of Microarray Gene Expression Data
Pp. 37-53
Tuan D. Pham, Christine Wells and Denis I. Crane
Gene Expression Profile Classification: A Review
Pp. 55-73
Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet
S. Inan
Rapid methods for comparing protein structures and
scanning structure databases Pp. 75-83
Oliviero Carugo
Engineering Approaches Toward Biological Information
Integration at the Systems Level Pp. 85-93
W. Jim Zheng
Multiple Sequence Alignment as a Workbench for Molecular
Systems Biology Pp. 95-104
Julie D. Thompson and Olivier Poch
Models and Algorithms for Haplotyping Problem
Pp. 104-114
Xiang-Sun Zhang, Rui-Sheng Wang, Ling-Yun Wu and Luonan
Chen
Abstracts
Editorial
Dear Readers,
This is the inaugural issue of Current Bioinformatics (CBio).
Current Bioinformatics is a review journal started to provide the
scientific community involved in computational molecular/structural
biology with comprehensive and cohesive coverage of topics in the
fast-developing field of bioinformatics, encompassing areas such as
computing in biomedicine and genomics, computational
proteomics and systems biology, and metabolic pathway engineering.
Developments in these fields have direct implications for key
issues related to health care, medicine, genetic disorders,
development of agricultural products, renewable energy, environmental
protection, etc. The journal will focus on reviews of knowledge
discovery from biological data, computing in biomedicine and
genomics, computational proteomics and systems biology. So
far, no journal has been available that provides
comprehensive coverage, with critical assessment, of the day-to-day
developments in these topics. Bentham Science has now
taken this step by starting the journal “Current Bioinformatics”,
in which leading scientists from all over the world are
invited to contribute review articles on topics in which
they have expertise. The journal will cover a wide range
of topics in the integration of biology with computer and information
science. The present issue contains ten articles covering
a variety of interesting topics.
The issue starts with an article by Ambesi-Impiombato and
di Bernardo on computational biology and drug discovery. Computational
biology and bioinformatics have the potential not only to
speed up the drug discovery process, thus reducing the costs,
but also to change the way drugs are designed. In this review,
the authors focus on the different computational and bioinformatics
approaches that have been proposed and applied to the different
steps involved in the drug development process. In drug design
and drug discovery, the functional features of proteins play
very important roles. In article 2, Pazos and Bang discuss
computational methods for predicting protein functional features,
which can be coupled to the pipelines of genome sequencing
and structure determination. This review focuses on current
in-silico methods for predicting regions in proteins
with some functional importance (catalytic sites, binding
sites, protein interaction regions, etc.) using sequence and/or
three-dimensional structure information. Determining functional
features of a protein experimentally is expensive, time consuming
and difficult to automate. The stability of protein structure
and function at a desired temperature is of crucial importance
and the possibility of maintaining the structure and function
of a protein at a temperature above that of its native state
has been the objective of many researchers ever since mutating
a protein became a relatively easy process. In article 3,
therefore, Mozo-Villiarías and Querol present the most
recent theoretical and computer advances related to the problem
of thermally stabilizing proteins.
Since proteins are the major players in most processes of
living cells, knowledge of the proteome has great relevance
to the study of cells and organisms at the molecular level.
Proteome analysis linked to genome sequence information is
very useful for functional genomics. In article 4, therefore,
Komatsu discusses the rice proteome database and other plant
proteome databases.
Similarly, in article 5, Pham et al. discuss several
main research directions and methods in the analysis of microarray
gene expression data. Microarrays provide the biological research
community with tremendously rich, sensitive and detailed information
on gene expression profiles. Related to this theme is an article
by Asyali et al. on gene expression profile classification
(article 6) in which the authors have discussed the class-prediction
and discovery methods that are applied to gene expression
data, along with the implications of the findings.
Databases of three-dimensional macromolecular structures
have become so large that fast search tools and comparison methods
were needed, and several have been designed. In article 7, Carugo
reviews the algorithms that allow fast structure comparison and
are particularly suited to handling large databases; the review
should provide a comprehensive picture, useful for the
development and the assessment of novel tools. Our understanding
of biological systems has improved dramatically due to decades
of exploration and has been accelerated further during the
past ten years, mainly due to the genome projects, new technologies
such as microarray, and developments in proteomics. Still,
integrating this knowledge to reconstruct a biological system
in silico has been a significant challenge for biologists,
computer scientists, engineers, mathematicians and statisticians.
In article 8, Zheng discusses engineering approaches towards
biological information integration at the systems level, which
can provide many advantages and capture both the static and
dynamic information of a biological system. Thompson and Poch
present an article (article 9) on multiple sequence alignment
as a workbench for molecular systems biology. In a multiple
sequence alignment, structural and functional data can be
combined with evolutionary information to allow reliable data
validation, consensus predictions and rational propagation
of information from known to unknown sequences.
One of the main tasks in genomics is to determine the relevance
of DNA variations to genetic diseases. Single nucleotide
polymorphism (SNP) is the most frequent and important form
of genetic variation which involves a single DNA base. The
values of a set of SNPs on a particular chromosome copy define
a haplotype. Because of its importance in the studies of complex
disease association, haplotyping is one of the central problems
in bioinformatics. In the last article (article 10), Zhang
et al. give an account of the existing models and algorithms
for haplotyping problems, report recent progress from
the computational viewpoint, and discuss future research
trends. I thank all the authors of this issue for their excellent
and stimulating contributions and hope that readers will
enjoy reading these articles as much as I did, and that these contributions
will be of great value to researchers working in the
area of bioinformatics.
Satya P. Gupta
(Editor-in-Chief)
Department of Chemistry
Birla Institute of Technology and Science
Pilani-333031
India
E-mail: spg@bits-pilani.ac.in
Computational Biology and Drug Discovery: From Single-Target
to Network Drugs
Alberto Ambesi-Impiombato and Diego di Bernardo
The drug discovery process is complex, time consuming and
expensive, and includes preclinical and clinical phases. The
pharmaceutical industry is moving from a symptomatic relief
focus towards a more pathology-based approach where a better
understanding of the pathophysiology should help deliver drugs
whose targets are involved in the causative processes underlying
the disease. Computational biology and bioinformatics have
the potential not only to speed up the drug discovery process,
thus reducing the costs, but also to change the way drugs
are designed. In this review we focus on the different computational
and bioinformatics approaches that have been proposed and
applied to the different steps involved in the drug development
process. The development of ‘network-reconstruction’
methods is now making it possible to infer a detailed map
of the regulatory circuit among genes, proteins and metabolites.
It is likely that, over the coming decades, the development of
these technologies will radically change the drug discovery
process as we know it today.
Computational Prediction of Functionally Important Regions
in Proteins
Florencio Pazos and Jung-Wook Bang
Current projects for the massive characterization of proteomes
are generating protein sequences and, to a lesser extent,
three-dimensional structures of unknown function. Experimentally
determining functional features of a protein is expensive,
time consuming and difficult to automate. There is therefore
a demand for computational methods for predicting protein
functional features, which can be coupled to the pipelines
of genome sequencing and structure determination. This review
focuses on current in-silico methods for predicting
regions in proteins with some functional importance (catalytic
sites, binding sites, protein interaction regions, etc.) using
sequence and/or three-dimensional structure information.
Theoretical Analysis and Computational Predictions of Protein
Thermostability
A. Mozo-Villiarías and E. Querol
The interest in finding the keys to the thermal stabilization
of proteins has remained constant and unquestionable throughout
the last twenty years. This article reviews the most recent
theoretical and computer advances related to the problem of
thermally stabilizing proteins. Although comparison between
mesophilic and thermophilic sequences has suggested some thermostabilization
mechanisms, it has not been able ‘per se’ to provide
unambiguous thermostabilization rules applicable for every
case. Two of the mechanisms used by nature are seen as the
major factors governing thermostability: the electrostatic
forces of charged amino acids within a protein on the one hand,
and the packing of its hydrophobic core on the other. Other mechanisms
that have also been implicated (e.g. hydrogen bonding, α-helix
stabilization, backbone rigidification) may play a refining
role, based on the principle that nature has thermostabilized
proteins selectively and opportunistically in each particular
case, thereby solving each specific problem. How electrostatic
and hydrophobic forces affect each other remains
a largely open question, and some recently developed criteria
based on these two effects are analyzed in the review.
Plant Proteomics Databases: Their Status in 2005
Setsuko Komatsu
Proteome analysis linked to genome sequence information is
very useful for functional genomics. Since proteins are the
major players in most processes of living cells, knowledge
of the proteome has great relevance to the study of cells
and organisms at the molecular level. The technique of proteome
analysis using two-dimensional polyacrylamide gel electrophoresis
(2D-PAGE) has the power to monitor global changes that occur
in the protein complement of tissues and subcellular compartments.
As a complement to more focused studies, and to facilitate
further advances in functional genomics, several databases
based on 2D-PAGE are already available including those for
plants. In this review, the rice proteome database and other
plant proteome databases are discussed. Organizing and streamlining
access to the information in plant proteome databases, especially
the rice proteome database, will aid in cloning the genes
for, and predicting the function of, unknown proteins.
Analysis of Microarray Gene Expression Data
Tuan D. Pham, Christine Wells and Denis I. Crane
Microarrays provide the biological research community with
tremendously rich, sensitive and detailed information on gene
expression profiles. Gene expression profiling and gene expression
patterns have been found useful for solving a wide variety
of important biological and biomedical problems, including
the study of metabolic pathways, inference of the functions
of unknown genes, diagnosis of diseased states, as well as
facilitating the development of individualized drug treatments
through pharmacogenomics. Given the significant impact of
microarray gene expression data in biological and biomedical
research, this breakthrough technology urgently needs the
assistance of advanced computational methods for interpreting
and utilizing the raw information. This paper reviews several
main research directions and methods in the analysis of microarray
gene expression data.
Gene Expression Profile Classification: A Review
Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet
S. Inan
In this review, we have discussed the class-prediction and
discovery methods that are applied to gene expression data,
along with the implications of the findings. We attempted
to present a unified approach that considers both class-prediction
and class-discovery. We devoted a substantial part of this
review to an overview of pattern classification/recognition
methods and discussed important issues such as preprocessing
of gene expression data, curse of dimensionality, feature
extraction/selection, and measuring or estimating classifier
performance. We discussed and summarized important properties
such as generalizability (sensitivity to overtraining), built-in
feature selection, ability to report prediction strength,
and transparency (ease of understanding of the operation)
of different class-predictor design approaches to provide
a quick and concise reference. We have also covered in detail the
topic of biclustering, an emerging clustering method that
processes the entries of the gene expression data matrix in
both gene (row) and sample (column) directions simultaneously.
Rapid methods for comparing protein structures and scanning
structure databases
Oliviero Carugo
Databases of three-dimensional macromolecular structures
have become so large that fast search tools and comparison methods
are needed, and several have been designed. All of them employ
simplified representations of the three-dimensional structure:
strings of characters of variable length, which can be handled
with procedures that were designed for sequence analysis;
fixed dimension arrays that can be processed with standard
statistical methods; ensembles of secondary structural elements,
which are much less numerous than the atoms/residues of the
protein; continuous representations of the backbone, through
stereochemical figures. Some of these computational procedures
were developed long ago, when computers were too slow, and
others have been designed recently, with the specific aim
of handling large amounts of information. The present article
focuses on the algorithms that allow fast structure comparison
and are particularly suited to handling large databases, and should
provide a comprehensive picture, useful for the development
and the assessment of novel tools.
Engineering Approaches Toward Biological Information Integration
at the Systems Level
W. Jim Zheng
Our understanding of biological systems has improved dramatically
due to decades of exploration. This process has been accelerated
even further during the past ten years, mainly due to the
genome projects, new technologies such as microarray, and
developments in proteomics. These advances have generated
huge amounts of data describing biological systems from different
aspects. Still, integrating this knowledge to reconstruct
a biological system in silico has been a significant
challenge for biologists, computer scientists, engineers,
mathematicians and statisticians. Engineering approaches toward
integrating biological information can provide many advantages
and capture both the static and dynamic information of a biological
system. Methodologies, documentation and project management
from the engineering field can be applied. This paper discusses
the process, knowledge representation and project management
involved in engineering approaches used for biological information
integration, mainly using software engineering as an example.
Developing efficient courses to educate students to meet the
demands of this interdisciplinary approach will also be discussed.
Multiple Sequence Alignment as a Workbench for Molecular
Systems Biology
Julie D. Thompson and Olivier Poch
Recent progress in experimental techniques such as high-throughput
genome sequencing, proteomics, transcriptomics and interactomics
has led to a new demand for integrated computational analyses,
capable of systematically organizing these heterogeneous,
fragmentary data into a coherent whole. As a consequence,
novel system-level bioinformatics solutions are now being
developed with the goal of understanding and predicting the
behaviour of complex systems, such as molecular pathways,
cells, tissues, organs and even whole organisms. Multiple
alignments of both nucleotide and protein sequences play a
central role in many of these applications, which range from
the identification of genes and their products, via
the characterisation of their 3D structure and their molecular
and cellular functions, to the prediction of the phenotypic
consequences of mutations, reverse engineering and drug design.
In a multiple sequence alignment, structural and functional
data can be combined with evolutionary information to allow
reliable data validation, consensus predictions and rational
propagation of information from known to unknown sequences.
Clearly, integration at this scale calls for high quality,
automatic multiple alignments. Alignment techniques are now
responding to the challenge, with current developments moving
away from a single all-encompassing algorithm towards more
co-operative, knowledge-based systems. However, the success
of these methods relies on the efficient integration of information
from different databases and the close cooperation of the
different data mining and investigation algorithms. A large
community effort is now underway to develop standards for
data exchange and organisation that will facilitate collaborations
between the various resources, in order to support improved
domain understanding and to provide better decision-making
systems and services for the biologist.
Models and Algorithms for Haplotyping Problem
Xiang-Sun Zhang, Rui-Sheng Wang, Ling-Yun Wu and Luonan
Chen
One of the main tasks in genomics is to determine the relevance
of DNA variations to genetic diseases. Single nucleotide
polymorphism (SNP) is the most frequent and important form
of genetic variation which involves a single DNA base. The
values of a set of SNPs on a particular chromosome copy define
a haplotype. Because of its importance in the studies of complex
disease association, haplotyping is one of the central problems
in bioinformatics. There are two classes of in silico
haplotyping problems, i.e., single individual haplotyping
and population haplotyping. In this review paper, we give
an overview of the existing models and algorithms for this
topic, report recent progress from the computational
viewpoint and discuss future research trends.
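As a purely illustrative sketch (not taken from the review), the definition of a haplotype given above can be made concrete by encoding each biallelic SNP as 0 or 1, so that a haplotype is the sequence of alleles along one chromosome copy. The 0/1 encoding, the value 2 for heterozygous sites, and the function names `genotype` and `compatible_pairs` are assumptions made for this example only. The sketch shows why recovering haplotypes from unphased genotype data is ambiguous: several haplotype pairs can explain the same genotype.

```python
from itertools import product


def genotype(hap_a, hap_b):
    """Combine two haplotypes into an unphased genotype.

    Homozygous sites keep their allele (0 or 1); heterozygous sites
    are marked 2 (an encoding assumed for this illustration).
    """
    return [a if a == b else 2 for a, b in zip(hap_a, hap_b)]


def compatible_pairs(geno):
    """Enumerate every unordered haplotype pair consistent with a genotype."""
    het_sites = [i for i, g in enumerate(geno) if g == 2]
    pairs = set()
    # Try each assignment of alleles to the heterozygous sites.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(geno)
        h2 = list(geno)
        for site, allele in zip(het_sites, bits):
            h1[site] = allele
            h2[site] = 1 - allele  # the other copy carries the other allele
        # frozenset makes the pair unordered, so mirrored pairs collapse.
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


if __name__ == "__main__":
    g = genotype([0, 0, 1, 1], [0, 1, 0, 1])
    print(g)
    for pair in compatible_pairs(g):
        print(sorted(pair))
```

With k heterozygous sites (k ≥ 1) there are 2^(k-1) candidate pairs, so exhaustive enumeration like this is only feasible for very short SNP blocks.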