Current Bioinformatics
ISSN: 1574-8936

Volume 1, Number 2, May 2006
Contents

Recent Advances in RNA Secondary Structure Prediction with Pseudoknots Pp. 115-129
Tatsuya Akutsu

Computational Analyses of Ancient Polyploidy Pp. 131-146
Kevin P. Byrne and Guillaume Blanc

Principles and Practices of Pathway Modelling Pp. 147-160
Karthik Raman, Preethi Rajagopalan and Nagasuma Chandra

Is There a Real Bayesian Revolution in Pattern Recognition for Bioinformatics? Pp. 161-165
Jukka Corander

Intervention in Probabilistic Gene Regulatory Networks Pp. 167-184
Aniruddha Datta, Ranadip Pal and Edward R. Dougherty

Accomplishments and Challenges in High Performance Computing for Computational Biology Pp. 185-195
Zhihua Du, Feng Lin and Bertil Schmidt

Mining Protein-Protein Interaction Data Pp. 197-205
Ryan J. Haasl and Jianwen Fang

Computational and Statistical Methods to Explore the Various Dimensions of Protein Evolution Pp. 207-217
Mario A. Fares

Networks Everywhere? Some General Implications of an Emergent Metaphor Pp. 219-234
Maria Concetta Palumbo, Lorenzo Farina, Alfredo Colosimo, Kyaw Tun, Pawan K. Dhar and Alessandro Giuliani

Towards a Phenotypic Semantic Web Pp. 235-246
Georgios V. Gkoutos

Large Scale Protein Sequence Clustering - Not Solved But Solvable Pp. 247-254
Antje Krause

Software Analysis of Two-Dimensional Electrophoretic Gels in Proteomic Experiments Pp. 255-262
Martin H. Maurer

Abstracts

Recent Advances in RNA Secondary Structure Prediction
with Pseudoknots
Tatsuya Akutsu
It has recently been recognized that pseudoknots play important
roles in RNAs and that a considerable number of RNAs contain
pseudoknots. Therefore, recent studies on RNA secondary structure
prediction focus on pseudoknots. Several algorithms have been
developed based on dynamic programming. Though optimality
of a solution is guaranteed, these algorithms suffer from
high time complexities. Thus, heuristic algorithms have also
been developed, some of which are expected to produce near-optimal
solutions in reasonable CPU time. Recently, another
approach based on comparative modeling has been proposed.
Several practical programs and web-based servers have also
been developed based on the above-mentioned algorithms. The
purpose of this review paper is to introduce the basic ideas
in important algorithms for RNA secondary structure prediction
with pseudoknots. This paper also tries to reveal relations
among important algorithms.

Computational Analyses of Ancient Polyploidy
Kevin P. Byrne and Guillaume Blanc
Whole genome duplication has played a major role in the
evolution of many eukaryotic lineages. Polyploidy has long
been postulated as a powerful mechanism for evolutionary innovation,
and recent analyses have provided convincing evidence that
independent ancient genome duplications occurred in the ancestors
of yeast, plants, vertebrates and fish. It is the growing
availability of whole genome sequences that has facilitated
the detection and analysis of these polyploidizations. However,
because polyploidy is often followed by massive gene loss
and chromosomal rearrangements, identifying such events is
not always easy. Here we present a review of a wide array
of computational methods of ever-increasing sophistication
developed to identify the obscured traces of ancient polyploidy
events in genomic sequences. These methods use a variety of
analytical approaches, including comparative genomics, phylogenetics
and molecular clock analyses. We also review recent
research on the long-term evolution of genes and genomes duplicated
by polyploidization. This has emerged as a fruitful field,
utilizing genome-wide functional information and genomic sequence
data to further our understanding of the impact of polyploidy
on organismal biology and evolution.

Principles and Practices of Pathway Modelling
Karthik Raman, Preethi Rajagopalan and Nagasuma Chandra
The potential of systems-based approaches is increasingly
being realised in drug discovery, metabolic engineering and
related areas. Developments in high-throughput experimental
techniques and the explosion of genomic data have fuelled progress
in this area. Modelling and simulation of metabolic and regulatory
pathways is an important step in systems analysis. In this
review, we discuss the principles of pathway modelling, simulation
techniques and current practices. A pre-requisite for modelling
and simulating metabolic pathways is an accurate description
of the pathway landscape. Despite the availability of hundreds
of annotated genome sequences, accurate information about
pathways is still largely incomplete. We highlight some of
the methods for deriving pathway landscapes from biochemical
literature and high-throughput experimental data. The conceptual
framework for modelling in terms of abstraction levels and
schema for representation is also presented.
Next, several classes of techniques available for modelling
and simulating such systems formulated from pathway landscapes,
viz. kinetic pathway modelling, interaction-based modelling
and constraint-based modelling, are discussed. The Systems
Biology Markup Language as well as various pathway design
and simulation tools are reviewed. The usefulness of various
concepts and methodologies in areas such as drug discovery
and metabolic engineering is illustrated with examples from
the literature, with a note on future perspectives.

Is There a Real Bayesian Revolution in Pattern Recognition
for Bioinformatics?
Jukka Corander
Recently, Bayesian statistical thinking has been considered
as a revolutionary force within genetics and bioinformatics.
Novel computational algorithms have enabled the use of probability
models of an unprecedented degree of complexity in many applications.
Pattern recognition within bioinformatics is a multifaceted
field which poses an enormous challenge for the Bayesian approach
to data analysis. Advantages of this framework have been demonstrated
for, e.g., de novo identification of gene regulatory
binding motifs, identification of gene regulatory networks,
and unsupervised classification of molecular marker data.
However, as the complexity of data sets in bioinformatics is continuously
increasing, it is likely that the conventional approaches
to Bayesian computation will not yield feasible solutions
in the future. Even currently, many large-scale problems are
analyzed using traditional algorithmic solutions due to the
exhaustive human and computing resources required by the Bayesian
methods. The generic benefits of solid Bayesian modelling
have been clearly demonstrated in the theoretical literature.
Therefore, it would be ideal if Bayesian modelling and
computational strategies evolved rapidly to meet the
demands of users facing a rapidly increasing amount of
molecular information. Here we discuss potential courses for
such an evolution, which could help to really revolutionize
statistical thinking in pattern recognition within bioinformatics.

Intervention in Probabilistic Gene Regulatory Networks
Aniruddha Datta, Ranadip Pal and Edward R. Dougherty
In recent years, there has been a considerable amount of interest
in the area of Genomic Signal Processing, which is the engineering
discipline that studies the processing of genomic signals.
Since regulatory decisions within the cell utilize numerous
inputs, analytical tools are necessary to model the multivariate
influences on decision-making produced by complex genetic
networks. Signal processing approaches such as detection,
prediction and classification have been used in the recent
past to construct genetic regulatory networks capable of modeling
genetic behavior. To accommodate the large amount of uncertainty
associated with this kind of modeling, many of the networks
proposed are probabilistic. One of the objectives of network
modeling is to use the network to design different intervention
approaches for affecting the time evolution of the gene activity
profile of the network. More specifically, one is interested
in intervening to help the network avoid undesirable states
such as those associated with a disease. This paper provides
a tutorial survey of the intervention approaches developed
so far in the literature for probabilistic gene networks (probabilistic
Boolean networks) and outlines some of the open challenges
that remain.

Accomplishments and Challenges in High Performance
Computing for Computational Biology
Zhihua Du, Feng Lin and Bertil Schmidt
We review recent research and development in high performance
computing (HPC) for computational biology and discuss the
great challenges to both biomedical scientists and IT professionals.
During the last decades, research in the fields of molecular
biology and biomedicine has provided the scientific community
with a huge amount of data through sequencing, genome-wide annotation
and gene expression profiling projects. The genetic databases
have been growing exponentially and sophisticated computer
algorithms have been developed to cater for the needs of data
mining, analysis and simulation. It is clear that development
of HPC technologies has become crucial for deployment of the
software systems to tackle various bioinformatics problems.
The goal of this article is to present current research
and our critical review of the construction of parallel and distributed
computing systems, the design of multi-process algorithms, and
the development of software systems for biocomputing tasks including
sequence alignment, heuristic database searching, phylogenetic
analysis and gene clustering. We also give a brief introduction
to our work on the development of highly scalable and reproducible
HPC algorithms and indicate the challenging problems in this
context.

Mining Protein-Protein Interaction Data
Ryan J. Haasl and Jianwen Fang
The development of high-throughput technologies that expedite
the discovery of interactions between proteins has made it
possible to screen entire genomes and produce large protein-protein
interaction (PPI) datasets. The availability of these datasets
is now enabling researchers to perform PPI data mining activities
of theoretical and practical importance, including prediction
of novel PPIs and protein function, sub-cellular localization
of proteins, and construction of reasonably realistic, proteome-wide
PPI networks. Most newer methods of in silico PPI
prediction hinge upon conserved sequence signatures discovered
through the analysis of a large PPI dataset, although some
methods attempt to improve predictive accuracy through the
incorporation of additional biological information and/or
multiple datasets. Though the protein interaction networks
constructed to date do not provide a truly realistic picture
of biological network mechanisms, they are functional in the
sense that they have enabled researchers to test the reliability
of high-throughput data, predict protein function, and localize
proteins within the cell. All PPI data mining activities are
constrained by the quantity and quality of the PPI data currently
available. Consequently, the reliability of predictions based
on PPI data is expected to increase as PPI databases increase
in size and taxonomic range.

Computational and Statistical Methods to Explore the
Various Dimensions of Protein Evolution
Mario A. Fares
Predicting genes and gene regions undergoing adaptive
evolution is one of the most important aims of geneticists
and of new emerging areas of investigation. As more genomes
are being sequenced and computational tools to detect selection
are being developed, the number of genes uncovered as being
positively selected is overwhelming. Several statistical methods
have been devised to test if specific amino acid regions have
undergone adaptive mutations at some stage during the protein’s
evolution. Despite the sensitivity of these methods in detecting
selective constraints, they are still based on linear sequence
alignments and therefore examine only one dimension of the
protein’s evolution. Few methods have been designed
to detect intra-molecular co-evolution between amino acid
sites. However, no tests are performed to determine the adaptive
value of these co-evolutionary events. Conclusions independently
derived from both types of methods are ambiguous and seldom
unequivocal, since evolution of protein sequences is most
likely to be multi-factorial. This review briefly discusses
the advantages and disadvantages of the many
different methods and computational tools used to detect adaptive
evolution and co-evolution. Further, the potential that the
combination of such methods has in providing more biologically
meaningful results is highlighted.

Networks Everywhere? Some General Implications of
an Emergent Metaphor
Maria Concetta Palumbo, Lorenzo Farina, Alfredo Colosimo,
Kyaw Tun, Pawan K. Dhar and Alessandro Giuliani
The use of the term ‘network’ is more and
more widespread in all fields of biology. It evokes a systemic
approach to biological problems able to overcome the evident
limitations of the strict reductionism of the past twenty
years. The expectations raised by taking into consideration
not only the single elements but also the intermingled ‘web’
of links connecting different parts of biological entities
are huge. Nevertheless, we believe the lack of awareness
that networks, besides their biological ‘likelihood’,
are modeling tools and not real entities, could be detrimental
to the exploitation of the full potential of this paradigm.
In this mini review the basic concepts of network analysis
are presented, together with the relationships linking the network
approach to other, more established modeling tools such as multivariate
data analysis and differential equations. Some applications
of network-based modeling of different biological phenomena
are reported as well, and the specific advantages of adopting
such strategies are stressed together with the inescapable
limitations.

Towards a Phenotypic Semantic Web
Georgios V. Gkoutos
The impact of the Internet on biology is undeniable.
The next stage in the evolution of the Internet for biological
and molecular resource discovery must be towards what has
been described as a semantic Web, where not only humans but
machines can make "biologically intelligent" decisions
based on collections of authenticable assertions about biology
and molecular sciences. This vision requires agreed common
representations of data and metadata shared and processed
by automated tools as well as by people. Ontologies have become
an integral part of achieving this and have transformed biological
resource management.
In this review, we describe the necessary transition steps
from the initial conception of the Internet to the realisation
of the Semantic Web, using as an example its application in
phenotypic information construction and delivery. We review
the different parts of the Semantic Web, such as XML, metadata,
RDF, OWL, digital signatures, ontologies and grids, whilst
concentrating on how ontology is applied in biology and more
specifically in phenotype annotation. Finally, we discuss
how the Semantic Web will transform biological information
management, retrieval and visualisation whilst ensuring the
availability of high quality data of the correct type and
format for the determination of model structures and biological
systems.

Large Scale Protein Sequence Clustering - Not Solved
But Solvable
Antje Krause
Protein sequence clustering is one of the oldest problems
addressed in the field of computational biology. Back in the
1960s, when the first protein sequence database was published
in printed form, Margaret Dayhoff defined the basic principles
of this discipline with only a small number of sequences at
hand. With up to a million sequences available in public databases
nowadays and several well-known methods for automatic grouping
of proteins into more or less biologically meaningful families,
subfamilies and superfamilies, the problem seems to be satisfactorily
solved. Nevertheless, apart from the problem of handling such
a huge amount of data, several pitfalls have emerged since
Dayhoff’s time: databases fill up as fast as genomes
are sequenced and a great many of these sequences are fragmentary
or disappear again when identified as being transcripts of
wrongly predicted genes or hypothetical products of pseudogenes.
This article first reviews the different approaches developed
during the last decades. These insights will then be used
to point out possible challenges waiting in the future.

Software Analysis of Two-Dimensional Electrophoretic
Gels in Proteomic Experiments
Martin H. Maurer
Two-dimensional gel electrophoresis in combination with mass
spectrometry constitutes the backbone of proteomic analysis.
With the availability of powerful software tools addressing
the specific needs for analyzing two-dimensional gels, several
typical procedures have been elaborated. In the first part
of this review, we will describe and discuss the procedure
of analyzing two-dimensional electrophoretic gels consisting
of (i) digitizing the gels, (ii) detecting and separating
individual spots, (iii) subtracting the background, (iv) creating
a reference gel, (v) matching the spots to the reference
gel, (vi) modifying the reference gel, (vii) normalizing the
gel measurements for comparison, and (viii) calibrating for
isoelectric point and molecular weight markers. In the next
step, (ix) a database containing the measurement results is
constructed and (x) data are compared by statistical and bioinformatic
means. We compare the software currently available for performing
these tasks in the light of recent benchmarking and standardization
efforts. We also comment on the statistical means provided
in the programs including t-test statistics, ANOVA, and additional
software for comparing expression patterns in large gel datasets,
including hierarchical clustering algorithms and self-organizing
maps (SOMs).