Current Bioinformatics
ISSN: 1574-8936

Volume 1, Number 3, August 2006
Contents

Computational Models of Transcription Control: A Systems Theoretic Perspective Pp. 263-272
Peter C.Y. Chen

Integration of Biological Data with Semantic Networks Pp. 273-290
Michael Hsing and Artem Cherkasov

The Role of the COG Database in Comparative and Functional Genomics Pp. 291-300
Michael Kaufmann

Inferring Transcriptional Networks by Mining ‘Omics’ Data Pp. 301-313
Tim Van den Bulcke, Karen Lemmens, Yves Van de Peer and Kathleen Marchal

Particle-Based Stochastic Simulation in Systems Biology Pp. 315-320
Dominic P. Tolle and Nicolas Le Novère

Advances in the Discovery of cis-Regulatory Elements Pp. 321-336
Youlian Pan

Statistical Analysis of TATA Box and Its Extensions in the Promoters of Human Genes Pp. 337-345
Wei Shi, Wanlei Zhou and Yi-Ping Phoebe Chen

Phenotype Data: A Neglected Resource in Biomedical Research? Pp. 347-358
Philip Groth and Bertram Weiss

Genetic Dissection of Complex Traits In Silico: Approaches, Problems and Solutions Pp. 359-369
Jing Hua Zhao and Qihua Tan

Abstracts

Computational Models of Transcription Control: A Systems Theoretic Perspective
Peter C.Y. Chen
In a gene network, genes may be expressed constantly, or expressed
based on molecular signals. Transcription is a key process
in gene expression. Through evolution, biological organisms
have developed internal regulatory mechanisms for transcription
control. Such mechanisms dictate how the network will function
under certain environmental conditions and respond to changes
in the environment. Developing formal approaches that enable
the design and synthesis of such logical controls in artificial
gene networks represents a major challenge. A first step in
meeting this challenge would be to build analytical models
of transcription control. This paper reviews computational
approaches for modeling transcription control in gene networks
from a systems-theoretic perspective, with emphasis on the
logical representational capability of the models and their
potential use in synthesis of external control.

Integration of Biological Data with Semantic Networks
Michael Hsing and Artem Cherkasov
In recent years, the broad utilization of high-throughput
experimental techniques has resulted in a vast amount of expression
and interaction data, accompanied by information on metabolic,
cell signaling and gene regulatory pathways accumulated in
the literature and databases. Thus, one of the major goals
of modern bioinformatics is to process and integrate heterogeneous
biological data to provide an insight into the inner workings
of a cell governed by complex interaction networks.
The paper reviews the current development of semantic network
(SN) technologies and their applications to the integration
of genomic and proteomic data. We also elaborate on our own
work that applies a semantic network approach to modeling
complex cell signaling pathways and simulating the cause-effect
of molecular interactions in human macrophages.
The review is concluded with a discussion of the prospective
use of semantic networks in bioinformatics practice as an
efficient and general language for data integration, knowledge
representation and inference.

The Role of the COG Database in Comparative and Functional Genomics
Michael Kaufmann
A major breakthrough in classifying proteins from different
microbial genomes in terms of sequence similarity was the
development of the COG concept by Tatusov et al.
in 1997. The authors defined clusters of orthologous groups
of proteins (COGs) by strictly applying all-against-all BLAST
alignments of protein sequences from completely sequenced
microbial genomes. The latest update of the COG database already
covered 66 microbial genomes and additionally included the
KOG database, an equivalent consisting of seven eukaryotic
genomes. Although excellent web-based software tools designed
to analyze this huge amount of data were initially provided
by the authors, many other groups independently developed
more specialized or extended programs making use of COG data
for diverse purposes. Here a brief introduction is given to
the concept behind COGs, and their potential in the field
of comparative and functional genomics is discussed. The
review then focuses on the multitude of recently developed
web services aimed at mining the COG database. Their capabilities
to solve diverse problems in biochemistry are addressed. In
order to illustrate the broad field of possible applications,
a compilation of recently published findings, drawing on
information derived from comparative genomics with emphasis
on data retrieved from the COG database, is given.

Inferring Transcriptional Networks by Mining ‘Omics’ Data
Tim Van den Bulcke, Karen Lemmens, Yves Van de Peer and
Kathleen Marchal
Inferring comprehensive regulatory networks from high-throughput
data is one of the foremost challenges of modern computational
biology. As high-throughput expression profiling experiments
have become commonplace in many laboratories, different
techniques have been proposed to infer transcriptional regulatory
networks from them. Furthermore, with the advent of diverse
types of high-throughput data, the research in network inference
has received a new impetus. The use of diverse types of data,
together with the increasing tendency to build the inference
on biologically plausible simplifications, allows a more reliable
and more complete description of networks. Here, we discuss
how the research focus in the field of network inference is
increasingly shifting from methods trying to reconstruct networks
from a single data type towards integrative approaches dealing
with several data sources simultaneously to infer regulatory
modules.

Particle-Based Stochastic Simulation in Systems Biology
Dominic P. Tolle and Nicolas Le Novère
Computational modeling and simulation have become invaluable
tools for the biological sciences. Both aid in the formulation
of new hypotheses and supplement traditional experimental
research. Many different types of models using various mathematical
formalisms can be created to represent any given biological
system. Here we review a class of modeling techniques based
on particle-based stochastic approaches. In these models,
every reacting molecule is represented individually. Reactions
between molecules occur in a probabilistic manner. Modeling
problems caused by spatial heterogeneity and combinatorial
complexity, features common to biochemical and cellular systems,
are best addressed using Monte-Carlo single-particle methods.
Several software tools implementing single-particle-based
modeling techniques are introduced, and their various advantages
and pitfalls are discussed.
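
As a rough illustration of the single-particle idea described in this abstract, the following minimal Python sketch tracks every molecule individually and lets each one react with a fixed probability per time step. It is not taken from the reviewed paper or from any of the software tools it covers. The function name `simulate_conversion`, the species labels "A" and "B", and the parameters `p_react` and `n_steps` are all hypothetical, and a simple unimolecular conversion A -> B stands in for the more general reactions between molecules.

```python
import random


def simulate_conversion(n_molecules, p_react, n_steps, seed=0):
    """Track each molecule individually through a conversion A -> B.

    Hypothetical toy model: at every time step, each molecule still in
    state "A" converts to "B" with probability p_react.
    Returns the number of "A" molecules remaining after each step.
    """
    rng = random.Random(seed)        # seeded for reproducible runs
    states = ["A"] * n_molecules     # one entry per individual molecule
    remaining = []
    for _ in range(n_steps):
        for i, state in enumerate(states):
            # Each molecule reacts independently and probabilistically.
            if state == "A" and rng.random() < p_react:
                states[i] = "B"
        remaining.append(states.count("A"))
    return remaining


if __name__ == "__main__":
    print(simulate_conversion(n_molecules=100, p_react=0.1, n_steps=10))
```

Because every molecule carries its own state, such a scheme could be extended with per-molecule positions or modification states, which is where spatial heterogeneity and combinatorial complexity would enter.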

Advances in the Discovery of cis-Regulatory Elements
Youlian Pan
Discovery of transcription regulatory elements has been
an enormous challenge, both to biologists and computational
scientists. Over the last three decades, significant progress
has been achieved by various laboratories around the world.
Earlier, laborious experimental methods were used to detect
one or a handful of elements at a time. With recent advances
in DNA sequencing technology, many completed genomes became
available. High-throughput biological techniques and computational
methods emerged. Comparative genomic approaches and their
integration with microarray gene expression data provided
promising results. In this review, we discuss the development
of technology to decipher the complex transcription regulation
system with a focus on the discovery of cis-regulatory
elements in eukaryotes.

Statistical Analysis of TATA Box and Its Extensions in the Promoters of Human Genes
Wei Shi, Wanlei Zhou and Yi-Ping Phoebe Chen
We have conducted a dedicated analysis of the frequency
distribution of the TATA Box and TATA extension sequences
in six data sets of human promoters. Promoters in these sets
have different lengths and are from different types of genes
(housekeeping genes, tissue specific genes, and all genes).
The statistical approach developed in this study first
partitions the promoters into bins 20 bp long, then calculates
the frequency distribution of TATA elements and TATA extension
sequences. The median value is used to capture outstanding
TATA elements or TATA extension sequences when calculating
their statistical significance. This study discovered that
two of the 16 TATA Box elements (TATAAAAG and TATATAAG) showed
the sharpest peaks 10-30 bp upstream of
transcription start sites, where the TATA Box is believed to reside.
Fourteen TATA Box extensions also showed the sharpest peaks at
this location among all TATA extension sequences.
Two of these fourteen TATA extension sequences have been verified
as transcription factor binding sites by other research
efforts. We suggest that the remaining twelve TATA extension
sequences are new putative TATA binding sites. This study
also found that there was very little difference between the
frequency distribution of TATA elements on housekeeping genes
and their frequency distribution on tissue specific genes.
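
The binning step described in this abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `bin_counts` and `bins_above_median` are hypothetical, each promoter is assumed to be a string whose last base lies immediately upstream of the transcription start site, and flagging bins whose count exceeds the median is a simplified stand-in for the paper's significance calculation, which the abstract does not spell out.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 20  # bin width in bp, as stated in the abstract


def bin_counts(promoters, motif, bin_size=BIN_SIZE):
    """Count motif occurrences per bin of upstream distance.

    Assumption: the last base of each promoter string is the base
    immediately upstream of the transcription start site (TSS).
    Bin 0 covers distances 1-20 bp, bin 1 covers 21-40 bp, and so on.
    """
    counts = Counter()
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            distance = len(seq) - start  # distance of motif start from TSS
            counts[(distance - 1) // bin_size] += 1
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return counts


def bins_above_median(counts, n_bins):
    """Return the bins whose count exceeds the median over all bins."""
    values = [counts.get(b, 0) for b in range(n_bins)]
    med = median(values)
    return [b for b, v in enumerate(values) if v > med]


if __name__ == "__main__":
    # Toy promoter: the motif starts 30 bp upstream of the TSS.
    toy = ["A" * 50 + "TATAAAAG" + "C" * 22]
    c = bin_counts(toy, "TATAAAAG")
    print(dict(c), bins_above_median(c, n_bins=4))
```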

Phenotype Data: A Neglected Resource in Biomedical Research?
Philip Groth and Bertram Weiss
To a great extent, our phenotype is determined by our
genetic material. Many genotypic modifications may ultimately
become manifest in more or less pronounced changes in phenotype.
Despite the importance of how specific genetic alterations
contribute to the development of diseases, surprisingly little
effort has been made towards systematically exploiting the
current knowledge of genotype-phenotype relationships. In
the past, genes were characterized with the help of so-called
"forward genetics" studies in model organisms, relating
a given phenotype to a genetic modification. Analogous studies
in higher organisms were hampered by the lack of suitable
high-throughput genetic methods. This situation has now changed
with the advent of new screening methods, especially RNA interference
(RNAi), which allows genes to be silenced specifically one by one and
the phenotypic outcome to be observed. This ongoing large-scale
characterization of genes in mammalian in vitro model
systems will increase phenotypic information exponentially
in the very near future. But will our knowledge grow equally
fast? As in other scientific areas, data integration is a
key problem. It is thus still a major bioinformatics challenge
to interpret the results of large-scale functional screens,
even more so if sets of heterogeneous data are to be combined.
It is now time to develop strategies to structure and use
these data in order to transform the wealth of information
into knowledge and, eventually, into novel therapeutic approaches.
In light of these developments, we thoroughly surveyed the
available phenotype resources and reviewed different approaches
to analyzing their content. We discuss hurdles yet to be overcome,
namely the lack of data integration, the absence of adequate phenotype
ontologies and the shortage of appropriate analytical tools.
This review aims to assist researchers keen to understand
and make effective use of these highly valuable data.

Genetic Dissection of Complex Traits In Silico: Approaches, Problems and Solutions
Jing Hua Zhao and Qihua Tan
The genome projects in human and other species have made
genetic data widely available and pose challenges as well
as opportunities for statistical analysis. In this paper we
elaborate on the concept of integrated analysis of genetic data,
such that most aspects of analysis can be done effectively
and efficiently in environments that provide database
access, graphics, mathematical/statistical routines,
a flexible programming language, re-use of available code,
Internet connectivity and availability. This extends an earlier
discussion on software consolidation (Guo and Lange. Theor
Pop Biol 57:1-11, 2000). A general context is laid out by
recollecting the research paradigms for genetic mapping of
complex traits and illustrated with the study of ageing, before
turning to the computational tools currently used. We show
that the R system (http://www.r-project.org) is so far the
most comprehensive and widely available system. However,
commercial systems can also potentially be successful. In particular,
we compare SAS (http://www.sas.com), Stata (http://www.stata.com)
and S-PLUS (http://www.insightful.com), and give some indications
of future development. Our investigation has important implications
for both statisticians and other researchers actively engaged
in analysis of genetic data.