Genomic signal processing
- 1 July 2001
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Signal Processing Magazine
- Vol. 18 (4) , 8-20
- https://doi.org/10.1109/79.939833
Abstract
Genomics is a highly cross-disciplinary field that creates paradigm shifts in such diverse areas as medicine and agriculture. It is believed that many significant scientific and technological endeavors in the 21st century will be related to the processing and interpretation of the vast amount of information currently being revealed by sequencing the genomes of many living organisms, including humans. Genomic information is digital in a very real sense; it is represented in the form of sequences in which each element can be one of a finite number of entities. Such sequences, like DNA and proteins, have been mathematically represented by character strings, in which each character is a letter of an alphabet. In the case of DNA, the alphabet has size 4 and consists of the letters A, T, C and G; in the case of proteins, the size of the corresponding alphabet is 20. As the list of references shows, biomolecular sequence analysis has already been a major research topic among computer scientists, physicists, and mathematicians. The main reason that signal processing has not yet had a significant impact on the field is that it deals with numerical sequences rather than character strings. However, if we properly map a character string into one or more numerical sequences, then digital signal processing (DSP) provides a set of novel and useful tools for solving highly relevant problems. For example, in the form of local texture, color spectrograms visually provide significant information about biomolecular sequences, which facilitates understanding of their local nature, structure, and function. Furthermore, both the magnitude and the phase of properly defined Fourier transforms can be used to predict important features like the location and certain properties of protein coding regions in DNA. Even the process of mapping DNA into proteins and the interdependence of the two kinds of sequences can be analyzed using simulations based on digital filtering.
These and other DSP-based approaches result in alternative mathematical formulations and may provide improved computational techniques for the solution of useful problems in genomic information science and technology.
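The abstract does not say which character-to-number mapping is used, so the following is only a minimal sketch under one common assumption: each base gets its own binary indicator sequence (1 where the base occurs, 0 elsewhere), and the squared magnitudes of the four discrete Fourier transforms are summed. Protein coding regions are widely reported to show a peak in this spectrum at frequency index N/3 (a period-3 component), which is the quantity computed here. The function names `indicator_sequences`, `dft_coefficient`, and `period3_power` are illustrative and do not come from the article.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per base (assumed mapping)."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def dft_coefficient(x, k):
    """Return the single DFT coefficient X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_total = len(x)
    return sum(
        value * cmath.exp(-2j * cmath.pi * k * n / n_total)
        for n, value in enumerate(x)
    )


def period3_power(dna):
    """Sum |X_b[N/3]|^2 over the four bases b; N must be a multiple of 3."""
    n_total = len(dna)
    if n_total == 0 or n_total % 3 != 0:
        raise ValueError("sequence length must be a nonzero multiple of 3")
    k = n_total // 3
    return sum(
        abs(dft_coefficient(x, k)) ** 2
        for x in indicator_sequences(dna).values()
    )
```

In practice such a measure would typically be evaluated over a sliding window along the sequence, with the window length and any decision threshold chosen empirically; neither is specified in the abstract.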
This publication has 24 references indexed in Scilit:
- Making chips to probe genes. IEEE Spectrum, 2001
- Understanding the human genome. IEEE Spectrum, 2000
- Statistical learning formulation of the DNA base-calling problem and its solution in a Bayesian EM framework. Discrete Applied Mathematics, 2000
- Genome Annotation Assessment in Drosophila melanogaster. Genome Research, 2000
- Periodical distribution of transcription factor sites in promoter regions and connection with chromatin structure. Proceedings of the National Academy of Sciences, 1999
- 10-11 bp periodicities in complete genomes reflect protein structure and DNA folding. Bioinformatics, 1999
- Nucleosome DNA Sequence Pattern Revealed by Multiple Alignment of Experimentally Mapped Sequences. Journal of Molecular Biology, 1996
- Measuring correlations in symbol sequences. Physica A: Statistical Mechanics and its Applications, 1995
- Understanding long-range correlations in DNA sequences. Physica D: Nonlinear Phenomena, 1994
- Assessment of protein coding measures. Nucleic Acids Research, 1992