PROKARYOTE PHYLOGENY WITHOUT SEQUENCE ALIGNMENT: FROM AVOIDANCE SIGNATURE TO COMPOSITION DISTANCE
- 1 March 2004
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 02 (01), 1-19
- https://doi.org/10.1142/s0219720004000442
Abstract
This is a review of a new and essentially simple method of inferring phylogenetic relationships from complete genome data without using sequence alignment. The method is based on counting the frequency of appearance of oligopeptides of a fixed length K (up to K=6) in the collection of protein sequences of a species. It requires neither fine adjustment of parameters nor a choice of genes. Applied to prokaryotic genomes, it has led to results comparable with the bacteriologists' systematics as reflected in the 2002 outline of Bergey's Manual of Systematic Bacteriology. The method has also been used to compare chloroplast genomes and to infer the phylogeny of coronaviruses, including human SARS-CoV. A key point in our approach is the subtraction of a random background from the original counts, using a Markov model of order K-2, in order to highlight the shaping role of natural selection. The implications of the subtraction procedure are specifically analyzed, and further development of the new approach is indicated.
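
The abstract states the two steps of the method (counting K-strings, then subtracting a Markov background of order K-2) but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the background estimate p0(a1...aK) = p(a1...aK-1) p(a2...aK) / p(a2...aK-1), the relative difference (p - p0)/p0 as the vector component, and a distance D = (1 - C)/2, where C is the cosine between two composition vectors. The function names and the toy sequence are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(proteins, k):
    """Relative frequencies of all overlapping k-strings in a list of sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def composition_vector(proteins, k):
    """K-string frequencies with an (assumed) order K-2 Markov background removed."""
    p_k = kmer_freqs(proteins, k)
    p_k1 = kmer_freqs(proteins, k - 1)
    p_k2 = kmer_freqs(proteins, k - 2)
    alphabet = {ch for seq in proteins for ch in seq}
    # Candidates: K-strings whose (K-1)-prefix and (K-1)-suffix both occur,
    # so the predicted frequency p0 is nonzero even if the string itself is absent.
    candidates = {u + c for u in p_k1 for c in alphabet if (u + c)[1:] in p_k1}
    vec = {}
    for s in candidates:
        denom = p_k2.get(s[1:-1], 0.0)
        p0 = p_k1[s[:-1]] * p_k1[s[1:]] / denom if denom else 0.0
        # Assumed component: relative deviation from the background prediction.
        vec[s] = (p_k.get(s, 0.0) - p0) / p0 if p0 else 0.0
    return vec


def composition_distance(vec_a, vec_b):
    """Assumed distance D = (1 - C) / 2, with C the cosine of the two vectors."""
    dot = sum(v * vec_b.get(s, 0.0) for s, v in vec_a.items())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.5  # sketch convention: treat C as 0 when a vector is empty
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


# Toy usage with a made-up four-letter "proteome"; real input would be all
# protein sequences of a genome, with K typically 5 or 6.
toy = composition_vector(["ABAC"], 3)
```

In this sketch a K-string that never occurs, although its two overlapping (K-1)-substrings do, receives the component -1. This is one way to read the "avoidance signature" of the title. A matrix of pairwise distances computed this way could then be passed to any standard distance-based tree-building method.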