Local Decoding of Sequences and Alignment-Free Comparison
- 1 October 2006
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 13 (8), 1465-1476
- https://doi.org/10.1089/cmb.2006.13.1465
Abstract
Subword composition plays an important role in many sequence analyses. Here we define and study the "local decoding of order N of sequences," an alternative that avoids some drawbacks of "subwords of length N" approaches while keeping information about environments of length N in the sequences ("decoding" is taken here in the sense of hidden Markov modeling, i.e., associating a state with every position of the sequence). We present an algorithm for computing the local decoding of order N of a given set of sequences. Its complexity is linear in the total length of the set, whatever the order N, in both time and memory. To illustrate a use of local decoding, we propose a very basic dissimilarity measure between sequences that can be computed from either the local decoding of order N or the composition in subwords of length N. The accuracies of these two dissimilarities are evaluated, over several datasets, by computing their linear correlations with a reference alignment-based distance. These accuracies are also compared to the one obtained from another recent alignment-free comparison.
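As a rough illustration of the subword-composition side of the comparison described in the abstract, the Python sketch below counts overlapping subwords of length N, derives a simple dissimilarity from the shared counts, and computes a linear (Pearson) correlation against reference distances. The dissimilarity formula and the function names (`subword_counts`, `composition_dissimilarity`, `pearson`) are assumptions made for illustration only: the abstract does not specify the measure used in the article, and local decoding itself is not implemented here.

```python
from collections import Counter
from math import sqrt


def subword_counts(seq: str, n: int) -> Counter:
    """Count every overlapping subword of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def composition_dissimilarity(a: str, b: str, n: int) -> float:
    """Illustrative dissimilarity based on shared subwords of length n.

    ASSUMPTION: this formula is a stand-in, not the measure from the article.
    It returns 0.0 when the shorter sequence's subwords are all matched in
    the other sequence, and 1.0 when no subword is shared.
    """
    ca, cb = subword_counts(a, n), subword_counts(b, n)
    shared = sum((ca & cb).values())  # multiset intersection of the counts
    denom = min(sum(ca.values()), sum(cb.values()))
    if denom == 0:
        raise ValueError("both sequences must contain at least n symbols")
    return 1.0 - shared / denom


def pearson(xs, ys) -> float:
    """Linear (Pearson) correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

In an evaluation of the kind the abstract describes, one would compute such a dissimilarity for every pair of sequences in a dataset and pass the resulting values, together with the alignment-based distances for the same pairs, to `pearson`.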