Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method
- 4 August 2009
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 106 (31), 12826-12831
- https://doi.org/10.1073/pnas.0905115106
Abstract
The vast sequence divergence among different virus groups has made alignment-based sequence comparison across virus families very difficult. Using an alignment-free comparison method, we construct the whole-proteome phylogeny for a population of 142 large eukaryotic dsDNA viruses from 11 viral families. The method is based on feature frequency profiles (FFP), where the length of the feature (l-mer) is selected to be optimal for phylogenomic inference. We observe that (i) the FFP phylogeny segregates the population into clades whose membership agrees remarkably well with the current classification by the International Committee on Taxonomy of Viruses, with one exception: the mimivirus joins the phycodnavirus family; (ii) the FFP tree detects potential evolutionary relationships among some viral families; (iii) the relative position of the 3 herpesvirus subfamilies in the FFP tree differs from that in gene alignment-based analysis; (iv) the FFP tree suggests the taxonomic positions of certain "unclassified" viruses; and (v) the FFP method identifies candidates for horizontal gene transfer between virus families.
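The core of an FFP comparison can be sketched in a few lines: count every l-mer in a proteome, normalize the counts into a frequency profile, and compare profiles pairwise with a divergence to get a distance matrix for tree building. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the distance, so the Jensen-Shannon divergence used here is an assumption; l-mers are counted within each protein so no feature spans two proteins (also an assumption); and the procedure for choosing the optimal l is not shown, so `l` is simply a parameter. The function names `ffp`, `jensen_shannon` and `distance_matrix` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ffp(proteome, l):
    """Feature frequency profile: relative frequency of each l-mer.

    `proteome` is a list of protein sequences. Assumption: l-mers are
    counted within each protein, never across protein boundaries.
    """
    counts = Counter()
    for seq in proteome:
        for i in range(len(seq) - l + 1):
            counts[seq[i:i + l]] += 1
    total = sum(counts.values())
    return {feat: n / total for feat, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2, range 0..1) of sparse profiles.

    Assumed distance; features missing from a profile have frequency 0.
    """
    jsd = 0.0
    for feat in set(p) | set(q):
        pi, qi = p.get(feat, 0.0), q.get(feat, 0.0)
        mi = (pi + qi) / 2  # mixture distribution M = (P + Q) / 2
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd


def distance_matrix(proteomes, l):
    """Symmetric pairwise distances between named proteomes at l-mer length l."""
    names = sorted(proteomes)
    profiles = {name: ffp(proteomes[name], l) for name in names}
    dist = {(a, a): 0.0 for a in names}
    for a, b in combinations(names, 2):
        d = jensen_shannon(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    return names, dist
```

For example, with toy proteomes `{"virusA": ["MKVLA", "MKVLG"], "virusB": ["MKVLA", "MKVLA"], "virusC": ["WWWWW"]}` and `l=3`, virusA and virusB share most of their 3-mers and get a small distance, while virusC shares none with either and gets the maximum distance of 1.0. The resulting matrix would then be passed to a distance-based tree method such as neighbor-joining; which tree method the study used is not stated in the abstract.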