The Closest BLAST Hit Is Often Not the Nearest Neighbor
- 1 June 2001
- journal article
- Published by Springer Nature in Journal of Molecular Evolution
- Vol. 52 (6) , 540-542
- https://doi.org/10.1007/s002390010184
Abstract
It is well known that basing phylogenetic reconstructions on uncorrected genetic distances can lead to errors. Nevertheless, it is common practice in genomic reports that discuss many genes to report simply the most similar BLAST (Altschul et al. 1997) hit (Ruepp et al. 2000; Freiberg et al. 1997). This is because BLAST hits can provide a rapid, efficient, and concise analysis of many genes at once. These hits are often interpreted to imply that the query gene is most closely related to the database gene or protein that returned the closest BLAST hit. Though the most similar hit and the closest relative may coincide, for many genes, particularly genes with few homologs, they may not be the same. A number of circumstances can account for such limitations in accuracy (Eisen 2000). We stress here that genes appearing to be the most similar based on BLAST hits are often not each other's closest relatives phylogenetically. The extent to which this occurs depends on the availability of close relatives in the databases. As an example, we have chosen to analyze the genomes of the crenarchaeote Aeropyrum pernix, an organism with few close relatives fully sequenced, and Escherichia coli, an organism whose closest relative, Salmonella typhimurium, is completely sequenced.
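The central claim can be illustrated with a toy calculation. The following is a minimal sketch, not the authors' analysis: it assumes a hypothetical four-gene tree with made-up branch lengths, and it uses the summed branch length between two leaves as a stand-in for the dissimilarity that a BLAST ranking reflects. The names `TREE`, `LEAVES`, `patristic`, `most_similar`, and `sisters` are illustrative only.

```python
# Hypothetical toy tree; the branch lengths are invented, not data from the paper.
# Each node maps to (parent, branch length to parent). "root" has no entry.
TREE = {
    "Q": ("n1", 0.1),    # query gene
    "A": ("n1", 0.9),    # true sister of Q, but fast-evolving (long branch)
    "n1": ("root", 0.1),
    "B": ("root", 0.1),  # more distant relative, slowly evolving
    "C": ("root", 0.5),
}
LEAVES = ["Q", "A", "B", "C"]


def path_to_root(node):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b):
    """Sum of branch lengths on the path between leaves a and b.

    The minimum over shared ancestors is reached at the most recent
    common ancestor, which gives the path length through the tree.
    """
    up_a, up_b = path_to_root(a), path_to_root(b)
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def most_similar(query, leaves):
    """Leaf with the smallest distance to the query (the 'top hit' in this sketch)."""
    others = [leaf for leaf in leaves if leaf != query]
    return min(others, key=lambda leaf: patristic(query, leaf))


def sisters(query, leaves):
    """Leaves attached to the same parent node as the query (its nearest neighbors)."""
    parent = TREE[query][0]
    return [leaf for leaf in leaves if leaf != query and TREE[leaf][0] == parent]


if __name__ == "__main__":
    for leaf in LEAVES[1:]:
        print(leaf, round(patristic("Q", leaf), 3))
    print("most similar:", most_similar("Q", LEAVES))
    print("sister on tree:", sisters("Q", LEAVES))
```

In this toy tree the slowly evolving gene B is the most similar to Q, while the fast-evolving gene A is its sister on the tree. Unequal evolutionary rates are only one of the circumstances that can produce such a mismatch.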
This publication has 15 references indexed in Scilit:
- A phylogenomic approach to microbial evolution. Nucleic Acids Research, 2001
- Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Current Opinion in Genetics & Development, 2000
- The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature, 2000
- A phylogenomic study of DNA repair genes, proteins, and processes. Mutation Research/DNA Repair, 1999
- Accounting for Evolutionary Rate Variation among Sequence Sites Consistently Changes Universal Phylogenies Deduced from rRNA and Protein-Coding Genes. Molecular Phylogenetics and Evolution, 1999
- Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature, 1999
- Molecular archaeology of the Escherichia coli genome. Proceedings of the National Academy of Sciences, 1998
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- Molecular basis of symbiosis between Rhizobium and legumes. Nature, 1997
- Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies. Molecular Biology and Evolution, 1996