Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs
Open Access
- 29 January 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 20 (6) , 818-828
- https://doi.org/10.1093/bioinformatics/btg485
Abstract
Motivation: Adding more distant homologs to a multiple alignment and thus increasing its diversity may eventually deteriorate the numerical profile constructed from this alignment. Here, we addressed the question whether such a diversity limit can be reached in the alignments of confident homologs found by PSI-BLAST, and we analyzed the dependence of the quality of the profile–profile comparison made by COMPASS on the sequence diversity within these alignments.
Results: Protein families that have a greater number of diverse confident homologs in the current sequence databases provide an increased quality of similarity detection in profile databases, but produce on average less accurate profile–profile alignments with their remote relatives. This lower alignment accuracy cannot be improved when the most distant members of these families are excluded from their profiles. On the contrary, the presence of more diverse members results in more accurate alignments. For families with a high diversity of confident homologs, the lower quality of profile alignments with their remote relatives seems to be an attribute of these families or their alignments, rather than to be caused by the large number of diverse sequences itself. Our results suggest that at any level of profile diversity, one should include in the multiple alignment as many confident sequence homologs as possible in order to produce the most accurate results.