Protein homology detection and fold inference through multiple alignment entropy profiles
- 1 August 2007
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 70 (1) , 248-256
- https://doi.org/10.1002/prot.21506
Abstract
Homology detection and protein structure prediction are central themes in bioinformatics. Sequence comparison methods have limited power to establish relationships between protein sequences, or to predict their structure, when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The “Conservatism‐of‐Conservatism” is an effective profile analysis method for identifying structural features shared by proteins that have the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification used to calculate the profile. In this work, we calculated entropy profiles from PSI‐BLAST‐derived multiple alignments using different amino acid classifications that summarize almost 500 different attributes. These entropy profiles were converted into pseudocodes, which were compared using the FASTA program with an ad hoc matrix. We tested the ability of our method to identify relationships between proteins with similar folds using a nonredundant subset of sequences sharing less than 40% identity. We then used coverage versus errors per query curves to compare our results with those obtained by methods such as PSI‐BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), showed higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of the evolutionary information contained in protein profiles. Proteins 2008.
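The abstract describes the pipeline only in outline. Per-column entropies are computed from a multiple alignment under an amino acid classification, and the resulting profile is then encoded as a pseudocode string for comparison. The Python sketch that follows illustrates that idea under explicit assumptions. Three parts are hypothetical stand-ins, not the classifications, encoding or ad hoc FASTA matrix used for HIP:

- the three-class grouping `HYPOTHETICAL_CLASSES`;
- the equal-width five-letter binning in `to_pseudocode`;
- the ungapped scoring in `pseudocode_score`.

FASTA also performs gapped local alignment, which this toy score does not attempt.

```python
import math
from collections import Counter

# HYPOTHETICAL 3-class grouping of the 20 amino acids by rough polarity;
# it is NOT one of the classifications used in the paper.
HYPOTHETICAL_CLASSES = {
    **{aa: "h" for aa in "AVLIMFWC"},  # hydrophobic
    **{aa: "p" for aa in "STNQGPYH"},  # polar / small / special
    **{aa: "c" for aa in "DEKR"},      # charged
}


def column_entropy(column, classes):
    """Shannon entropy (bits) of one alignment column after mapping residues
    to classes; gaps and symbols missing from `classes` are ignored."""
    labels = [classes[aa] for aa in column if aa in classes]
    if not labels:
        return 0.0
    n = len(labels)
    # Sum of p * log2(1/p): every term is non-negative.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def entropy_profile(alignment, classes):
    """Per-column class entropies of an alignment given as equal-length strings."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col, classes) for col in zip(*alignment)]


def to_pseudocode(profile, max_entropy, alphabet="ABCDE"):
    """HYPOTHETICAL encoding: split [0, max_entropy] into equal-width bins and
    emit one letter per column (A = most conserved)."""
    k = len(alphabet)
    letters = []
    for h in profile:
        idx = min(int(h / max_entropy * k), k - 1)  # clamp h == max_entropy
        letters.append(alphabet[idx])
    return "".join(letters)


def pseudocode_score(x, y, alphabet="ABCDE"):
    """HYPOTHETICAL ungapped score of two equal-length pseudocodes using a
    distance-based matrix: 2 - |bin(x) - bin(y)| per position."""
    if len(x) != len(y):
        raise ValueError("pseudocodes must have the same length")
    pos = {ch: i for i, ch in enumerate(alphabet)}
    return sum(2 - abs(pos[a] - pos[b]) for a, b in zip(x, y))


if __name__ == "__main__":
    aln = ["MKVLS", "MRTIA", "MEVDG", "MKSFA"]  # toy alignment, not real data
    prof = entropy_profile(aln, HYPOTHETICAL_CLASSES)
    # Maximum entropy for 3 classes is log2(3) bits.
    code = to_pseudocode(prof, math.log2(3))
    print([round(h, 3) for h in prof], code)
```

With several classifications, one would presumably compute one profile and one pseudocode per classification. The abstract does not say how HIP combines them.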