Protein homology detection and fold inference through multiple alignment entropy profiles

1 August 2007

journal article
research article
Published by Wiley in Proteins-Structure Function and Bioinformatics

Vol. 70 (1), 248-256
https://doi.org/10.1002/prot.21506

Abstract

Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences, or predicting their structure, by sequence comparison methods is limited when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The “Conservatism‐of‐Conservatism” is an effective profile analysis method for identifying structural features shared by proteins that have the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI‐BLAST‐derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes, which were compared using the FASTA program with an ad‐hoc matrix. We tested the performance of our method in identifying relationships between proteins with similar folds using a nonredundant subset of sequences having less than 40% identity. We then used Coverage Versus Error per query curves to compare our results with those obtained by methods such as PSI‐BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), showed higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of the evolutionary information enclosed in protein profiles. Proteins 2008.
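
As an illustration of the general idea, the following minimal sketch computes a per-column Shannon entropy profile from a multiple alignment under a reduced amino acid classification, then discretizes the profile into a letter string. It is not the HIP implementation: the six-class grouping in `CLASSES`, the equal-width binning, the output alphabet and all function names are assumptions made for this example, and the abstract does not specify how the pseudocodes or the ad-hoc matrix were actually built.

```python
import math
from collections import Counter

# Hypothetical six-class grouping by broad physicochemical type.
# This is an assumption for illustration, not a classification from the paper.
CLASSES = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
RESIDUE_TO_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def column_entropy(column):
    """Shannon entropy (bits) of the class distribution in one alignment column.

    Gaps and unrecognized symbols are ignored.
    """
    classes = [RESIDUE_TO_CLASS[aa] for aa in column if aa in RESIDUE_TO_CLASS]
    if not classes:
        return 0.0
    total = len(classes)
    return -sum((n / total) * math.log2(n / total) for n in Counter(classes).values())


def entropy_profile(alignment):
    """One entropy value per column of an alignment given as equal-length strings."""
    return [column_entropy(column) for column in zip(*alignment)]


def to_pseudocode(profile, alphabet="ACDEFGHIKL"):
    """Map each entropy value to a letter using equal-width bins (an assumed scheme)."""
    max_entropy = math.log2(len(CLASSES))  # entropy of a uniform class distribution
    letters = []
    for value in profile:
        index = min(int(value / max_entropy * len(alphabet)), len(alphabet) - 1)
        letters.append(alphabet[index])
    return "".join(letters)


if __name__ == "__main__":
    toy_alignment = ["AVKD", "AIKE", "ALRG", "AFR-"]  # made-up sequences
    profile = entropy_profile(toy_alignment)
    print([round(h, 3) for h in profile])
    print(to_pseudocode(profile))
```

In this sketch, columns whose residues all fall in one class get zero entropy and map to the first letter of the alphabet, while more variable columns map to later letters. Changing `CLASSES` changes the profile for the same alignment, which is the dependence on amino acid classification that the abstract describes.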

This publication has 22 references indexed in Scilit:

Solving the protein sequence metric problem
Proceedings of the National Academy of Sciences, 2005
Protein homology detection by HMM–HMM comparison
Bioinformatics, 2004
Profile–profile methods provide improved fold‐recognition: A study of different profile–profile alignment methods
Proteins-Structure Function and Bioinformatics, 2004
Clustering of highly homologous sequences to reduce the size of large protein databases
Bioinformatics, 2001
Assigning genomic sequences to CATH
Nucleic Acids Research, 2000
Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function
Journal of Molecular Biology, 1999
How evolution makes proteins fold quickly
Proceedings of the National Academy of Sciences, 1998
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Research, 1997
CATH – a hierarchic classification of protein domain structures
Published by Elsevier, 1997
A structural explanation for the twilight zone of protein sequence homology
Structure, 1996