BETAWRAP: Successful prediction of parallel β-helices from primary sequence reveals an association with many microbial pathogens
Open Access
- 18 December 2001
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 98 (26) , 14819-14824
- https://doi.org/10.1073/pnas.251267298
Abstract
The amino acid sequence rules that specify β-sheet structure in proteins remain obscure. A subclass of β-sheet proteins, parallel β-helices, represents a processive folding of the chain into an elongated, topologically simpler fold than globular β-sheets. In this paper, we present a computational approach that predicts the right-handed parallel β-helix supersecondary structural motif in primary amino acid sequences by using β-strand interactions learned from non-β-helix structures. A program called BETAWRAP (http://theory.lcs.mit.edu/betawrap) implements this method and recognizes each of the seven known parallel β-helix families when trained on the known parallel β-helices from outside that family. BETAWRAP identifies 2,448 sequences among 595,890 screened from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) nonredundant protein database as likely parallel β-helices. It identifies surprisingly many bacterial and fungal protein sequences that play a role in human infectious disease; these include toxins, virulence factors, adhesins, and surface proteins of Chlamydia, Helicobacteria, Bordetella, Leishmania, Borrelia, Rickettsia, Neisseria, and Bacillus anthracis. Also unexpected was the rarity of the parallel β-helix fold and its predicted sequences among higher eukaryotes. The computational method introduced here can be called a three-dimensional dynamic profile method because it generates interstrand pairwise correlations from a processive sequence wrap. Such methods may be applicable to recognizing other β structures for which strand topology and profiles of residue accessibility are well conserved.
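The idea of scoring a "processive sequence wrap" by interstrand pairwise correlations can be illustrated with a toy dynamic program. The sketch below is not the BETAWRAP algorithm: the abstract says the real method uses β-strand interactions learned from non-β-helix structures, whereas the `pair_score` function, the `HYDROPHOBIC` residue set, the fixed strand length, and the rung-spacing limits here are all invented assumptions chosen only to make the example self-contained. It searches for the chain of strand start positions whose stacked residues score best.

```python
# Toy illustration only: every name and score below is a hypothetical
# stand-in, not BETAWRAP's learned pairwise probabilities.
HYDROPHOBIC = set("AVILMFWC")  # assumed residue set for the toy score


def pair_score(a, b):
    """Toy score for two residues stacked on adjacent rungs."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 1.0
    if a == b:
        return 0.5
    return -0.5


def strand_score(seq, i, j, strand_len):
    """Sum pairwise scores for strands starting at i and j, aligned in register."""
    return sum(pair_score(seq[i + k], seq[j + k]) for k in range(strand_len))


def best_wrap(seq, n_rungs=5, strand_len=5, min_gap=10, max_gap=30):
    """Return (score, starts) for the best chain of n_rungs strand starts.

    Consecutive starts must be min_gap..max_gap residues apart (assumed limits).
    """
    n_starts = len(seq) - strand_len + 1
    if n_starts <= 0:
        raise ValueError("sequence shorter than one strand")
    neg = float("-inf")
    # best[r][j]: best score of a wrap with r + 1 rungs whose last strand starts at j
    best = [[neg] * n_starts for _ in range(n_rungs)]
    back = [[-1] * n_starts for _ in range(n_rungs)]
    best[0] = [0.0] * n_starts
    for r in range(1, n_rungs):
        for j in range(n_starts):
            for gap in range(min_gap, max_gap + 1):
                i = j - gap
                if i < 0:
                    break
                if best[r - 1][i] == neg:
                    continue
                s = best[r - 1][i] + strand_score(seq, i, j, strand_len)
                if s > best[r][j]:
                    best[r][j] = s
                    back[r][j] = i
    end = max(range(n_starts), key=lambda j: best[n_rungs - 1][j])
    if best[n_rungs - 1][end] == neg:
        raise ValueError("sequence too short for the requested wrap")
    # Follow back-pointers from the last rung to the first.
    starts = [end]
    for r in range(n_rungs - 1, 0, -1):
        starts.append(back[r][starts[-1]])
    starts.reverse()
    return best[n_rungs - 1][end], starts
```

Calling `best_wrap("VIVIVGSGSGSG" * 5, min_gap=10, max_gap=14)` on a synthetic repeat, for example, should recover the regularly spaced hydrophobic strands. A real predictor would replace the toy score with statistics estimated from known structures and would allow variable strand lengths and turn insertions.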