A unifold, mesofold, and superfold model of protein fold use
- 16 November 2001
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 46 (1), 61-71
- https://doi.org/10.1002/prot.10011
Abstract
As more and more protein structures are determined, there is increasing interest in the question of how many different folds have been used in biology. The history of the rate of discovery of new folds and the distribution of sequence families among known folds provide a means of estimating the underlying distribution of fold use. Previous models exploiting these data have led to rather different conclusions on the total number of folds. We present a new model, based on the notion that the folds used in biology fall naturally into three classes: unifolds, that is, folds found only in a single narrow sequence family; mesofolds, found in an intermediate number of families; and the previously noted superfolds, found in many protein families. We show that this model fits the available data well and has predicted the development of SCOP over the past 2 years. The principal implications of the model are as follows: (1) the vast majority of folds will be found in only a single sequence family; (2) the total number of folds is at least 10,000; and (3) 80% of sequence families have one of about 400 folds, most of which are already known. Proteins 2002;46:61–71.