A unifold, mesofold, and superfold model of protein fold use

16 November 2001

Research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 46 (1), 61–71
https://doi.org/10.1002/prot.10011

Abstract

As more and more protein structures are determined, there is increasing interest in the question of how many different folds have been used in biology. The history of the rate of discovery of new folds and the distribution of sequence families among known folds provide a means of estimating the underlying distribution of fold use. Previous models exploiting these data have led to rather different conclusions on the total number of folds. We present a new model, based on the notion that the folds used in biology fall naturally into three classes: unifolds, that is, folds found only in a single narrow sequence family; mesofolds, found in an intermediate number of families; and the previously noted superfolds, found in many protein families. We show that this model fits the available data well and has predicted the development of SCOP over the past 2 years. The principal implications of the model are as follows: (1) The vast majority of folds will be found in only a single sequence family; (2) the total number of folds is at least 10,000; and (3) 80% of sequence families have one of about 400 folds, most of which are already known. Proteins 2002;46:61–71.
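The abstract does not state the model's equations, so the following is only a toy illustration, not the authors' method. It rests on two simplifying assumptions:

* every fold in a class holds a fixed number of sequence families;
* structures are solved for families picked uniformly at random.

All function names and class sizes are hypothetical. The sketch shows how the quantities behind implications (1) and (3) can be read off a families-per-fold list, and how a new-fold discovery curve arises from the same list.

```python
import random


def build_fold_population(n_unifolds, n_mesofolds, n_superfolds,
                          meso_families, super_families):
    """Return a list with the number of sequence families using each fold.

    Hypothetical simplification: every unifold holds exactly one family,
    and every mesofold / superfold holds a fixed number of families.
    """
    return ([1] * n_unifolds
            + [meso_families] * n_mesofolds
            + [super_families] * n_superfolds)


def unifold_fraction(families_per_fold):
    """Fraction of folds that are found in only a single family."""
    return sum(1 for c in families_per_fold if c == 1) / len(families_per_fold)


def folds_covering(families_per_fold, fraction):
    """Fewest folds (most-populated first) that together contain
    at least `fraction` of all families."""
    target = fraction * sum(families_per_fold)
    covered = 0
    for k, count in enumerate(sorted(families_per_fold, reverse=True), start=1):
        covered += count
        if covered >= target:
            return k
    return len(families_per_fold)


def discovery_curve(families_per_fold, n_families, seed=0):
    """Number of distinct folds seen after each of `n_families` families
    is picked uniformly at random (without replacement) and solved."""
    rng = random.Random(seed)
    # One entry per family, labelled with the index of the fold it uses.
    family_to_fold = [fold for fold, count in enumerate(families_per_fold)
                      for _ in range(count)]
    picks = rng.sample(family_to_fold, min(n_families, len(family_to_fold)))
    seen, curve = set(), []
    for fold in picks:
        seen.add(fold)
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    # Illustrative, made-up class sizes (not the paper's estimates).
    folds = build_fold_population(1000, 300, 10,
                                  meso_families=10, super_families=200)
    print(sum(folds))                         # 6000 families in total
    print(round(unifold_fraction(folds), 3))  # 0.763 of folds are unifolds
    print(folds_covering(folds, 0.8))         # 290 folds hold 80% of families
    curve = discovery_curve(folds, 1000)
    print(curve[99], curve[-1])  # distinct folds after 100 and 1000 families
```

With these invented numbers, roughly three quarters of the 1,310 folds are unifolds, yet only 290 folds account for 80% of the families. In the discovery curve, sampling a unifold family always yields a new fold, because each unifold has exactly one family. So the expected slope falls as meso- and superfolds are found but stays positive while unseen unifolds remain.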

