Selecting protein targets for structural genomics of Pyrobaculum aerophilum: Validating automated fold assignment methods by using binary hypothesis testing
Open Access
- 7 March 2000
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 97 (6), 2450-2455
- https://doi.org/10.1073/pnas.050589297
Abstract
Three-dimensional protein folds were assigned to all ORFs of the recently sequenced genome of the hyperthermophilic archaeon Pyrobaculum aerophilum. Binary hypothesis testing was used to estimate a confidence level for each assignment. A separate test was conducted to assign a probability for whether each sequence has a novel fold, i.e., one that is not yet represented in the experimental database of known structures. Of the 2,130 predicted nontransmembrane proteins in this organism, 916 matched a fold at a cumulative 90% confidence level, and 245 could be assigned at a 99% confidence level. Likewise, 286 proteins were predicted to have a previously unobserved fold with a 90% confidence level, and 14 at a 99% confidence level. These statistically based tools are combined with homology searches against the Online Mendelian Inheritance in Man (OMIM) human genetics database and other protein databases for the selection of attractive targets for crystallographic or NMR structure determination. Results of these studies have been collated and placed at http://www.doe-mbi.ucla.edu/people/parag/PA_HOME/, the University of California, Los Angeles–Department of Energy Pyrobaculum aerophilum web site.
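The abstract does not spell out how a confidence level is derived from a fold-assignment score. As a rough illustration of the general idea behind binary hypothesis testing, the sketch below compares two hypotheses for a single score ("the assignment is correct" versus "the assignment is incorrect") and reports the posterior probability of the first. Everything specific here is an assumption made for the example and is not taken from the paper: the Gaussian score models, their means and standard deviations, the prior, and the function names `assignment_confidence` and `confident_assignments`.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def assignment_confidence(score, true_mean, true_std,
                          false_mean, false_std, prior_true):
    """Posterior probability that an assignment is correct, given its score.

    Assumes (hypothetically) that scores of correct and incorrect
    assignments each follow a normal distribution, and applies Bayes' rule
    to the two competing hypotheses.
    """
    weight_true = normal_pdf(score, true_mean, true_std) * prior_true
    weight_false = normal_pdf(score, false_mean, false_std) * (1.0 - prior_true)
    return weight_true / (weight_true + weight_false)


def confident_assignments(scores, threshold, **model):
    """Keep only the entries whose confidence reaches the threshold."""
    kept = {}
    for name, score in scores.items():
        confidence = assignment_confidence(score, **model)
        if confidence >= threshold:
            kept[name] = confidence
    return kept


if __name__ == "__main__":
    # Hypothetical score model and scores, chosen only for illustration.
    model = dict(true_mean=10.0, true_std=2.0,
                 false_mean=4.0, false_std=2.0, prior_true=0.5)
    scores = {"orf_a": 12.0, "orf_b": 7.0, "orf_c": 2.0}
    for name, confidence in confident_assignments(scores, 0.90, **model).items():
        print(f"{name}: {confidence:.4f}")
```

In practice the two score distributions would be estimated from a benchmark of sequences with known structures rather than assumed, and the threshold (for example 0.90 or 0.99) would determine how many assignments are retained.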