Target selection and deselection at the Berkeley Structural Genomics Center
- 7 November 2005
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 62 (2) , 356-370
- https://doi.org/10.1002/prot.20674
Abstract
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near‐complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high‐throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb. Proteins 2006.
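The coverage figures quoted in the abstract can be rechecked with a few lines of arithmetic. The following is only a sketch: the counts are copied from the abstract, while the names `COVERAGE`, `BSGC_SHARE`, and `percent` are hypothetical labels introduced here for illustration and do not come from the paper.

```python
# Counts quoted in the abstract, as (proteins covered, proteome size).
# All dictionary keys are illustrative labels, not terms from the paper.
COVERAGE = {
    "M. pneumoniae, fold assigned (present)": (391, 687),
    "M. pneumoniae, fold assigned (all targets solved)": (550, 687),
    "M. pneumoniae, accurately modeled (present)": (254, 687),
    "M. pneumoniae, accurately modeled (all targets solved)": (438, 687),
    "M. genitalium, structurally annotated (present)": (348, 486),
    "M. genitalium, structurally annotated (all targets solved)": (371, 486),
    "M. genitalium, accurately modeled (present)": (243, 486),
    "M. genitalium, accurately modeled (all targets solved)": (283, 486),
}

# First-time fold assignments: (due to BSGC structures, due to all sources).
BSGC_SHARE = {
    "M. pneumoniae": (24, 94),
    "M. genitalium": (21, 86),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    for label, (part, whole) in COVERAGE.items():
        print(f"{label}: {part}/{whole} = {percent(part, whole):.1f}%")
    for organism, (bsgc, total) in BSGC_SHARE.items():
        print(f"{organism}, BSGC share: {bsgc}/{total} = {percent(bsgc, total):.1f}%")
```

Rounding each ratio to the nearest whole percent reproduces the proteome coverage values stated in the abstract, and both BSGC shares fall within one percentage point of the "approximately 25%" figure.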