Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint
Open Access
- 9 March 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 1-17
- https://doi.org/10.1186/1471-2105-8-86
Abstract
Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is to structurally characterise protein families that, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and into the effort that may be required to achieve comprehensive structural coverage across all protein families. In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportion of structurally uncharacterised families is accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst also pursuing additional targets from large, structurally characterised protein superfamilies. This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.
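To make the notion of domain coverage concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a genome is available as a flat list of domain-family assignments (one entry per annotated domain) and that the set of families with a solved structure is known. The function names, family labels and toy data are all hypothetical.

```python
from collections import Counter


def domain_coverage(domain_families, characterised):
    """Fraction of domain assignments whose family has a known structure."""
    if not domain_families:
        return 0.0
    covered = sum(1 for fam in domain_families if fam in characterised)
    return covered / len(domain_families)


def rank_uncharacterised(domain_families, characterised):
    """Structurally uncharacterised families with their domain counts, largest first."""
    counts = Counter(f for f in domain_families if f not in characterised)
    return counts.most_common()


# Toy data (hypothetical labels): one entry per annotated domain in a genome.
genome = ["cath_A", "cath_A", "pfam_X", "pfam_X", "pfam_X",
          "newfam_1", "cath_B", "pfam_Y"]
known = {"cath_A", "cath_B"}  # families assumed to have a structural representative

coverage = domain_coverage(genome, known)
ranked = rank_uncharacterised(genome, known)

# Coverage if the largest uncharacterised family were solved next.
next_target = ranked[0][0]
coverage_after = domain_coverage(genome, known | {next_target})

print(coverage, ranked, coverage_after)
```

In this toy case, solving a single large uncharacterised family raises coverage more than solving a singleton family would, which is the intuition behind ranking candidate targets by family size. A real analysis would also need to handle multi-domain proteins, residue-level coverage and remote homology, none of which is modelled here.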