Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1
Open Access
- 31 December 2003
- journal article
- research article
- Published by Springer Nature in Genome Biology
- Vol. 5 (8) , R52
- https://doi.org/10.1186/gb-2004-5-8-r52
Abstract
Background: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins remains a critical bottleneck for systems biology. It is also crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression and of protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function.

Results: We used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. The predicted structures were searched against the Protein Data Bank to identify fold similarities, from which putative functions were extrapolated. They were also analyzed in the context of a predicted association network built from several sources of functional association: predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases in which the combined procedure provided new insights into chemotaxis, possible prophage remnants in Halobacterium NRC-1, and archaeal transcriptional regulators.

Conclusions: Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and to significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.
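The abstract describes restricting Rosetta targets to proteins and domains under 150 residues and merging several independent sources of functional association into one network. The Python sketch below illustrates, under stated assumptions, one simple way such evidence could be combined: each source contributes unordered protein pairs, and every network edge records which sources support it. The protein names, evidence pairs and the helpers `build_network`, `neighbors` and `select_targets` are hypothetical illustrations, not the authors' implementation or data, and no weighting or scoring of evidence types is attempted.

```python
from collections import defaultdict

# Hypothetical per-source evidence: each source lists unordered protein pairs.
# Protein names and pairs are illustrative placeholders, not real data.
EVIDENCE = {
    "predicted_interaction": [("protA", "protB")],
    "operon": [("protB", "protA"), ("protB", "protC")],
    "phylogenetic_profile": [("protC", "protD")],
    "domain_fusion": [("protA", "protD")],
}


def build_network(evidence):
    """Merge per-source pair lists into one undirected association network.

    Returns a dict mapping a sorted (protein, protein) tuple to the set of
    evidence sources supporting that link. Sources are not weighted.
    """
    edges = defaultdict(set)
    for source, pairs in evidence.items():
        for a, b in pairs:
            key = tuple(sorted((a, b)))  # undirected: (A, B) and (B, A) coincide
            edges[key].add(source)
    return dict(edges)


def neighbors(network, protein):
    """Return {neighbor: sorted list of supporting sources} for one protein."""
    result = {}
    for (a, b), sources in network.items():
        if protein in (a, b):
            other = b if a == protein else a
            result[other] = sorted(sources)
    return result


def select_targets(lengths, max_len=150):
    """Return sorted names of sequences shorter than max_len residues.

    Mirrors the abstract's <150-residue cutoff; `lengths` is a hypothetical
    {name: residue_count} mapping.
    """
    return sorted(name for name, n in lengths.items() if n < max_len)


if __name__ == "__main__":
    net = build_network(EVIDENCE)
    print(neighbors(net, "protA"))
    print(select_targets({"protA": 120, "protB": 150, "protC": 90}))
```

Listing an unannotated protein's neighbors together with the evidence behind each link is the kind of view that guilt-by-association reasoning starts from; a real pipeline would also weight sources by reliability, which this sketch deliberately omits.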