Natively Unstructured Loops Differ from Other Loops

Open Access

20 July 2007

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 3 (7) , e140
https://doi.org/10.1371/journal.pcbi.0030140

Abstract

Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in distinguishing them. Benchmarks on previously used and new experimental data for unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. A comparative analysis of NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.

Author Summary

The details of protein structures are important for function. Regions that do not adopt any regular structure in isolation (natively unstructured or disordered regions) initially appeared as a curious exception to this structure–function paradigm. It has become increasingly clear that unstructured regions are fundamental to many roles and that they are particularly important for multicellular organisms. Structural biology is just beginning to apprehend the stunning diversity of these roles. Here, we focused on unstructured regions dominated by a particular type of loop, namely the natively unstructured one. We developed a method that succeeded in distinguishing well-structured from natively unstructured loops. For the development, we did not use any experimental data for unstructured regions; when tested on experimental data, the method performed surprisingly well. Because of its different premises, the method captured aspects of unstructured regions very different from those captured by the other methods that we tested. We applied the new method to two different problems. The first was the identification of proteins that may be difficult targets for structure determination. The second was the identification of worm proteins that have many interaction partners (more than seven) and unstructured regions. Surprisingly, we found unstructured regions of the loopy type in more than 50% of all the promiscuous worm proteins.
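The notion of "very long contiguous segments with nonregular secondary structure" can be made concrete with a small sketch. The code below is not NORSnet, which is a neural network; it only illustrates how long loop runs might be flagged in a predicted secondary-structure string. The three-state alphabet (`H` helix, `E` strand, `L` loop), the function name `long_loop_segments`, and the default length threshold of 70 residues are assumptions made for this illustration and are not taken from the text above.

```python
from typing import List, Tuple

LOOP = "L"  # assumed label for residues that are neither helix (H) nor strand (E)


def long_loop_segments(ss: str, min_len: int = 70) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs of loop runs of at least min_len.

    ss is a per-residue secondary-structure string; min_len is a
    hypothetical threshold chosen only for illustration.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current loop run began, if any
    for i, state in enumerate(ss):
        if state == LOOP:
            if start is None:
                start = i
        else:
            # A regular element ends the current loop run.
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Handle a loop run that extends to the end of the sequence.
    if start is not None and len(ss) - start >= min_len:
        segments.append((start, len(ss)))
    return segments


# Toy input: 4 helix residues, 80 loop, 4 strand, 10 loop.
example = "H" * 4 + "L" * 80 + "E" * 4 + "L" * 10
print(long_loop_segments(example))             # only the 80-residue run qualifies
print(long_loop_segments(example, min_len=5))  # both loop runs qualify
```

A strict "all loop" run is the simplest possible criterion; a more tolerant variant would allow a small fraction of helix or strand residues within a window, which this sketch deliberately leaves out.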
