Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe
Open Access
- 15 September 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (21) , 2664-2671
- https://doi.org/10.1093/bioinformatics/btq527
Abstract
Motivation: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe.
Results: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.
Contact: daniel.chubb01@imperial.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
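One way to make "rate of novel sequence discovery" concrete is to ask, for each database release, what fraction of newly deposited sequences are not similar to anything already seen. The sketch below is purely hypothetical and is not the authors' method: it swaps in k-mer Jaccard similarity for alignment-based identity, uses an arbitrary 0.5 threshold with greedy clustering, and the function names and toy release data are invented for illustration.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty_by_release(batches: Dict[int, List[str]],
                       threshold: float = 0.5,
                       k: int = 3) -> Dict[int, float]:
    """Hypothetical novelty metric: for each release, in ascending order,
    return the fraction of its sequences whose similarity to every
    representative seen so far (including earlier sequences in the same
    release) is below `threshold`. Novel sequences become representatives."""
    representatives: List[Set[str]] = []
    result: Dict[int, float] = {}
    for year in sorted(batches):
        seqs = batches[year]
        novel = 0
        for seq in seqs:
            profile = kmers(seq, k)
            if all(jaccard(profile, rep) < threshold for rep in representatives):
                representatives.append(profile)
                novel += 1
        result[year] = novel / len(seqs) if seqs else 0.0
    return result


if __name__ == "__main__":
    # Toy, invented releases; not real sequencing data.
    releases = {
        2000: ["MKTAYIAKQR", "GSHMLEDPVA"],
        2001: ["MKTAYIAKQL", "WWPRCDEFNH"],
        2002: ["MKTAYIAKQR", "GSHMLEDPVA", "WWPRCDEFNH"],
    }
    print(novelty_by_release(releases))
```

On this toy input the script prints `{2000: 1.0, 2001: 0.5, 2002: 0.0}`: the novel fraction falls release by release, the shape of a diminishing-returns curve. A real analysis would rely on alignment-based identity or established sequence clustering (such as that behind UniRef) rather than k-mer overlap.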