Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe

Abstract
Motivation: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. Results: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications. Contact:daniel.chubb01@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.