Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm
- 28 April 2004
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 56 (3), 502–518
- https://doi.org/10.1002/prot.20106
Abstract
This article describes the PROSPECTOR_3 threading algorithm, which combines various scoring functions designed to match structurally related target/template pairs. Each variant described was found to have a Z‐score above which most identified templates have good structural (threading) alignments, Zstruct (Zgood). ‘Easy’ targets with accurate threading alignments are identified as single templates with Z > Zgood or two templates, each with Z > Zstruct, having a good consensus structure in mutually aligned regions. ‘Medium’ targets have a pair of templates lacking a consensus structure, or a single template for which Zstruct < Z < Zgood. PROSPECTOR_3 was applied to a comprehensive Protein Data Bank (PDB) benchmark composed of 1491 single domain proteins, 41–200 residues long and no more than 30% identical to any threading template. Of the proteins, 878 were found to be easy targets, with 761 having a root mean square deviation (RMSD) from native of less than 6.5 Å. The average contact prediction accuracy was 46%, and on average 17.6 residue continuous fragments were predicted with RMSD values of 2.0 Å. There were 606 medium targets identified, 87% (29%) of which had good structural (threading) alignments. On average, 9.1 residue continuous fragments with RMSD of 2.5 Å were predicted. Combining easy and medium sets, 63% (91%) of the targets had good threading (structural) alignments compared to native; the average target/template sequence identity was 22%. Only seven targets lacked matched templates. Moreover, PROSPECTOR_3 consistently outperforms PSIBLAST. Similar results were predicted for open reading frames (ORFs) ≤200 residues in the M. genitalium, E. coli and S. cerevisiae genomes. Thus, progress has been made in identification of weakly homologous/analogous proteins, with very high alignment coverage, both in a comprehensive PDB benchmark and in genomes. Proteins 2004;56:502–518.
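The easy/medium assignment rule stated in the abstract can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not the authors' implementation. The names `TemplateHit` and `classify_target` are hypothetical, and the threshold values are supplied by the caller because the abstract gives none. The sketch assumes Zgood > Zstruct, treats "two templates" as "at least two", and takes the consensus check as a precomputed boolean, since the abstract does not say how consensus structure is evaluated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateHit:
    """One threading hit: a template identifier and its Z-score (hypothetical type)."""
    name: str
    z_score: float


def classify_target(hits: List[TemplateHit], z_struct: float, z_good: float,
                    has_consensus: bool) -> str:
    """Classify a target as 'easy', 'medium', or 'unassigned'.

    Assumes z_good > z_struct. `has_consensus` states whether the templates
    above z_struct share a good consensus structure in mutually aligned
    regions; how that is computed is outside this sketch.
    """
    # Templates whose Z-score clears the lower (structural) threshold.
    significant = [h for h in hits if h.z_score > z_struct]

    # A single template above Zgood is enough for an easy target.
    if any(h.z_score > z_good for h in significant):
        return "easy"

    # Two or more templates above Zstruct: easy only if they agree structurally.
    if len(significant) >= 2:
        return "easy" if has_consensus else "medium"

    # One template with Zstruct < Z < Zgood.
    if len(significant) == 1:
        return "medium"

    # No template above Zstruct: no matched template.
    return "unassigned"
```

With made-up thresholds such as `z_struct=5.0` and `z_good=10.0`, a target with one hit at Z = 12 would be labeled easy, while a target with two hits at Z = 6 and Z = 7 would be easy or medium depending on the consensus flag.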