Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm

28 April 2004

journal article
research article
Published by Wiley in Proteins-Structure Function and Bioinformatics

Vol. 56 (3), 502-518
https://doi.org/10.1002/prot.20106

Abstract

This article describes the PROSPECTOR_3 threading algorithm, which combines various scoring functions designed to match structurally related target/template pairs. Each variant described was found to have a Z‐score above which most identified templates have good structural (threading) alignments, Z_struct (Z_good). ‘Easy’ targets with accurate threading alignments are identified as single templates with Z > Z_good or two templates, each with Z > Z_struct, having a good consensus structure in mutually aligned regions. ‘Medium’ targets have a pair of templates lacking a consensus structure, or a single template for which Z_struct < Z < Z_good. PROSPECTOR_3 was applied to a comprehensive Protein Data Bank (PDB) benchmark composed of 1491 single domain proteins, 41–200 residues long and no more than 30% identical to any threading template. Of the proteins, 878 were found to be easy targets, with 761 having a root mean square deviation (RMSD) from native of less than 6.5 Å. The average contact prediction accuracy was 46%, and on average, continuous fragments of 17.6 residues were predicted with RMSD values of 2.0 Å. There were 606 medium targets identified, 87% (31%) of which had good structural (threading) alignments. On average, continuous fragments of 9.1 residues were predicted with an RMSD of 2.5 Å. Combining easy and medium sets, 63% (91%) of the targets had good threading (structural) alignments compared to native; the average target/template sequence identity was 22%. Only nine targets lacked matched templates. Moreover, PROSPECTOR_3 consistently outperforms PSIBLAST. Similar results were predicted for open reading frames (ORFs) ≤200 residues in the M. genitalium, E. coli and S. cerevisiae genomes. Thus, progress has been made in identification of weakly homologous/analogous proteins, with very high alignment coverage, both in a comprehensive PDB benchmark and in genomes. Proteins 2004;56:502–518.
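
The easy/medium decision rule stated in the abstract can be written out as a short sketch. This is only an illustration of the rule as worded there, not the PROSPECTOR_3 implementation: the names `Thresholds`, `classify_target` and `has_consensus` are hypothetical, the abstract gives no numerical values for Z_struct or Z_good (each scoring variant has its own), and the consensus check on mutually aligned regions is reduced here to a boolean supplied by the caller. The label "unassigned" for targets with no template above Z_struct is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    """Per-variant Z-score cutoffs; actual values are not given in the abstract."""
    z_struct: float  # above this, most templates have good structural alignments
    z_good: float    # above this, most templates have good threading alignments


def classify_target(z_scores: Sequence[float],
                    has_consensus: bool,
                    th: Thresholds) -> str:
    """Classify a target from the Z-scores of its candidate templates.

    has_consensus stands in for the (unspecified) test of whether two
    templates share a good consensus structure in mutually aligned regions.
    """
    ranked = sorted(z_scores, reverse=True)

    # Easy: a single template with Z > Z_good.
    if ranked and ranked[0] > th.z_good:
        return "easy"

    above_struct = [z for z in ranked if z > th.z_struct]

    # Two templates with Z > Z_struct: easy if they agree, medium otherwise.
    if len(above_struct) >= 2:
        return "easy" if has_consensus else "medium"

    # A single template with Z_struct < Z (and not above Z_good): medium.
    if len(above_struct) == 1:
        return "medium"

    # Assumption: anything else has no matched template.
    return "unassigned"
```

With placeholder cutoffs such as `Thresholds(z_struct=5.0, z_good=10.0)`, a target whose two best templates score 7.0 and 6.0 is labelled easy when they share a consensus structure and medium when they do not.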
