How significant is a protein structure similarity with TM-score = 0.5?

Open Access

17 February 2010

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (7), 889-895
https://doi.org/10.1093/bioinformatics/btq066

Abstract

Motivation: Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves cannot provide information on how significant the structural similarity is. A quantitative relation between the scores and conventional fold classifications is also lacking. This article aims to answer two questions: (i) What is the statistical significance of a TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score?

Results: We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. With a TM-score of 0.5, for instance, the P-value is 5.5 × 10⁻⁷, which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability of same-fold proteins from three datasets: SCOP, CATH and the consensus of SCOP and CATH. The posterior probabilities from the different datasets show a similar rapid phase transition around TM-score = 0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold while those with a TM-score <0.5 are mainly not in the same fold.

Contact: zhng@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
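The arithmetic linking the P-value at TM-score 0.5 to the "1.8 million random protein pairs" figure can be checked with a short Python sketch. The function names here are illustrative and not from the article. The `gumbel_p_value` helper shows the general upper-tail form of a type I extreme value (Gumbel) distribution; its `loc` and `scale` arguments are placeholders, since the abstract does not give the fitted parameters.

```python
import math


def pairs_needed(p_value: float) -> float:
    """Expected number of random pairs to examine before one reaches
    the score, taken as the reciprocal of the P-value."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return 1.0 / p_value


def gumbel_p_value(score: float, loc: float, scale: float) -> float:
    """Upper-tail probability P(X >= score) of a Gumbel distribution.

    loc and scale are hypothetical inputs; the article's fitted values
    are not stated in the abstract and are NOT reproduced here.
    """
    z = (score - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 for accuracy in the far tail
    return -math.expm1(-math.exp(-z))


# P-value quoted in the abstract for TM-score = 0.5
P_AT_TM_0_5 = 5.5e-7
print(f"{pairs_needed(P_AT_TM_0_5):.2e}")  # 1.82e+06, i.e. about 1.8 million
```

Using `expm1` rather than `1 - math.exp(...)` avoids loss of precision when the tail probability is as small as the one quoted above.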
