How significant is a protein structure similarity with TM-score = 0.5?
Open Access
- 17 February 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (7), 889-895
- https://doi.org/10.1093/bioinformatics/btq066
Abstract
Motivation: Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves cannot provide information on how significant the structural similarity is. A quantitative relation between the scores and conventional fold classifications is also lacking. This article aims to answer two questions: (i) What is the statistical significance of a TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score?

Results: We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. With a TM-score of 0.5, for instance, the P-value is 5.5 × 10⁻⁷, which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability of two proteins sharing the same fold, using three datasets: SCOP, CATH and the consensus of SCOP and CATH. The posterior probability from the different datasets shows a similar rapid phase transition around TM-score = 0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold while those with a TM-score <0.5 are mainly not in the same fold.

Contact: zhng@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
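The step from a P-value of 5.5 × 10⁻⁷ to "at least 1.8 million random protein pairs" is a reciprocal: if each random pair reaches the score with probability P, about 1/P pairs are needed on average before one does. The sketch below checks that arithmetic in Python. The function names are illustrative, not from the paper, and the second function assumes the pairs are independent draws, which the abstract does not state.

```python
import math

# P-value quoted in the abstract for TM-score >= 0.5.
P_VALUE_AT_TM_05 = 5.5e-7


def expected_pairs(p_value: float) -> float:
    """Average number of random pairs needed to see one hit (1 / P)."""
    return 1.0 / p_value


def prob_at_least_one_hit(p_value: float, n_pairs: int) -> float:
    """Chance that at least one of n_pairs random pairs reaches the score.

    Assumes independent pairs (an assumption made here, not by the source).
    Computes 1 - (1 - p)^n via log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(n_pairs * math.log1p(-p_value))


if __name__ == "__main__":
    n = expected_pairs(P_VALUE_AT_TM_05)
    print(f"Expected pairs per hit: {n:,.0f}")  # about 1.8 million
    print(f"P(at least one hit in that many pairs): "
          f"{prob_at_least_one_hit(P_VALUE_AT_TM_05, round(n)):.3f}")
```

Under the independence assumption, examining 1/P pairs gives roughly a 63% chance of at least one hit, so 1.8 million is best read as an expected count and not as a guarantee.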