Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index

Open Access

18 December 2006

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 7 (S5) , S6
https://doi.org/10.1186/1471-2105-7-s5-s6

Abstract

Knowledge of protein domain boundaries is critical for characterising and understanding protein function. The ability to identify domains from sequence information alone, without knowledge of the structure, is an essential step in many types of protein analysis. In the present study, we demonstrate that the performance of DomainDiscovery improves significantly when the inter-domain linker index value is included for sequence-based domain identification. The improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a position-specific scoring matrix (PSSM), secondary structure, solvent accessibility information and the inter-domain linker index to detect possible domain boundaries for a target sequence.
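
The four feature types named in the abstract can be combined into one vector per residue before being passed to an SVM. The following is a minimal sketch of one way this might be done, not the authors' implementation. The sliding-window scheme, the window size, the three-state secondary structure alphabet (`H`, `E`, `C`), the zero padding at the sequence ends, and the function names `residue_features` and `window_features` are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed three-state secondary structure alphabet: helix, strand, coil.
SS_STATES = "HEC"


def residue_features(pssm_row: Sequence[float], ss: str,
                     acc: float, linker: float) -> List[float]:
    """Concatenate the features of one residue into a flat list."""
    one_hot = [1.0 if ss == state else 0.0 for state in SS_STATES]
    return list(pssm_row) + one_hot + [acc, linker]


def window_features(pssm: Sequence[Sequence[float]], ss: str,
                    acc: Sequence[float], linker: Sequence[float],
                    half_window: int = 2) -> List[List[float]]:
    """Build one feature vector per residue from a symmetric window.

    Positions that fall outside the sequence are zero-padded
    (an assumption of this sketch).
    """
    n = len(ss)
    if not (len(pssm) == len(acc) == len(linker) == n):
        raise ValueError("all per-residue inputs must have the same length")

    per_residue = [residue_features(pssm[i], ss[i], acc[i], linker[i])
                   for i in range(n)]
    width = len(per_residue[0]) if per_residue else 0
    padding = [0.0] * width

    vectors = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(per_residue[j] if 0 <= j < n else padding)
        vectors.append(vec)
    return vectors
```

Each resulting vector, paired with a label marking whether the central residue lies at a domain boundary, could then be given to any standard SVM implementation, for example `sklearn.svm.SVC` in scikit-learn.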
