Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index
Open Access
- 18 December 2006
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (S5) , S6
- https://doi.org/10.1186/1471-2105-7-s5-s6
Abstract
Knowledge of protein domain boundaries is critical for the characterisation and understanding of protein function. The ability to identify domains without knowledge of the structure, using sequence information only, is an essential step in many types of protein analyses. In the present study, we demonstrate that the performance of DomainDiscovery is improved significantly by including the inter-domain linker index value for domain identification from sequence-based information. The improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and inter-domain linker index to detect possible domain boundaries for a target sequence.
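As an illustration of how the four feature types named in the abstract could be combined into SVM input, the following minimal sketch builds one feature vector per residue from a sliding window. The window half-width, the three-state secondary structure alphabet, the zero padding at the sequence ends, and all function names are assumptions made for this example; the abstract does not specify how DomainDiscovery encodes its inputs.

```python
from typing import List, Sequence

# Assumed three-state secondary structure alphabet: helix, strand, coil.
SS_STATES = "HEC"


def residue_features(pssm_row: Sequence[float], ss: str,
                     accessibility: float, linker_index: float) -> List[float]:
    """Concatenate the per-residue inputs into one flat vector.

    Layout (hypothetical): 20 PSSM scores, 3 one-hot secondary structure
    values, 1 solvent accessibility value, 1 inter-domain linker index value.
    """
    ss_one_hot = [1.0 if ss == state else 0.0 for state in SS_STATES]
    return ([float(v) for v in pssm_row] + ss_one_hot
            + [float(accessibility), float(linker_index)])


def window_features(per_residue: Sequence[Sequence[float]],
                    center: int, half_width: int) -> List[float]:
    """Flatten the residues in [center - half_width, center + half_width].

    Positions that fall outside the sequence are zero-padded.
    """
    width = len(per_residue[0])
    out: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(per_residue):
            out.extend(per_residue[pos])
        else:
            out.extend([0.0] * width)
    return out


def encode_sequence(pssm: Sequence[Sequence[float]], ss: str,
                    accessibility: Sequence[float],
                    linker_index: Sequence[float],
                    half_width: int = 2) -> List[List[float]]:
    """Return one windowed feature vector per residue of the target sequence."""
    per_residue = [
        residue_features(row, s, a, l)
        for row, s, a, l in zip(pssm, ss, accessibility, linker_index)
    ]
    return [window_features(per_residue, i, half_width)
            for i in range(len(per_residue))]
```

Each resulting vector, paired with a boundary or non-boundary label, is the kind of input an SVM classifier could be trained on; the training step itself is not shown here.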