A machine learning approach for the identification of odorant binding proteins from sequence-derived properties
Open Access
- 19 September 2007
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 351
- https://doi.org/10.1186/1471-2105-8-351
Abstract
Background: Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been exploited for protein family prediction, less effort has been devoted to predicting OBPs from sequence data, a task made more challenging by the poor sequence identity among these proteins.
Results: In this paper, we propose a new algorithm that uses a Regularized Least Squares Classifier (RLSC) in conjunction with multiple physicochemical properties of amino acids to predict odorant binding proteins. The algorithm was applied to a dataset derived from the Pfam and GenDiS databases and achieved an overall prediction accuracy of 97.7% (94.5% and 98.4% for the positive and negative classes, respectively).
Conclusion: Our study suggests that RLSC is potentially useful for predicting odorant binding proteins from sequence-derived properties irrespective of sequence similarity. Our method correctly predicts 92.8% of 56 odorant binding proteins non-homologous to any protein in the Swiss-Prot database and 97.1% of the 414 proteins in an independent dataset, suggesting that RLSC can facilitate the prediction of odorant binding proteins from sequence information.
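For readers unfamiliar with RLSC, the following is a minimal sketch of the general technique, not the authors' implementation. In its standard kernel form, RLSC fits coefficients c by solving the linear system (K + λnI)c = y, where K is the kernel matrix over the n training examples and y holds the ±1 labels. Everything specific here is an assumption made for illustration: the abstract does not state the kernel, the regularization value, or the exact feature set, so the sketch uses an RBF kernel, arbitrary values for `lam` and `gamma`, and plain amino acid composition as a stand-in for the paper's physicochemical properties. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq.
    Stand-in feature; the paper uses multiple physicochemical properties."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def rbf_kernel(x, z, gamma=10.0):
    """RBF kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c

def rlsc_fit(X, y, lam=0.01, gamma=10.0):
    """Return coefficients c solving (K + lam * n * I) c = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam * n
    return solve(K, y)

def rlsc_predict(X_train, c, x, gamma=10.0):
    """Classify x by the sign of the kernel expansion sum_i c_i k(x_i, x)."""
    score = sum(ci * rbf_kernel(xi, x, gamma) for ci, xi in zip(c, X_train))
    return 1 if score >= 0 else -1

# Toy strings invented for illustration; they are NOT real protein sequences.
train_seqs = ["KEKEKKEEKLAEK", "EKKEAKEKELKEE", "GSGSGGSSGAGSG", "SGGSGASGSSSGG"]
labels = [1.0, 1.0, -1.0, -1.0]  # +1 = "OBP-like", -1 = "other" (made-up labels)

X_train = [composition(s) for s in train_seqs]
coeffs = rlsc_fit(X_train, labels)
print(rlsc_predict(X_train, coeffs, composition("KEEKKEKAEKLEK")))
print(rlsc_predict(X_train, coeffs, composition("GGSSGSGAGSGSG")))
```

Because training reduces to one linear solve, a production version would normally hand the system to a numerical library rather than the hand-written elimination used here to keep the sketch dependency-free.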