Prediction of lysine ubiquitination with mRMR feature selection and analysis
- 26 January 2011
- journal article
- research article
- Published by Springer Nature in Amino Acids
- Vol. 42 (4) , 1387-1395
- https://doi.org/10.1007/s00726-011-0835-0
Abstract
Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino-acid protein) is attached to a lysine residue on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, identifying ubiquitination sites with conventional experimental approaches is time-consuming and labor-intensive. To discover lysine ubiquitination sites efficiently, we developed a sequence-based predictor of ubiquitination sites based on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy (mRMR) principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized set of 456 features. The Matthews correlation coefficient (MCC) of our ubiquitination site predictor was 0.142 in a jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than those of the existing ubiquitination site predictors UbiPred and UbPred, whose MCCs on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around lysine plays an important role in ubiquitination site prediction. In addition, disorder and ubiquitination are strongly related. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
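The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a feature ranking (such as one produced by mRMR) is already available, uses squared Euclidean distance for the nearest neighbor step (the distance measure is not stated in the abstract), and all function names are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def nearest_neighbor_label(query, samples, labels, skip):
    """Label of the sample closest to `query`, ignoring the sample at index `skip`.

    Assumption: squared Euclidean distance; the paper may use another measure.
    """
    best_label, best_dist = None, float("inf")
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i == skip:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(query, x))
        if dist < best_dist:
            best_dist, best_label = dist, y
    return best_label


def jackknife_mcc(samples, labels):
    """Leave-one-out (jackknife) MCC of a 1-nearest-neighbor classifier."""
    tp = tn = fp = fn = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        pred = nearest_neighbor_label(x, samples, labels, skip=i)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


def incremental_feature_selection(samples, labels, ranked_features):
    """Add features in ranked order and keep the prefix with the best jackknife MCC."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        projected = [[row[c] for c in cols] for row in samples]
        score = jackknife_mcc(projected, labels)
        if score > best_score:  # ties keep the smaller feature set
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy data: column 0 separates the classes, column 1 is noise.
toy_samples = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [0.9, 1.1]]
toy_labels = [0, 0, 1, 1]
selected, score = incremental_feature_selection(toy_samples, toy_labels, [0, 1])
```

On the toy data, the procedure keeps only the informative column because adding the noisy one lowers the jackknife MCC. In the study itself the same idea is applied to a ranked list of PSSM, amino acid factor and disorder features, where the optimum was reached at 456 features.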