De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures
- 31 January 2007
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (11), 1321-1330
- https://doi.org/10.1093/bioinformatics/btm026
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs (ncRNAs) that participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. The hairpin structure, which is critically associated with miRNA biogenesis, is a necessary feature for the computational classification of novel precursor miRNAs (pre-miRs). Though many of the abundant genomic inverted repeats (pseudo hairpins) can be filtered computationally, novel species-specific pre-miRs are likely to remain elusive. miPred is a de novo Support Vector Machine (SVM) classifier for identifying pre-miRs without relying on phylogenetic conservation. To achieve significantly higher sensitivity and specificity than existing (quasi) de novo predictors, it employs a Gaussian Radial Basis Function (RBF) kernel as a similarity measure over 29 global and intrinsic hairpin folding attributes. These attributes characterize a pre-miR at the dinucleotide sequence, hairpin folding, non-linear statistical thermodynamics and topological levels. Trained on 200 human pre-miRs and 400 pseudo hairpins, miPred achieves a 5-fold cross-validation accuracy of 93.50% and a ROC score of 0.9833. Tested on the remaining 123 human pre-miRs and 246 pseudo hairpins, it reports 84.55% sensitivity, 97.97% specificity and 93.50% accuracy. Validated on 1918 pre-miRs across 40 non-human species and 3836 pseudo hairpins, it yields mean (overall) sensitivity, specificity and accuracy of 87.65% (92.08%), 97.75% (97.42%) and 94.38% (95.64%), respectively. Notably, A.mellifera, A.geoffroyi, C.familiaris, E.Barr, H.Simplex virus, H.cytomegalovirus, O.aries, P.patens, R.lymphocryptovirus, Simian virus and Z.mays are unambiguously classified with 100.00% sensitivity and >93.75% specificity. Data sets, raw statistical results and source code are available at http://web.bii.a-star.edu.sg/~stanley/Publications
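As a rough illustration of two quantities named in the abstract, the following Python sketch shows the standard form of a Gaussian RBF kernel and how sensitivity, specificity and accuracy follow from confusion-matrix counts. It is not the miPred source code. The function names and the `gamma` parameter are hypothetical (the abstract does not state kernel hyperparameters), and the counts 104 of 123 and 241 of 246 are assumptions inferred from the reported test-set percentages, not figures given in the abstract.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF similarity exp(-gamma * ||x - y||^2) between two feature vectors.

    gamma is a hypothetical hyperparameter; the abstract does not report its value.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of real pre-miRs recovered
    specificity = tn / (tn + fp)   # fraction of pseudo hairpins rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts for the human test set (123 pre-miRs, 246 pseudo hairpins),
# inferred from the reported 84.55% sensitivity and 97.97% specificity.
sens, spec, acc = classification_metrics(tp=104, fn=19, tn=241, fp=5)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # prints "84.55% 97.97% 93.50%"
```

Under these assumed counts, the accuracy works out to (104 + 241) / 369, which matches the 93.50% reported for the test set.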