Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier
- 16 March 2006
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (11), 1325-1334
- https://doi.org/10.1093/bioinformatics/btl094
Abstract
Most computational methodologies for microRNA gene prediction utilize techniques based on sequence conservation and/or structural similarity. In this study we describe a new technique, applicable across several species, for predicting miRNA genes. The technique is based on machine learning, using the Naïve Bayes classifier. It automatically generates a model from the training data, which consist of sequence and structure information of known miRNAs from a variety of species. Our study shows that applying machine learning techniques, together with integrating data from multiple species, is a useful and general approach for miRNA gene prediction. Based on our experiments, we believe that this new technique is applicable to an extensive range of eukaryotic genomes. Specific structure and sequence features are first used to identify miRNAs, followed by a comparative analysis to decrease the number of false positives (FPs). The resulting algorithm exhibits higher specificity and similar sensitivity compared to currently used algorithms that rely on conserved genomic regions to decrease the rate of FPs.
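As a rough illustration of the kind of classifier the abstract describes, the following is a minimal sketch of a Gaussian Naïve Bayes model written from scratch. It is not the authors' implementation: the abstract does not list the features used, so the three features here (GC fraction, hairpin free energy, number of stem base pairs), the Gaussian likelihood, the class names, and all numeric values are assumptions made up for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def log_posteriors(model, x):
    """Unnormalized log posterior per class, assuming independent features."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Return the class with the highest log posterior."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Made-up toy data, NOT taken from the paper.
# Hypothetical features: [GC fraction, hairpin free energy (kcal/mol), stem base pairs]
X_train = [
    [0.45, -35.0, 28], [0.50, -40.0, 30], [0.48, -38.0, 27],  # hypothetical miRNA hairpins
    [0.60, -10.0, 12], [0.35, -8.0, 10], [0.55, -12.0, 14],   # hypothetical background hairpins
]
y_train = ["miRNA"] * 3 + ["background"] * 3

model = fit_gaussian_nb(X_train, y_train)
candidate = [0.47, -36.0, 29]
label = predict(model, candidate)
print(label)
```

In a multi-species setting, the training rows would simply be pooled from known miRNAs of several genomes before fitting. The comparative analysis that the abstract uses to reduce false positives would be a separate filtering step applied to the candidates this classifier accepts.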