Unique folding of precursor microRNAs: Quantitative evidence and implications for de novo identification
- 28 December 2006
- journal article
- Published by Cold Spring Harbor Laboratory in RNA
- Vol. 13 (2), 170-187
- https://doi.org/10.1261/rna.223807
Abstract
MicroRNAs (miRNAs) participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. The hairpin is a crucial structural feature for the computational identification of precursor miRNAs (pre-miRs), as its formation is critically associated with the early stages of mature miRNA biogenesis. Our incomplete knowledge about the number of miRNAs present in the genomes of vertebrates, worms, plants, and even viruses necessitates a thorough understanding of their sequence motifs, hairpin structural characteristics, and topological descriptors. In this in-depth study, we investigate a comprehensive and heterogeneous collection of 2241 published (nonredundant) pre-miRs across 41 species (miRBase 8.2), 8494 pseudohairpins extracted from the human RefSeq genes, 12,387 (nonredundant) ncRNAs spanning 457 types (Rfam 7.0), 31 full-length mRNAs randomly selected from GenBank, and four sets of synthetically generated genomic background corresponding to each native RNA sequence set. Our large-scale characterization analysis reveals that pre-miRs are significantly different from other types of ncRNAs, pseudohairpins, mRNAs, and genomic background according to the nonparametric Kruskal–Wallis ANOVA (p < 0.001). We examine the intrinsic and global features at the sequence, structural, and topological levels, including %G+C content, normalized base-pairing propensity P(S), normalized minimum free energy of folding MFE(s), normalized Shannon entropy Q(s), normalized base-pair distance D(s), and degree of compactness F(S), as well as their corresponding Z scores of P(S), MFE(s), Q(s), D(s), and F(S). The findings will promote more accurate guidelines and distinctive criteria for the prediction of novel pre-miRs with improved performance.
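As a rough illustration of the sequence-level and structure-level features named in the abstract, the sketch below computes %G+C content, a length-normalized base-pairing propensity, and a length-normalized minimum free energy. It is not taken from the paper: the function names are hypothetical, the structure is assumed to be given in dot-bracket notation (as produced by common RNA folding tools), and "normalized" is assumed here to mean division by sequence length.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C nucleotides in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def count_base_pairs(dot_bracket: str) -> int:
    """Count base pairs in a dot-bracket structure and check that it is balanced."""
    depth = 0
    pairs = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs += 1  # each closing bracket completes one pair
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def normalized_pairing(dot_bracket: str) -> float:
    """Base pairs per nucleotide (assumed normalization: divide by length)."""
    if not dot_bracket:
        raise ValueError("empty structure")
    return count_base_pairs(dot_bracket) / len(dot_bracket)


def normalized_mfe(mfe_kcal_per_mol: float, length: int) -> float:
    """Minimum free energy per nucleotide (assumed normalization: divide by length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return mfe_kcal_per_mol / length
```

For a toy hairpin such as `((((...))))`, `count_base_pairs` finds four pairs over eleven positions. The folding energy itself would have to come from an external folding program; the value passed to `normalized_mfe` is simply whatever that tool reports.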