Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities
Open Access
- 28 November 2008
- journal article
- review article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (11), e1000176
- https://doi.org/10.1371/journal.pcbi.1000176
Abstract
The assumption that RNA can be readily classified into either protein-coding or non-protein–coding categories has pervaded biology for close to 50 years. Until recently, discrimination between these two categories was relatively straightforward: most transcripts were clearly identifiable as protein-coding messenger RNAs (mRNAs), and readily distinguished from the small number of well-characterized non-protein–coding RNAs (ncRNAs), such as transfer, ribosomal, and spliceosomal RNAs. Recent genome-wide studies have revealed the existence of thousands of noncoding transcripts, whose function and significance are unclear. The discovery of this hidden transcriptome and the implicit challenge it presents to our understanding of the expression and regulation of genetic information has made the need to distinguish between mRNAs and ncRNAs both more pressing and more complicated. In this Review, we consider the diverse strategies employed to discriminate between protein-coding and noncoding transcripts and the fundamental difficulties that are inherent in what may superficially appear to be a simple problem. Misannotations can also run in both directions: some ncRNAs may actually encode peptides, and some of those currently thought to do so may not. Moreover, recent studies have shown that some RNAs can function both as mRNAs and intrinsically as functional ncRNAs, which may be a relatively widespread phenomenon. We conclude that it is difficult to annotate an RNA unequivocally as protein-coding or noncoding, with overlapping protein-coding and noncoding transcripts further confounding this distinction. In addition, the finding that some transcripts can function both intrinsically at the RNA level and to encode proteins suggests a false dichotomy between mRNAs and ncRNAs. Therefore, the functionality of any transcript at the RNA level should not be discounted.
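To make the classification problem concrete, one simple discrimination strategy is to measure the longest open reading frame (ORF) in a transcript and call the transcript coding if that ORF exceeds a length cutoff. The Python sketch below is an illustration only, not the method of the Review: the function names `longest_orf_codons` and `looks_coding` are hypothetical, the scan covers the forward strand only, and the 100-codon default is an assumed illustrative cutoff rather than a value stated in the abstract.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start included, stop excluded) of the longest
    ATG...stop ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # close this ORF and look for the next ATG
    return best


def looks_coding(seq: str, min_codons: int = 100) -> bool:
    """Naive call: 'coding' if the longest ORF reaches min_codons.
    The default cutoff is an assumption chosen for illustration."""
    return longest_orf_codons(seq) >= min_codons


if __name__ == "__main__":
    toy = "ATGAAATTTTAA"  # ATG AAA TTT TAA
    print(longest_orf_codons(toy), looks_coding(toy))
```

A rule of this kind shows why the abstract describes misannotation as running in both directions: a transcript encoding a short peptide falls below any fixed cutoff, while a long noncoding transcript can contain an ORF above it by chance.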