RNA-Seq gene expression estimation with read mapping uncertainty
Open Access
- 18 December 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (4) , 493-500
- https://doi.org/10.1093/bioinformatics/btp692
Abstract
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.
Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.
Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem.
Contact: cdewey@biostat.wisc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
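The two ideas named in the Results (allocating ambiguously mapped reads with a statistical model rather than discarding them, and reporting gene expression as the sum of isoform expression) can be illustrated with a toy expectation-maximization loop. This is a minimal sketch under strong simplifying assumptions, not the model from the article: it ignores isoform lengths, sequencing errors, and non-uniform read start positions, and the names `em_read_allocation`, `gene_level`, `compat`, and `gene_of` are hypothetical.

```python
def em_read_allocation(compat, n_isoforms, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    compat[r] is the list of isoform indices that read r maps to
    (hypothetical input format). Reads with no mapping are skipped.
    """
    reads = [isoforms for isoforms in compat if isoforms]
    if not reads:
        raise ValueError("at least one mapped read is required")
    theta = [1.0 / n_isoforms] * n_isoforms  # uniform starting guess
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its candidate isoforms
        # in proportion to the current abundance estimates.
        for isoforms in reads:
            total = sum(theta[i] for i in isoforms)
            for i in isoforms:
                counts[i] += theta[i] / total
        # M-step: re-estimate abundances from the fractional counts.
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta


def gene_level(theta, gene_of):
    """Sum isoform-level estimates into gene-level estimates."""
    genes = {}
    for i, t in enumerate(theta):
        genes[gene_of[i]] = genes.get(gene_of[i], 0.0) + t
    return genes


# Toy data: isoforms 0 and 1 belong to gene "A", isoform 2 to gene "B".
compat = [[0], [0], [0], [2], [0, 2], [0, 2]]
gene_of = ["A", "A", "B"]
theta = em_read_allocation(compat, n_isoforms=3)
print(gene_level(theta, gene_of))
```

In this toy input, two of the six reads map to both genes. Discarding them would leave a 3:1 ratio based on four reads; the loop instead shares them according to the evidence from uniquely mapped reads, and the estimates settle at approximately 0.75 for gene A and 0.25 for gene B.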
This publication has 16 references indexed in Scilit:
- Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009
- Statistical inferences for isoform expression in RNA-Seq. Bioinformatics, 2009
- Cross-hybridization modeling on Affymetrix exon arrays. Bioinformatics, 2008
- Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research, 2008
- Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature Methods, 2008
- Highly Integrated Single-Base Resolution Maps of the Epigenome in Arabidopsis. Cell, 2008
- A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE. Genomics, 2008
- Exact Transcriptome Reconstruction from Short Sequence Reads. Published by Springer Nature, 2008
- The UCSC Known Genes. Bioinformatics, 2006
- Statistical modeling of sequencing errors in SAGE libraries. Bioinformatics, 2004