Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq
Open Access
- 17 December 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (4) , 502-508
- https://doi.org/10.1093/bioinformatics/btq696
Abstract
Motivation: RNA-Seq technology based on next-generation sequencing provides the unprecedented ability of studying transcriptomes at high resolution and accuracy, and the potential of measuring expression of multiple isoforms from the same gene at high precision. Solved by maximum likelihood estimation, isoform expression can be inferred in RNA-Seq using statistical models based on the assumption that sequenced reads are distributed uniformly along transcripts. Modification of the model is needed when considering situations where RNA-Seq data do not follow uniform distribution.
Results: We proposed two curves, the global bias curve (GBC) and the local bias curves (LBCs), to describe the non-uniformity of read distributions for all genes in a transcriptome and for each gene, respectively. Incorporating the bias curves into the uniform read distribution (URD) model, we introduced non-URD (N-URD) models to infer isoform expression levels. On a series of systematic simulation studies, the proposed models outperform the original model in recovering major isoforms and the expression ratio of alternative isoforms. We also applied the new model to real RNA-Seq datasets and found that its inferences on expression ratios of alternative isoforms are more reasonable. The experiments indicate that incorporating N-URD information can improve the accuracy in modeling and inferring isoform expression in RNA-Seq.
Contact: zhangxg@tsinghua.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
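The idea in the abstract, maximum likelihood inference of isoform expression with and without a correction for non-uniform read coverage, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a Poisson model in which the expected read count of an exon is the sum, over the isoforms containing it, of expression times exon length, and it assumes that a bias curve can be summarized as one multiplicative weight per exon and isoform. The function names, the three-exon gene, and the weight values are all hypothetical, and the maximum likelihood estimate is obtained here with a standard EM update for Poisson linear models rather than with the solver used in the paper.

```python
import numpy as np


def design_matrix(exon_lengths, membership, weights=None):
    """Effective exon length seen by each isoform.

    membership[j, i] is 1 if isoform i contains exon j, else 0.
    weights[j, i] is a hypothetical bias weight; None means uniform coverage.
    """
    lengths = np.asarray(exon_lengths, dtype=float)[:, None]
    membership = np.asarray(membership, dtype=float)
    if weights is None:
        weights = np.ones_like(membership)
    return lengths * membership * np.asarray(weights, dtype=float)


def estimate_isoform_expression(counts, design, n_iter=500):
    """EM updates for counts[j] ~ Poisson(sum_i design[j, i] * theta[i])."""
    counts = np.asarray(counts, dtype=float)
    design = np.asarray(design, dtype=float)
    col_sums = design.sum(axis=0)
    # Start from a flat, strictly positive guess.
    theta = np.full(design.shape[1], counts.sum() / col_sums.sum())
    for _ in range(n_iter):
        rates = design @ theta
        ratio = np.divide(counts, rates, out=np.zeros_like(counts), where=rates > 0)
        theta = theta * (design.T @ ratio) / col_sums
    return theta


# Hypothetical gene: isoform A uses exons 1-3, isoform B skips exon 2.
exon_lengths = [100, 200, 100]
membership = [[1, 1], [1, 0], [1, 1]]
# Made-up weights imitating coverage that rises towards the 3' end.
bias_weights = [[0.4, 0.5], [0.8, 1.0], [1.6, 1.5]]

true_theta = np.array([2.0, 1.0])
biased_design = design_matrix(exon_lengths, membership, bias_weights)
counts = biased_design @ true_theta  # noise-free expected counts

theta_urd = estimate_isoform_expression(counts, design_matrix(exon_lengths, membership))
theta_nurd = estimate_isoform_expression(counts, biased_design)
print("uniform model:       ", theta_urd)
print("bias-corrected model:", theta_nurd)
```

In this noise-free toy case the bias-aware fit recovers the simulated expression levels (about 2 and 1), while the uniform fit converges to roughly 1.6 and 1.4 and so distorts the ratio between the two isoforms. Real data would add sampling noise, and the bias weights would have to be estimated rather than assumed.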