The Multiassembly Problem: Reconstructing Multiple Transcript Isoforms From EST Fragment Mixtures
Open Access
- 12 February 2004
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 14 (3), 426-441
- https://doi.org/10.1101/gr.1304504
Abstract
Recent evidence of abundant transcript variation (e.g., alternative splicing, alternative initiation, alternative polyadenylation) in complex genomes indicates that cataloging the complete set of transcripts from an organism is an important project. One challenge is the fact that most high-throughput experimental methods for characterizing transcripts (such as EST sequencing) give highly detailed information about short fragments of transcripts or protein products, instead of a complete characterization of a full-length form. We analyze this “multiassembly problem” (reconstructing the most likely set of full-length isoform sequences from a mixture of EST fragment data) and present a graph-based algorithm for solving it. In a variety of tests, we demonstrate that this algorithm deals appropriately with coupling of distinct alternative splicing events, increasing fragmentation of the input data and different types of transcript variation (such as alternative splicing, initiation, polyadenylation, and intron retention). To test the method's performance on pure fragment (EST) data, we removed all mRNA sequences, and found it produced no errors in 40 cases tested. Using this algorithm, we have constructed an Alternatively Spliced Proteins database (ASP) from analysis of human expressed and genomic sequences, consisting of 13,384 protein isoforms of 4422 genes, yielding an average of 3.0 protein isoforms per gene.
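To make the multiassembly problem concrete, the following is a minimal sketch of why fragment data alone can be ambiguous. It is not the algorithm from the paper: it simply assumes each EST fragment has already been reduced to an ordered tuple of exon identifiers, builds a small splice graph from those tuples, enumerates every start-to-end path as a candidate isoform, and counts how many fragments each candidate contains as a contiguous run. All function names and the toy exon numbering are hypothetical.

```python
from collections import defaultdict


def build_splice_graph(fragments):
    """Map each exon to the set of exons seen immediately downstream of it."""
    graph = defaultdict(set)
    for frag in fragments:
        for upstream, downstream in zip(frag, frag[1:]):
            graph[upstream].add(downstream)
        graph[frag[-1]]  # register the last exon even if it has no successor
    return graph


def enumerate_paths(graph):
    """List every path from a source exon to a sink exon (graph assumed acyclic)."""
    targets = {b for successors in graph.values() for b in successors}
    sources = sorted(node for node in graph if node not in targets)
    paths = []

    def walk(node, path):
        if not graph[node]:  # sink reached: record one candidate isoform
            paths.append(tuple(path))
            return
        for nxt in sorted(graph[node]):
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


def is_subpath(frag, path):
    """True if frag occurs as a contiguous run of exons inside path."""
    n = len(frag)
    return any(tuple(path[i:i + n]) == tuple(frag)
               for i in range(len(path) - n + 1))


def supported_isoforms(fragments):
    """Return {candidate isoform: number of fragments consistent with it}."""
    graph = build_splice_graph(fragments)
    return {path: sum(is_subpath(f, path) for f in fragments)
            for path in enumerate_paths(graph)}


if __name__ == "__main__":
    # Toy gene with two optional exons (2 and 4); no fragment spans both events.
    ests = [(1, 2, 3), (1, 3), (3, 4, 5), (3, 5)]
    for isoform, support in supported_isoforms(ests).items():
        print(isoform, support)
```

In this toy input, no fragment covers both alternative exons, so all four combinations are equally supported and naive path enumeration cannot tell whether the two events are coupled. Adding a single fragment such as `(2, 3, 4)` raises the support of one candidate above the others, which illustrates why fragments that span more than one event matter when choosing the most likely set of full-length isoforms.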