JIGSAW: integration of multiple sources of evidence for gene prediction
Open Access
- 2 August 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (18) , 3596-3603
- https://doi.org/10.1093/bioinformatics/bti609
Abstract
Motivation: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models.
Results: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods.
Availability: JIGSAW is available as an open source software package at http://cbcb.umd.edu/software/jigsaw
Contact: jeallen@umiacs.umd.edu
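The general idea in the Results paragraph, weighting evidence sources from training statistics and then combining them with dynamic programming, can be illustrated with a toy sketch. This is not JIGSAW's actual model or code: the binary per-position labels, the log-odds weighting, the `switch_penalty` parameter, and the function names `train_weights` and `combine` are all assumptions made for illustration.

```python
import math


def train_weights(training_predictions, truth, pseudocount=1.0):
    """Estimate one log-odds weight per evidence source (hypothetical scheme).

    training_predictions maps a source name to its 0/1 calls on the
    training positions; truth holds the curated 0/1 labels.
    """
    weights = {}
    for name, preds in training_predictions.items():
        correct = sum(p == t for p, t in zip(preds, truth))
        # Smoothed accuracy keeps the log-odds finite for perfect sources.
        acc = (correct + pseudocount) / (len(truth) + 2 * pseudocount)
        weights[name] = math.log(acc / (1 - acc))
    return weights


def combine(predictions, weights, switch_penalty=1.0):
    """Viterbi-style DP over two labels (0 = noncoding, 1 = coding).

    Each position scores +w for every source agreeing with the label and
    -w for every source disagreeing; changing label between neighbouring
    positions costs switch_penalty (an assumed smoothing term).
    """
    n = len(next(iter(predictions.values())))

    def emit(i, label):
        return sum(
            w if predictions[name][i] == label else -w
            for name, w in weights.items()
        )

    score = [emit(0, 0), emit(0, 1)]
    back = []  # back[i - 1][label] = best previous label for position i
    for i in range(1, n):
        new_score, pointers = [], []
        for label in (0, 1):
            stay = score[label]
            switch = score[1 - label] - switch_penalty
            if stay >= switch:
                prev, best = label, stay
            else:
                prev, best = 1 - label, switch
            new_score.append(best + emit(i, label))
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final label to recover the full labeling.
    label = 0 if score[0] >= score[1] else 1
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Toy training data: source "A" is always right, source "B" errs three times.
truth = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
training = {
    "A": [0, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    "B": [1, 1, 0, 1, 1, 0, 0, 1, 1, 0],
}
weights = train_weights(training, truth)

# Conflicting evidence on a new region: the DP favours the more reliable source.
new_region = {"A": [0, 1, 1, 1, 0], "B": [1, 0, 0, 1, 1]}
labels = combine(new_region, weights)
```

In this sketch the more accurate source receives the larger weight, so it wins where the two disagree, while the switch penalty discourages isolated single-position label flips. A real gene finder works with exons, introns, splice sites and reading frames rather than independent per-position labels.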