SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model
Open Access
- 12 February 2003
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (3) , 496-502
- https://doi.org/10.1101/gr.424203
Abstract
Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.
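To illustrate the pair hidden Markov model half of the framework described above, the following is a minimal sketch of Viterbi alignment under a plain three-state pair HMM (match, gap in the second sequence, gap in the first). It is not SLAM's implementation and it is not a generalized pair HMM: it has no exon, intron, or CNS states and no variable-length state durations. The function name `pair_hmm_viterbi` and the parameter values `delta`, `epsilon`, and `p_match` are assumptions chosen only for illustration.

```python
import math

def pair_hmm_viterbi(x, y, delta=0.1, epsilon=0.3, p_match=0.9):
    """Most probable alignment of DNA strings x and y under a toy 3-state pair HMM.

    States: M emits an aligned pair, X emits a base of x opposite a gap,
    Y emits a base of y opposite a gap. All parameter values are hypothetical.
    """
    NEG = float("-inf")
    n, m = len(x), len(y)
    # Emission log-probabilities over the 16 ordered base pairs (sum to 1).
    log_same = math.log(p_match / 4)
    log_diff = math.log((1 - p_match) / 12)
    log_single = math.log(0.25)
    # Allowed transitions; X <-> Y is disallowed in this sketch.
    trans = {
        ("M", "M"): math.log(1 - 2 * delta),
        ("M", "X"): math.log(delta),
        ("M", "Y"): math.log(delta),
        ("X", "X"): math.log(epsilon),
        ("X", "M"): math.log(1 - epsilon),
        ("Y", "Y"): math.log(epsilon),
        ("Y", "M"): math.log(1 - epsilon),
    }
    states = ("M", "X", "Y")
    V = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in states}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in states}
    V["M"][0][0] = 0.0  # by convention the model starts in M

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s in states:
                if s == "M":
                    if i == 0 or j == 0:
                        continue
                    pi, pj = i - 1, j - 1
                    emit = log_same if x[i - 1] == y[j - 1] else log_diff
                elif s == "X":
                    if i == 0:
                        continue
                    pi, pj, emit = i - 1, j, log_single
                else:
                    if j == 0:
                        continue
                    pi, pj, emit = i, j - 1, log_single
                best, arg = NEG, None
                for p in states:
                    if (p, s) in trans:
                        score = V[p][pi][pj] + trans[(p, s)]
                        if score > best:
                            best, arg = score, p
                if arg is not None:
                    V[s][i][j] = best + emit
                    back[s][i][j] = arg

    # Trace back from the best-scoring state at the end of both sequences.
    s = max(states, key=lambda k: V[k][n][m])
    score = V[s][n][m]
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        prev = back[s][i][j]
        if s == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif s == "X":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        s = prev
    return score, "".join(reversed(ax)), "".join(reversed(ay))
```

Calling `pair_hmm_viterbi("ACGTACGT", "ACGTCGT")` returns a log-probability together with the two gapped strings. A generalized pair HMM in the sense of the abstract would additionally let each state emit a pair of subsequences of variable length (for example, a whole exon pair), which this sketch does not attempt.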