Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence
Open Access
- 5 March 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 37 (7) , e52
- https://doi.org/10.1093/nar/gkp052
Abstract
We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets, but instead uses extrinsic evidence (including paired-end ditags, which have not previously been used in gene finding) and iterative training. This new method is particularly suitable for annotating species that are at a large evolutionary distance from the closest annotated species. We have used our approach to produce an initial annotation of more than 16 000 genes in the newly sequenced Schistosoma japonicum draft genome. We established the high quality of our predictions by comparison to full-length cDNAs (withdrawn from the extrinsic evidence) and to CEGMA core genes. We also evaluated the effectiveness of the new training procedure on the Caenorhabditis elegans genome. ExonHunter and the newest parametric files for the S. japonicum genome are available for download at www.bioinformatics.uwaterloo.ca/downloads/exonhunter
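The abstract does not spell out the training procedure, so the following is only a rough illustration of the general idea of iterative training without curated labels. It is not ExonHunter's method. The sketch assumes a toy two-state HMM (hypothetical states "N" for noncoding and "C" for coding), uses plain Viterbi training as a stand-in technique, and models extrinsic evidence as a hypothetical dictionary that pins a sequence position to a state. All names and starting parameters are invented for illustration.

```python
import math

STATES = ("N", "C")   # hypothetical labels: noncoding, coding
ALPHABET = "ACGT"

# Hypothetical starting guesses; a real gene finder has far richer models.
START0 = {"N": 0.5, "C": 0.5}
TRANS0 = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT0 = {"N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
         "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}


def viterbi(seq, trans, emit, start, evidence=None):
    """Most likely state path; `evidence` maps position -> required state."""
    evidence = evidence or {}
    neg_inf = float("-inf")

    def allowed(i, s):
        return evidence.get(i, s) == s

    score = [{s: math.log(start[s]) + math.log(emit[s][seq[0]])
              if allowed(0, s) else neg_inf for s in STATES}]
    back = []
    for i in range(1, len(seq)):
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[-1][p] + math.log(trans[p][s]))
            best = score[-1][prev] + math.log(trans[prev][s])
            row[s] = best + math.log(emit[s][seq[i]]) if allowed(i, s) else neg_inf
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def reestimate(seqs, paths, pseudo=1.0):
    """Count transitions and emissions along the paths, with pseudocounts."""
    trans = {p: {s: pseudo for s in STATES} for p in STATES}
    emit = {s: {a: pseudo for a in ALPHABET} for s in STATES}
    for seq, path in zip(seqs, paths):
        for i, (base, state) in enumerate(zip(seq, path)):
            emit[state][base] += 1
            if i > 0:
                trans[path[i - 1]][state] += 1
    return ({p: normalize(trans[p]) for p in STATES},
            {s: normalize(emit[s]) for s in STATES})


def iterative_training(seqs, evidences, trans, emit, start, rounds=10):
    """Alternate decoding and re-estimation until the paths stop changing."""
    paths = None
    for _ in range(rounds):
        new_paths = [viterbi(s, trans, emit, start, e)
                     for s, e in zip(seqs, evidences)]
        if new_paths == paths:
            break
        paths = new_paths
        trans, emit = reestimate(seqs, paths)
    return trans, emit, paths
```

Calling `iterative_training(seqs, evidences, TRANS0, EMIT0, START0)` with a list of DNA strings and one evidence dictionary per string returns the re-estimated parameters and the final state paths. Positions pinned by evidence keep their required state in every round, which is the sense in which the evidence, rather than a curated training set, steers the parameters.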