Gene structure conservation aids similarity based gene prediction
- 21 January 2004
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (2) , 776-783
- https://doi.org/10.1093/nar/gkh211
Abstract
One of the primary tasks in deciphering the functional contents of a newly sequenced genome is the identification of its protein coding genes. Existing computational methods for gene prediction include ab initio methods, which use the DNA sequence itself as the only source of information; comparative methods, which use multiple genomic sequences; and similarity-based methods, which employ the cDNA or protein sequences of related genes to aid gene prediction. We present here an algorithm, implemented in a computer program called Projector, which combines the comparative and similarity approaches. Projector employs similarity information at the genomic DNA level by directly using known genes annotated on one DNA sequence to predict the corresponding related genes on another DNA sequence. It therefore makes explicit use of the conservation of the exon-intron structure between two related genes, in addition to the similarity of their encoded amino acid sequences. We evaluate the performance of Projector by comparing it with the program Genewise on a test set of 491 pairs of independently confirmed mouse and human genes. Projector is more accurate than Genewise for genes whose proteins are <80% identical, and it is suitable for use in a combined gene prediction system in which other methods identify well conserved genes, non-conserved genes and pseudogenes.
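The abstract's key idea is that related genes tend to keep the same exon-intron structure. One simple, standard way to make that concrete is the intron phase pattern: the phase of an intron is the position within a codon at which it interrupts the coding sequence (0 between codons, 1 after the first base, 2 after the second). The sketch below is only a simplified illustration of structure conservation, not Projector's algorithm; it assumes coding exons are given as hypothetical 1-based inclusive `(start, end)` pairs in transcription order, with the first exon beginning at the first base of a codon. The helper names `intron_phases` and `structures_conserved` are made up for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a coding exon as (start, end), 1-based and
# inclusive, listed in transcription order (so minus-strand genes still work
# as long as the list order follows transcription).
Exon = Tuple[int, int]


def intron_phases(coding_exons: List[Exon]) -> List[int]:
    """Return the phase (0, 1 or 2) of each intron between coding exons.

    Assumes the first exon starts at codon position 1, so the phase of each
    intron is the cumulative coding length before it, modulo 3.
    """
    phases = []
    cds_length = 0
    for start, end in coding_exons[:-1]:  # there is no intron after the last exon
        cds_length += end - start + 1
        phases.append(cds_length % 3)
    return phases


def structures_conserved(exons_a: List[Exon], exons_b: List[Exon]) -> bool:
    """Crude proxy: same number of introns with identical phases, in order."""
    return intron_phases(exons_a) == intron_phases(exons_b)


# Toy gene structures (invented coordinates, not data from the paper).
human_like = [(100, 150), (300, 420), (600, 700)]     # exon lengths 51, 121, 101
mouse_like = [(1000, 1050), (1800, 1920), (2500, 2620)]  # lengths 51, 121, 121
print(intron_phases(human_like))                      # [0, 1]
print(structures_conserved(human_like, mouse_like))   # True
```

The intron lengths and genomic coordinates differ between the two toy genes, yet the phase pattern matches, which is the kind of signal a structure-aware predictor can exploit even when the encoded proteins are only moderately similar. A real method would also require the introns to sit at aligned positions in the protein sequences, which this sketch does not check.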