Assessing the gene space in draft genomes
Open Access
- 28 November 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 37 (1) , 289-297
- https://doi.org/10.1093/nar/gkn916
Abstract
Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
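The three assembly metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the gene identifiers, and the toy numbers are all hypothetical, and the sketch assumes that the set of CEGs found in an assembly has already been determined by some separate mapping step.

```python
from typing import Iterable, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fold_coverage(total_read_bases: int, genome_size: int) -> float:
    """x-fold coverage: sequenced bases divided by estimated genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size


def ceg_fraction(core_ids: Iterable[str], mapped_ids: Iterable[str]) -> float:
    """Proportion of the core gene set that was mapped in the assembly.

    Identifiers in mapped_ids that are not core genes are ignored.
    """
    core = set(core_ids)
    if not core:
        raise ValueError("core gene set must not be empty")
    return len(core & set(mapped_ids)) / len(core)


# Toy data (hypothetical, for illustration only).
scaffold_lengths = [100, 80, 60, 40, 20]
core_genes = ["ceg1", "ceg2", "ceg3", "ceg4"]
genes_found = ["ceg1", "ceg3", "ceg4", "not_a_core_gene"]

print(n50(scaffold_lengths))                  # 80
print(fold_coverage(8_000, 1_000))            # 8.0
print(ceg_fraction(core_genes, genes_found))  # 0.75
```

N50 and coverage describe contiguity and sequencing depth, whereas the CEG proportion asks directly how much of an expected gene set is recoverable, which is why the abstract presents it as a complement rather than a replacement.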