Heterochromatic sequences in a Drosophila whole-genome shotgun assembly
Open Access
- 31 December 2002
- journal article
- research article
- Published by Springer Nature in Genome Biology
Abstract
Most eukaryotic genomes include a substantial repeat-rich fraction termed heterochromatin, which is concentrated in centric and telomeric regions. The repetitive nature of heterochromatic sequence makes it difficult to assemble and analyze. To better understand the heterochromatic component of the Drosophila melanogaster genome, we characterized and annotated portions of a whole-genome shotgun sequence assembly. WGS3, an improved whole-genome shotgun assembly, includes 20.7 Mb of draft-quality sequence not represented in the Release 3 sequence spanning the euchromatin. We annotated this sequence using the methods employed in the re-annotation of the Release 3 euchromatic sequence. This analysis predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Bacterial artificial chromosome (BAC)-based fluorescence in situ hybridization analysis was used to correlate the genomic sequence with the cytogenetic map in order to refine the genomic definition of the centric heterochromatin; on the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm. Whole-genome shotgun assembly produced a reliable draft-quality sequence of a significant part of the Drosophila heterochromatin. Annotation of this sequence defined the intron-exon structures of 30 known protein-coding genes and 267 protein-coding gene models. The cytogenetic mapping suggests that an additional 150 predicted genes are located in heterochromatin at the base of the Release 3 euchromatic sequence. Our analysis suggests strategies for improving the sequence and annotation of the heterochromatic portions of the Drosophila and other complex genomes.