Computational Comparison of Human Genomic Sequence Assemblies for a Region of Chromosome 4
Open Access
- 15 February 2002
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (3) , 424-429
- https://doi.org/10.1101/gr.207902
Abstract
Much of the available human genomic sequence data exist in a fragmentary draft state following the completion of the initial high-volume sequencing performed by the International Human Genome Sequencing Consortium (IHGSC) and Celera Genomics (CG). We compared six draft genome assemblies over a region of chromosome 4p (D4S394–D4S403): two consecutive releases by the IHGSC at University of California, Santa Cruz (UCSC), two consecutive releases from the National Center for Biotechnology Information (NCBI), the public release from CG, and a hybrid assembly we have produced using IHGSC and CG sequence data. This region presents particular problems for genomic sequence assembly algorithms, as it contains a large tandem repeat and is sparsely covered by draft sequences. The six assemblies differed both in their relative coverage of sequence data from the region and in their estimated rates of misassembly. The CG assembly method attained the lowest level of misassembly, whereas the NCBI and UCSC assemblies had the highest levels of coverage. All assemblies examined included <60% of the publicly available sequence from the region. At least 6% of the sequence data within the CG assembly for the D4S394–D4S403 region was not present in publicly available sequence data. We also show that even in a problematic region, existing software tools can be used with high-quality mapping data to produce genomic sequence contigs with a low rate of rearrangements. [All sequence accessions for the genomic sequence assemblies analyzed and the data sets used to assess coverage and rates of misassembly are available from http://www.ed.ac.uk/~csemple.]
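The abstract reports two measures per assembly, coverage and rate of misassembly, without stating how they were computed. The sketch below is one plausible way to make them concrete and is not the authors' method: it assumes coverage is the fraction of publicly available accessions that appear in an assembly, and it approximates misassembly as the fraction of adjacent map markers whose order is inverted in the assembly. The function names, accession labels, and marker names are all hypothetical.

```python
def coverage(assembly_accessions, available_accessions):
    """Fraction of available accessions that the assembly includes (assumed definition)."""
    available = set(available_accessions)
    if not available:
        raise ValueError("no available accessions to compare against")
    return len(available & set(assembly_accessions)) / len(available)


def order_discordance(map_order, assembly_positions):
    """Fraction of adjacent marker pairs whose assembly order disagrees with the map.

    map_order: marker names in the order given by an independent map.
    assembly_positions: dict of marker name -> coordinate in the assembly.
    Markers missing from the assembly are skipped. The assembly may be
    reported on either strand, so both orientations are scored and the
    lower count is used.
    """
    placed = [m for m in map_order if m in assembly_positions]
    if len(placed) < 2:
        return 0.0
    pairs = list(zip(placed, placed[1:]))
    forward = sum(1 for a, b in pairs if assembly_positions[a] > assembly_positions[b])
    reverse = sum(1 for a, b in pairs if assembly_positions[a] < assembly_positions[b])
    return min(forward, reverse) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    print(coverage(["A", "B", "X"], ["A", "B", "C", "D"]))
    print(order_discordance(["m1", "m2", "m3", "m4"],
                            {"m1": 10, "m2": 30, "m3": 20, "m4": 40}))
```

Scoring only adjacent marker pairs keeps the measure simple, but a single misplaced contig can affect more than one pair, so the value is a rough indicator rather than a count of rearrangement events.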