Characterization of missing human genome sequences and copy-number polymorphic insertions
Open Access
- 18 April 2010
- journal article
- research article
- Published by Springer Nature in Nature Methods
- Vol. 7 (5) , 365-371
- https://doi.org/10.1038/nmeth.1451
Abstract
Paired-end sequencing of human genomic DNA reveals at least 2.8 Mb of new sequence at 720 distinct loci. Complete sequencing of 1.67 Mb at 192 loci reveals extensive copy-number variation and provides a resource for genotyping these 'missing' sequences.

The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 new insertion sequences corresponding to 720 genomic loci. We found that a substantial fraction of these sequences are either missing, fragmented or misassigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determined that 18–37% of these new insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identified new exons and conserved noncoding sequences not yet represented in the reference genome. We developed a method to accurately genotype these new insertions by mapping next-generation sequencing datasets to the breakpoint, thereby providing a means to characterize copy-number status for regions previously inaccessible to single-nucleotide polymorphism microarrays.
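The abstract does not spell out how breakpoint mapping yields a genotype. As a rough illustration only, this sketch assumes each insertion is represented by two hypothetical breakpoint contigs, an insertion allele (left flank + inserted sequence + right flank) and an empty allele (left flank + right flank), with reads already aligned to both. It counts reads that span each junction with a minimum overhang and calls a genotype from the fraction of spanning reads that support the insertion. The `Read` type, the 10-bp overhang, the 4-read minimum and the 0.2/0.8 cut-offs are all assumptions, not parameters taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical aligned read on a breakpoint contig (0-based, end-exclusive)."""
    start: int
    end: int


def spans(read: Read, junction: int, min_overhang: int) -> bool:
    """True if the read covers at least `min_overhang` bases on each side of
    `junction` (the index of the first base after the breakpoint)."""
    return read.start <= junction - min_overhang and read.end >= junction + min_overhang


def count_spanning(reads: list[Read], junction: int, min_overhang: int = 10) -> int:
    """Number of reads that cross the junction with enough anchoring on both sides."""
    return sum(1 for r in reads if spans(r, junction, min_overhang))


def call_genotype(ins_count: int, ref_count: int, min_reads: int = 4,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Assumed three-state call from junction-spanning read counts.

    ins_count: reads spanning a junction on the insertion-allele contig.
    ref_count: reads spanning the junction on the empty-allele contig.
    """
    total = ins_count + ref_count
    if total < min_reads:
        return "no_call"  # too little evidence at this breakpoint
    frac = ins_count / total
    if frac < het_low:
        return "absent"  # essentially only the empty (reference-like) allele
    if frac > het_high:
        return "homozygous_insertion"
    return "heterozygous"


# Illustrative usage with made-up alignments around a junction at position 100.
ins_reads = [Read(80, 180), Read(50, 150), Read(95, 195)]  # last read lacks a 10-bp left anchor
ref_reads = [Read(60, 160), Read(85, 185)]
n_ins = count_spanning(ins_reads, junction=100)
n_ref = count_spanning(ref_reads, junction=100)
print(n_ins, n_ref, call_genotype(n_ins, n_ref))
```

Because many of these insertions are copy-number polymorphic rather than simple presence/absence variants, a three-state call like this is a simplification; estimating copy number would also require read depth over the inserted sequence itself.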