The Characterization of Twenty Sequenced Human Genomes
Open Access
- 9 September 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 6 (9) , e1001111
- https://doi.org/10.1371/journal.pgen.1001111
Abstract
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

We report here the nearly complete genomic sequence of 20 different individuals, determined using “next-generation” sequencing technologies. We use these data to characterize the type of genetic variation carried by humans in a sample of this size, which is to our knowledge the largest set of unrelated genomic sequences that have been reported. We summarize different categories of variation in each genome, and in total across all 20 of the genomes, finding a surprising number of variants predicted to reduce or remove the proteins encoded by many different genes. This work provides important fundamental information about the scope of human genetic variation, and suggests ways to further explore the relationship between these genetic variants and human disease.
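The abstract's observation that novel SNVs per genome level off as more individuals are sequenced can be illustrated with a small counting sketch. This is not the authors' pipeline: the abstract does not say how novelty was defined, so the sketch assumes a variant is "novel" if it has not appeared in any previously processed genome. The `Variant` tuple layout, the function name `novel_variants_per_genome`, and the toy data are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def novel_variants_per_genome(genomes: Iterable[Set[Variant]]) -> List[int]:
    """Count, for each genome in order, the variants not seen in any earlier genome.

    Assumption: "novel" means absent from previously processed genomes only;
    a real analysis would probably also filter against a public variant database.
    """
    seen: Set[Variant] = set()
    counts: List[int] = []
    for variants in genomes:
        novel = variants - seen      # variants appearing for the first time
        counts.append(len(novel))
        seen |= variants             # remember everything observed so far
    return counts


# Toy data (invented for illustration, not from the study).
toy_genomes = [
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chrX", 50, "G", "A")},
    {("chr1", 100, "A", "G"), ("chr2", 300, "T", "C")},
    {("chr1", 200, "C", "T"), ("chr2", 300, "T", "C"), ("chr3", 10, "G", "C")},
]

print(novel_variants_per_genome(toy_genomes))  # prints [3, 1, 1]
```

The counts depend on the order in which genomes are processed, so a plateau estimate such as the one quoted in the abstract would normally be checked across several orderings.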