De novo assembly and genotyping of variants using colored de Bruijn graphs
Open Access
- 8 January 2012
- journal article
- research article
- Published by Springer Nature in Nature Genetics
- Vol. 44 (2) , 226-232
- https://doi.org/10.1038/ng.1028
Abstract
Gil McVean and colleagues report algorithms for de novo assembly and genotyping of variants using colored de Bruijn graphs and provide these in a software implementation called Cortex. Their methods can detect and genotype both simple and complex genetic variants in either an individual or a population.
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
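The core data structure named in the abstract can be illustrated with a toy example. The sketch below is not Cortex's implementation; it only shows the general idea of a colored de Bruijn graph, in which each k-mer records which samples ("colors") contain it. The function names, the sample labels, the sequences, and the choice k = 3 are all assumptions made for illustration, and details that real assemblers handle (reverse complements, coverage counts, error cleaning) are omitted.

```python
from collections import defaultdict


def build_colored_graph(samples, k):
    """Map each k-mer to the set of colors (sample names) whose reads contain it.

    Edges are implicit: two k-mers are adjacent when the last k-1 bases of
    one equal the first k-1 bases of the other.
    """
    graph = defaultdict(set)
    for color, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                graph[read[i:i + k]].add(color)
    return graph


def private_kmers(graph, color):
    """Return the k-mers that are seen in exactly one color."""
    return {kmer for kmer, colors in graph.items() if colors == {color}}


# Hypothetical toy data: the sample differs from the reference by one base.
samples = {
    "reference": ["AACCGGTT"],
    "sample": ["AACCAGTT"],
}
graph = build_colored_graph(samples, k=3)

# K-mers private to one color mark the two branches of a variant "bubble";
# k-mers carrying both colors are the shared flanking sequence.
print(sorted(private_kmers(graph, "sample")))
print(sorted(private_kmers(graph, "reference")))
```

In this toy case the single-base difference produces three k-mers private to each color, flanked by k-mers that carry both colors. Tracking colors per k-mer is what lets several samples, or a sample and a reference, share one graph while remaining distinguishable.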