A dictionary model for haplotyping, genotype calling, and association testing
- 8 May 2007
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 31 (7), 672-683
- https://doi.org/10.1002/gepi.20232
Abstract
We propose a new method for haplotyping, genotype calling, and association testing based on a dictionary model for haplotypes. In this framework, a haplotype arises as a concatenation of conserved haplotype segments, drawn from a predefined dictionary according to segment-specific probabilities. The observed data consist of unphased multimarker genotypes gathered on a random sample of unrelated individuals. These genotypes are subject to mutation, genotyping errors, and missing data. The true pair of haplotypes corresponding to a person's multimarker genotype is reconstructed using a Markov chain that visits haplotype pairs according to their posterior probabilities. Our implementation of the chain alternates Gibbs steps, which rearrange the phase of a single marker, and Metropolis steps, which swap maternal and paternal haplotypes from a given marker onward. Outputs of the chain include the most likely haplotype pairs, the most likely genotypes at each marker, and the expected number of occurrences of each haplotype segment. Reconstruction accuracy is comparable to that achieved by the best existing algorithms. More importantly, the dictionary model yields expected counts of conserved haplotype segments. These imputed counts can serve as genetic predictors in association studies, as we illustrate by examples on cystic fibrosis, Friedreich's ataxia, and angiotensin-I converting enzyme levels. Genet. Epidemiol.
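The generative idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy dictionary, its probabilities, and the function names `sample_haplotype`, `unphased_genotype`, and `swap_from` are all hypothetical, and mutation, genotyping error, and missing data are left out. The sketch builds a haplotype by concatenating segments drawn from a dictionary, collapses a haplotype pair into an unphased genotype, and shows the kind of proposal a Metropolis step makes when it swaps maternal and paternal haplotypes from a given marker onward.

```python
import random

# Hypothetical toy dictionary: segment -> probability.
# The segments and probabilities are invented for illustration only.
DICTIONARY = {
    "0": 0.2,
    "1": 0.2,
    "0011": 0.3,
    "1101": 0.3,
}


def sample_haplotype(n_markers, rng):
    """Concatenate segments drawn from DICTIONARY until n_markers alleles exist.

    Simplifying assumption: a drawn segment that would overrun the last
    marker is discarded and another one is drawn.
    """
    segments = list(DICTIONARY)
    weights = list(DICTIONARY.values())
    alleles = ""
    while len(alleles) < n_markers:
        seg = rng.choices(segments, weights=weights, k=1)[0]
        if len(alleles) + len(seg) <= n_markers:
            alleles += seg
    return alleles


def unphased_genotype(maternal, paternal):
    """Count copies of allele '1' at each marker; phase information is lost."""
    return [int(m) + int(p) for m, p in zip(maternal, paternal)]


def swap_from(maternal, paternal, k):
    """Propose a new pair by exchanging the two haplotypes from marker k onward."""
    return maternal[:k] + paternal[k:], paternal[:k] + maternal[k:]


rng = random.Random(0)
maternal = sample_haplotype(8, rng)
paternal = sample_haplotype(8, rng)
genotype = unphased_genotype(maternal, paternal)
new_maternal, new_paternal = swap_from(maternal, paternal, 3)
```

The swap changes the phase but leaves the unphased genotype unchanged, so such a proposal stays consistent with the observed data. In a full sampler, the proposal would be accepted or rejected according to the posterior probabilities of the two haplotype pairs, which this sketch does not compute.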