Haplotyping as perfect phylogeny
- 18 April 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 166-175
- https://doi.org/10.1145/565196.565218
Abstract
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working-group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7].
In this paper we explore the algorithmic implications of the key (and now realistic) "no-recombination in long blocks" observation, for the problem of inferring haplotypes in populations. We observe that the no-recombination assumption is very powerful. This assumption, along with the standard population-genetic assumption of infinite sites [23, 14] imposes severe combinatorial constraints on the permitted solutions to the haplotype inference problem, leading to an efficient deterministic algorithm to deduce all features of the permitted haplotype solution(s) that can be known with certainty. The technical key is to view haplotype data as disguised information about paths in an unknown tree, and the haplotype deduction problem as a problem of reconstructing the tree from that path information. This formulation allows us to exploit deep theorems and algorithms from graph and matroid theory to efficiently find one permitted solution to the haplotype problem; it gives a simple test to determine if it is the unique solution; if not, we can implicitly represent the set of all permitted solutions so that each can be efficiently created.
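To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch, not the paper's algorithm (which the abstract says relies on graph and matroid theory). It assumes a common encoding that the abstract does not itself state: each genotype is a tuple over {0, 1, 2}, where 0 and 1 are homozygous sites and 2 marks a heterozygous site. It also assumes the standard four-gamete condition as the test for whether a set of binary haplotypes fits a perfect phylogeny with an unspecified root. The function names are hypothetical.

```python
from itertools import combinations, product


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test: fail if any pair of sites shows 00, 01, 10 and 11."""
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


def resolutions(genotype):
    """Yield each unordered haplotype pair consistent with one genotype.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    """
    het = [k for k, g in enumerate(genotype) if g == 2]
    free = het[1:]  # the first heterozygous site is pinned to skip mirrored pairs
    for bits in product((0, 1), repeat=len(free)):
        a, b = list(genotype), list(genotype)
        if het:
            a[het[0]], b[het[0]] = 0, 1
        for site, bit in zip(free, bits):
            a[site], b[site] = bit, 1 - bit
        yield tuple(a), tuple(b)


def pph_solutions(genotypes):
    """Enumerate every joint resolution whose haplotypes pass the test.

    Exponential in the number of heterozygous sites; for illustration only.
    """
    solutions = []
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haplotypes = [h for pair in choice for h in pair]
        if admits_perfect_phylogeny(haplotypes):
            solutions.append(choice)
    return solutions


if __name__ == "__main__":
    for solution in pph_solutions([(2, 2), (0, 2), (2, 0)]):
        print(solution)
```

With the three genotypes in the usage example, only one joint resolution passes the test, so the haplotype pair for the doubly heterozygous individual is forced. Dropping the third genotype leaves two passing resolutions, which illustrates the uniqueness question the abstract raises.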
This publication has 19 references indexed in Scilit:
- Gene Conversion and Different Population Histories May Explain the Contrast between Polymorphism and Linkage Disequilibrium Levels. American Journal of Human Genetics, 2001
- Map of the Human Genome 3.0. Science, 2001
- Haplotype Variation and Linkage Disequilibrium in 313 Human Genes. Science, 2001
- Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms. Journal of Computational Biology, 2001
- A New Statistical Method for Haplotype Reconstruction from Population Data. American Journal of Human Genetics, 2001
- Apolipoprotein E Variation at the Sequence Haplotype Level: Implications for the Origin and Maintenance of a Major Human Polymorphism. American Journal of Human Genetics, 2000
- Haplotype Structure and Population Genetic Inferences from Nucleotide-Sequence Variation in Human Lipoprotein Lipase. American Journal of Human Genetics, 1998
- Efficient algorithms for inferring evolutionary trees. Networks, 1991
- A Combinatorial Decomposition Theory. Canadian Journal of Mathematics, 1980
- Congruent Graphs and the Connectivity of Graphs. American Journal of Mathematics, 1932