Computing the Minimum Recombinant Haplotype Configuration from Incomplete Genotype Data on a Pedigree by Integer Linear Programming
- 1 July 2005
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 12 (6) , 719-739
- https://doi.org/10.1089/cmb.2005.12.719
Abstract
We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles under the Mendelian law of inheritance and the minimumrecombination principle, which is important for the construction of haplotype maps and genetic linkage/association analyses. Our previous results show that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This paper presents an effective integer linear programming (ILP) formulation of the MRHC problem with missing data and a branch-and-bound strategy that utilizes a partial order relationship and some other special relationships among variables to decide the branching order. Nontrivial lower and upper bounds on the optimal number of recombinants are introduced at each branching node to effectively prune the search tree. When multiple solutions exist, a best haplotype configuration is selected based on a maximum likelihood approach. The paper also shows for the first time how to incorporate marker interval distance into a rule-based haplotyping algorithm. Our results on simulated data show that the algorithm could recover haplotypes with 50 loci from a pedigree of size 29 in seconds on a Pentium IV computer. Its accuracy is more than 99.8% for data with no missing alleles and 98.3% for data with 20% missing alleles in terms of correctly recovered phase information at each marker locus. A comparison with a statistical approach SimWalk2 on simulated data shows that the ILP algorithm runs much faster than SimWalk2 and reports better or comparable haplotypes on average than the first and second runs of SimWalk2. As an application of the algorithm to real data, we present some test results on reconstructing haplotypes from a genome-scale SNP dataset consisting of 12 pedigrees that have 0.8% to 14.5% missing alleles.Keywords
This publication has 22 references indexed in Scilit:
- A Survey of Computational Methods for Determining HaplotypesPublished by Springer Nature ,2004
- An Overview of Combinatorial Methods for Haplotype InferencePublished by Springer Nature ,2004
- The complexity of checking consistency of pedigree information and related problemsJournal of Computer Science and Technology, 2004
- The Haplotyping problem: An overview of computational models and solutionsJournal of Computer Science and Technology, 2003
- The Structure of Haplotype Blocks in the Human GenomeScience, 2002
- Merlin—rapid analysis of dense genetic maps using sparse gene flow treesNature Genetics, 2001
- High-resolution haplotype structure in the human genomeNature Genetics, 2001
- Map of the Human Genome 3.0Science, 2001
- Allegro, a new computer program for multipoint linkage analysisNature Genetics, 2000
- A program to draw pedigrees using LINKAGE or LINKSYS data filesAnnals of Human Genetics, 1990