Computational Problems in Noisy SNP and Haplotype Analysis: Block Scores, Block Identification, and Population Stratification
- 1 November 2004
- journal article
- Published by Institute for Operations Research and the Management Sciences (INFORMS) in INFORMS Journal on Computing
- Vol. 16 (4) , 360-370
- https://doi.org/10.1287/ijoc.1040.0088
Abstract
The study of haplotypes and their diversity in a population is central to disease-association research. We study several problems arising in haplotype block partitioning. Our objective function is the total number of distinct haplotypes in blocks. We show that the problem is NP-hard when there are errors or missing data, and provide approximation algorithms for several of its variants. We also give an algorithm that solves the problem with high probability under a probabilistic model that allows noise and missing data. In addition, we study the multipopulation case, where one has to partition the haplotypes into populations and seek a different block partition in each one. We provide a heuristic for that problem and use it to analyze simulated and real data. On simulated data, our blocks resemble the true partition more than the blocks generated by the LD-based algorithm of Gabriel et al (2002). On single-population real data, we generate a more concise block description than do extant approaches, with better average LD within blocks. The algorithm also gives promising results on real two-population genotype data.Keywords
This publication has 13 references indexed in Scilit:
- Combinatorial Problems Arising in SNP and Haplotype AnalysisPublished by Springer Nature ,2003
- Haplotypes and informative SNP selection algorithmsPublished by Association for Computing Machinery (ACM) ,2003
- Identifying Blocks and Sub-populations in Noisy SNP DataPublished by Springer Nature ,2003
- The Structure of Haplotype Blocks in the Human GenomeScience, 2002
- Polynomial-time approximation schemes for geometric min-sum median clusteringJournal of the ACM, 2002
- Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21Science, 2001
- High-resolution haplotype structure in the human genomeNature Genetics, 2001
- Inference of Haplotypes from Samples of Diploid Populations: Complexity and AlgorithmsJournal of Computational Biology, 2001
- Variation is the spice of lifeNature Genetics, 2001
- The Probabilistic MethodPublished by Wiley ,2000