Intra- and interpopulation genotype reconstruction from tagging SNPs

6 December 2006

journal article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 17 (1), 96-107
https://doi.org/10.1101/gr.5741407

Abstract

The optimal method for tagging SNP (tSNP) selection, the applicability of a reference linkage disequilibrium (LD) map to unassayed populations, and the scalability of these methods to genome-wide analysis all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues, and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for ∼2000 individuals). We also evaluate these algorithms on a second data set, consisting of genotypes from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of “untyped” genotypes for most of the populations studied. Additionally, we demonstrate quantitatively that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can retrieve significant SNP associations with disease, with substantial genotyping savings.
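
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea it describes: greedily choose a few SNP columns of a genotype matrix, then reconstruct the remaining SNPs from them without any haplotype inference. It is not the authors' method. The greedy rule used here (pick the column with the largest remaining variance, then project it out) and the least-squares reconstruction are generic stand-ins, the 0/1/2 minor-allele-count encoding is an assumption, and the function names `select_tag_snps`, `fit_reconstruction`, and `reconstruct` are hypothetical.

```python
import numpy as np


def select_tag_snps(genotypes, k):
    """Greedily pick up to k SNP columns (assumed 0/1/2 encoding).

    Each step takes the column with the largest residual variance and
    removes its direction from all columns (a pivoted Gram-Schmidt step).
    """
    residual = genotypes.astype(float) - genotypes.mean(axis=0)
    chosen = []
    for _ in range(k):
        norms = (residual ** 2).sum(axis=0)
        j = int(np.argmax(norms))
        if norms[j] < 1e-12:  # nothing left to explain
            break
        chosen.append(j)
        q = residual[:, j] / np.sqrt(norms[j])
        residual -= np.outer(q, q @ residual)
    return chosen


def fit_reconstruction(genotypes, tags):
    """Least-squares map from tag-SNP genotypes (plus intercept) to all SNPs."""
    design = np.column_stack([np.ones(len(genotypes)), genotypes[:, tags]])
    coeffs, *_ = np.linalg.lstsq(design, genotypes, rcond=None)
    return coeffs


def reconstruct(tag_genotypes, coeffs):
    """Predict full genotypes from tag SNPs, rounded back to 0/1/2."""
    design = np.column_stack([np.ones(len(tag_genotypes)), tag_genotypes])
    return np.clip(np.rint(design @ coeffs), 0, 2).astype(int)


# Toy reference panel (made-up data): 6 individuals x 4 SNPs.
# SNP 2 duplicates SNP 0, and SNP 3 is the mirror image of SNP 1.
reference = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 2],
    [2, 2, 2, 0],
    [1, 1, 1, 1],
    [0, 2, 0, 0],
    [2, 0, 2, 2],
])
tags = select_tag_snps(reference, k=2)
coeffs = fit_reconstruction(reference, tags)

# Two new individuals typed only at the tag SNPs.
new_tag_genotypes = np.array([[2, 1], [0, 0]])
predicted = reconstruct(new_tag_genotypes, coeffs)
```

In this toy panel two tag SNPs carry all the information, so the untyped SNPs are recovered exactly; real genotype data would be reconstructed only approximately, which is why the abstract reports accuracy per population. Fitting the coefficients on one population and applying them to another would be one simple way to probe the transferability the abstract mentions.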
