Quantifying the amount of missing information in genetic association studies
- 19 September 2006
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 30 (8) , 703-717
- https://doi.org/10.1002/gepi.20181
Abstract
Many genetic analyses are done with incomplete information; for example, unknown phase in haplotype‐based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information is in one data set, e.g. genotype data for a set of SNPs, for estimating parameters that are functions of frequencies in the second data set, e.g. haplotype frequencies, relative to the ideal case of actually observing the complete data, e.g. haplotypes. In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi‐locus LD, and it equals the classical measure r² if each set consists of one bi‐allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case‐control testing. The focus of this paper is on case‐control allele/haplotype tests, but the framework can easily be extended to other settings, such as regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies. Genet. Epidemiol. 2006.
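The two-marker special case mentioned above can be checked numerically. The sketch below is not the paper's general multi-locus framework; it only illustrates the textbook result that r² between a causal SNP A and a proxy SNP B equals the ratio of the noncentrality parameters of the 1-df case-control allele test, so roughly n/r² chromosomes at B give the power of n at A. It assumes B is associated with disease only through A and that effects are small (allele frequencies in the variance are taken from the population). All function names and the numeric inputs are hypothetical, chosen for illustration.

```python
"""Sketch: r^2 as the relative efficiency of testing a proxy SNP B instead of causal SNP A.

Assumptions (hypothetical, for illustration only): B relates to disease only via A,
and effects are small enough that population allele frequencies can be used in variances.
"""


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Classical r^2 between two bi-allelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: allele frequencies of A and B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def allele_test_ncp(delta: float, p: float, n_cases: int, n_controls: int) -> float:
    """Approximate noncentrality of the 1-df case-control allele test.

    delta: case-control allele frequency difference; p: allele frequency;
    n_cases / n_controls: numbers of chromosomes in each group.
    """
    return delta ** 2 / (p * (1 - p) * (1 / n_cases + 1 / n_controls))


def proxy_delta(delta_a: float, p_ab: float, p_a: float, p_b: float) -> float:
    """Frequency difference induced at B: delta_A * (P(B|A) - P(B|a)) = delta_A * D / (p_A(1 - p_A))."""
    d = p_ab - p_a * p_b
    return delta_a * d / (p_a * (1 - p_a))


def required_sample_size(n_direct: float, r2: float) -> float:
    """Sample size at the proxy giving the same asymptotic power as n_direct at the causal SNP."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    return n_direct / r2


if __name__ == "__main__":
    p_a, p_b, p_ab = 0.5, 0.5, 0.4  # hypothetical haplotype/allele frequencies
    delta_a = 0.05                  # hypothetical effect at the causal SNP
    r2 = r_squared(p_ab, p_a, p_b)
    ncp_a = allele_test_ncp(delta_a, p_a, 2000, 2000)
    ncp_b = allele_test_ncp(proxy_delta(delta_a, p_ab, p_a, p_b), p_b, 2000, 2000)
    print(f"r^2 = {r2:.3f}, NCP ratio = {ncp_b / ncp_a:.3f}")
    print(f"chromosomes needed at proxy: {required_sample_size(2000, r2):.0f}")
```

Because the noncentrality grows linearly in sample size, the ratio of noncentralities translates directly into the ratio of sample sizes, which is the interpretation the abstract gives for its more general measures.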
References (10 of 36 listed):
- Testing Untyped Alleles (TUNA)—applications to genome-wide association studies. Genetic Epidemiology, 2006.
- Coverage and Characteristics of the Affymetrix GeneChip Human Mapping 100K SNP Set. PLoS Genetics, 2006.
- Biases and Reconciliation in Estimates of Linkage Disequilibrium in the Human Genome. American Journal of Human Genetics, 2006.
- A Fine-Scale Linkage-Disequilibrium Measure Based on Length of Haplotype Sharing. American Journal of Human Genetics, 2006.
- A single-nucleotide polymorphism tagging set for human drug metabolism and transport. Nature Genetics, 2004.
- Genotype prediction using a dense map of SNPs. Genetic Epidemiology, 2004.
- Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nature Methods, 2004.
- Guidelines for Genotyping in Genomewide Linkage Studies: Single-Nucleotide–Polymorphism Maps Versus Microsatellite Maps. American Journal of Human Genetics, 2004.
- The International HapMap Project. Nature, 2003.
- Biased Tests of Association: Comparisons of Allele Frequencies when Departing from Hardy-Weinberg Proportions. American Journal of Epidemiology, 1999.