Centralizing the non‐central chi‐square: a new method to correct for population stratification in genetic case‐control association studies

Abstract
We present a new method, the δ‐centralization (DC) method, to correct for population stratification (PS) in case‐control association studies. DC works well even when confounding due to PS is substantial. Such confounding causes overdispersion in the usual chi‐square statistics, which then follow non‐central chi‐square distributions. Other methods address the non‐centrality indirectly; we deal with it directly, by estimating the non‐centrality parameter τ itself. Specifically: (1) We define a quantity δ, a function of the relevant subpopulation parameters. We show that, for relatively large samples, δ exactly predicts the elevation of the false positive rate due to PS when there is no true association between marker genotype and disease. (This quantity δ is quite different from Wright's FST and can be large even when FST is small.) (2) We show how to estimate δ using a panel of unlinked “neutral” loci. (3) We then show that δ² corresponds to τ, the non‐centrality parameter of the chi‐square distribution. Thus, we can centralize the chi‐square using our estimate of δ; this is the DC method. (4) We demonstrate, via computer simulations, that DC works well with as few as 25–30 unlinked markers, where the markers are chosen to have allele frequencies reasonably close (within ±0.1) to those at the test locus. (5) We compare DC with genomic control and show that, whereas the latter becomes overconservative when there is considerable confounding due to PS (i.e., when δ is large), DC performs well for all values of δ. Genet. Epidemiol. 2006.
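
The idea of centralizing a non‐central chi‐square can be illustrated with a small simulation. The sketch below is not the authors' procedure; it is a toy model resting on several assumptions that the abstract does not state: a 1‐df test whose signed statistic Z is normal with mean δ and unit variance under PS (so that Z² is non‐central chi‐square with non‐centrality δ²), neutral loci that share the same δ, and an estimate of δ taken as the simple mean of the neutral‐locus statistics. The function names `false_positive_rate` and `simulate` are hypothetical.

```python
import math
import random

# 0.95 quantile of the central chi-square distribution with 1 df.
CRIT_1DF_05 = 3.841458820694124


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def false_positive_rate(delta, crit=CRIT_1DF_05):
    """P(Z^2 > crit) for Z ~ N(delta, 1).

    Under this toy model Z^2 is non-central chi-square (1 df) with
    non-centrality parameter delta^2, so this is the nominal-5% test's
    actual false positive rate when no true association exists.
    """
    c = math.sqrt(crit)
    return (1.0 - norm_cdf(c - delta)) + norm_cdf(-c - delta)


def simulate(delta, n_neutral=30, n_reps=20000, seed=1):
    """Compare uncorrected and centralized tests by simulation.

    Assumption (illustrative only): delta is estimated as the mean of the
    signed statistics at n_neutral unlinked neutral loci, and the test
    statistic is centralized by subtracting that estimate before squaring.
    Returns (uncorrected rate, centralized rate).
    """
    rng = random.Random(seed)
    naive_hits = 0
    centered_hits = 0
    for _ in range(n_reps):
        neutral = [rng.gauss(delta, 1.0) for _ in range(n_neutral)]
        delta_hat = sum(neutral) / n_neutral
        z_test = rng.gauss(delta, 1.0)  # test locus, no true association
        naive_hits += z_test ** 2 > CRIT_1DF_05
        centered_hits += (z_test - delta_hat) ** 2 > CRIT_1DF_05
    return naive_hits / n_reps, centered_hits / n_reps


if __name__ == "__main__":
    print("analytic uncorrected rate:", false_positive_rate(1.0))
    print("simulated (uncorrected, centralized):", simulate(1.0))
```

With δ = 1 the analytic uncorrected rate is about 0.17 rather than the nominal 0.05, while the centralized statistic stays close to nominal. It sits slightly above 0.05 in this toy model because the estimate of δ from 30 loci carries sampling error of its own.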