Logistic regression protects against population structure in genetic association studies

Abstract
We conduct an extensive simulation study to compare the merits of several methods for using null (unlinked) markers to protect against false positives due to cryptic substructure in population-based genetic association studies. The more sophisticated “structured association” methods perform well but are computationally demanding and rely on estimating the correct number of subpopulations. The simple and fast “genomic control” approach can lose power in certain scenarios. We find that procedures based on logistic regression, which are flexible, computationally fast, and easy to implement, also provide good protection against the effects of cryptic substructure, even though they do not explicitly model the population structure.
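
As a rough illustration of the logistic-regression idea, the following Python sketch simulates two hidden subpopulations and compares the Wald statistic for a non-causal candidate SNP with and without null-marker genotypes as covariates. Everything in it is an assumption made for illustration rather than the study's actual procedure or simulation design: the helper `fit_logistic`, the sample size, the allele frequencies and disease prevalences, the choice of 20 null markers, and the decision to enter the null markers as additive covariates.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Hypothetical helper: logistic regression by Newton-Raphson.

    Returns the coefficients (intercept first) and their Wald standard errors.
    """
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * w[:, None])      # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n, n_null = 2000, 20  # assumed sample size and number of null markers

# Hidden subpopulation labels (never shown to the regression).
pop = rng.integers(0, 2, n)

# Disease prevalence differs by subpopulation (assumed 0.2 vs 0.6).
y = rng.binomial(1, np.where(pop == 1, 0.6, 0.2)).astype(float)

# Candidate SNP: no causal effect, but allele frequency differs by subpopulation.
cand = rng.binomial(2, np.where(pop == 1, 0.6, 0.2))

# Null (unlinked) markers whose allele frequencies also differ by subpopulation.
f0 = rng.uniform(0.1, 0.9, n_null)
f1 = np.clip(f0 + rng.normal(0.0, 0.3, n_null), 0.05, 0.95)
nulls = rng.binomial(2, np.where(pop[:, None] == 1, f1, f0))

# Unadjusted model: disease ~ candidate genotype.
beta_u, se_u = fit_logistic(cand, y)
z_unadj = beta_u[1] / se_u[1]

# Adjusted model: disease ~ candidate genotype + null-marker genotypes.
beta_a, se_a = fit_logistic(np.column_stack([cand, nulls]), y)
z_adj = beta_a[1] / se_a[1]

print(f"Wald z, unadjusted:           {z_unadj:.2f}")
print(f"Wald z, null-marker adjusted: {z_adj:.2f}")
```

In this toy setting the unadjusted statistic is inflated by the confounding between ancestry and disease, whereas the adjusted statistic is much smaller, even though the subpopulation labels are never passed to the model.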