The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis

Open Access

2 October 2009

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 5 (10) , e1000628
https://doi.org/10.1371/journal.pgen.1000628

Abstract

It was shown recently using experimental data that it is possible under certain conditions to determine whether a person with known genotypes at a number of markers was part of a sample from which only allele frequencies are known. Using population genetic and statistical theory, we show that the power of such identification is approximately proportional to the number of independent SNPs divided by the size of the sample from which the allele frequencies are available. We quantify the limits of identification and propose likelihood and regression methods for analyzing such data. We show that these two methods have similar statistical properties and that both perform better, in terms of type-I error rate and statistical power, than test statistics suggested in the literature.

It was shown recently by Homer and colleagues that it may be possible to determine whether a person with known genotypes at a number of markers was part of a pool of DNA from which only frequencies of alleles at the markers are known. In this study, we quantify how well such identification can work in practice. The larger the sample from which the allele frequencies are available, the more independent genetic markers are required to allow individual identification.
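The scaling described in the abstract can be illustrated with a small simulation. The sketch below is not the likelihood or regression method of the paper, nor the statistic of Homer and colleagues. It uses a deliberately simple statistic, the standardized sum over SNPs of (y - p)(p_hat - p), and assumes that the population allele frequencies p are known exactly and that SNPs are independent. The function names (`membership_z`, `expected_z_in_sample`, `simulate`) are hypothetical and chosen for illustration only. Under these assumptions the expected statistic for a sample member is sum(pq) / sqrt(N * sum((pq)^2)), with q = 1 - p, which reduces to sqrt(M/N) when all M SNPs share the same frequency.

```python
import math
import random


def draw_genotype(p, rng):
    """Allele count (0, 1 or 2) for one diploid individual at a SNP with frequency p."""
    return (rng.random() < p) + (rng.random() < p)


def membership_z(y, sample_freq, pop_freq, n):
    """Standardized sum over SNPs of (y - p) * (p_hat - p).

    y           : individual's allele frequency per SNP (0, 0.5 or 1)
    sample_freq : allele frequency per SNP in the sample of n individuals
    pop_freq    : population allele frequency per SNP (assumed known here)
    """
    num = 0.0
    var = 0.0
    for yj, fj, pj in zip(y, sample_freq, pop_freq):
        num += (yj - pj) * (fj - pj)
        pq = pj * (1.0 - pj)
        # Null variance of one term: Var(y) * Var(p_hat) = (pq/2) * (pq/(2n)).
        var += (pq / 2.0) * (pq / (2.0 * n))
    return num / math.sqrt(var)


def expected_z_in_sample(pop_freq, n):
    """Approximate mean of membership_z for an individual who is in the sample."""
    s1 = sum(p * (1.0 - p) for p in pop_freq)
    s2 = sum((p * (1.0 - p)) ** 2 for p in pop_freq)
    return s1 / math.sqrt(s2 * n)


def simulate(m, n, seed=1):
    """Return (z for a sample member, z for an outsider) with m SNPs and n individuals."""
    rng = random.Random(seed)
    pop_freq = [rng.uniform(0.1, 0.9) for _ in range(m)]
    genos = [[draw_genotype(p, rng) for p in pop_freq] for _ in range(n)]
    sample_freq = [sum(g[j] for g in genos) / (2.0 * n) for j in range(m)]
    member = [g / 2.0 for g in genos[0]]
    outsider = [draw_genotype(p, rng) / 2.0 for p in pop_freq]
    z_in = membership_z(member, sample_freq, pop_freq, n)
    z_out = membership_z(outsider, sample_freq, pop_freq, n)
    return z_in, z_out
```

Calling `simulate(4000, 50)` returns one statistic for a member of the sample and one for an unrelated individual. Increasing `n` while holding `m` fixed shrinks the gap between the two, and `expected_z_in_sample` shows why: the squared expected statistic grows with the number of SNPs and falls with the sample size.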

This publication has 3 references indexed in Scilit:

Increased accuracy of artificial selection by using the realized relationship matrix
Genetics Research, 2009
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
PLoS Genetics, 2008
Linkage disequilibrium in finite populations
Theoretical and Applied Genetics, 1968