An approximate sampling formula under genetic hitchhiking

Abstract
For a genetic locus carrying a strongly beneficial allele which has just fixed in a large population, we study the ancestry at a linked neutral locus. During this “selective sweep” the linkage between the two loci is broken up by recombination and the ancestry at the neutral locus is modeled by a structured coalescent in a random background. For large selection coefficients α and under an appropriate scaling of the recombination rate, we derive a sampling formula with an order of accuracy of O((log α)⁻²) in probability. In particular we see that, with this order of accuracy, in a sample of fixed size there are at most two nonsingleton families of individuals which are identical by descent at the neutral locus from the beginning of the sweep. This refines a formula going back to the work of Maynard Smith and Haigh, and complements recent work of Schweinsberg and Durrett on selective sweeps in the Moran model.