Population genetic inference using a fixed number of segregating sites: a reassessment

Abstract
Coalescent theory is commonly used to perform population genetic inference at the nucleotide level. Here, we examine the procedure that fixes the number of segregating sites (henceforth the FS procedure). In this approach a fixed number of segregating sites (S) are placed on a coalescent tree (independently of the total and internode lengths of the tree). Thus, although widely used, the FS procedure does not strictly follow the assumptions of coalescent theory and must be considered an approximation of (i) the standard procedure that uses a fixed population mutation parameter θ, and (ii) procedures that condition on the number of segregating sites. We study the differences in the false positive rate for nine statistics by comparing the FS procedure with the procedures (i) and (ii), using several evolutionary models with single-locus and multilocus data. Our results indicate that for single-locus data the FS procedure is accurate for the equilibrium neutral model, but problems arise under the alternative models studied; furthermore, for multilocus data, the FS procedure becomes inaccurate even for the standard neutral model. Therefore, we recommend a procedure that fixes the θ value (or alternatively, procedures that condition on S and take into account the uncertainty of θ) for analysing evolutionary models with multilocus data. With single-locus data, the FS procedure should not be employed for models other than the standard neutral model.
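
The contrast between the FS procedure and the fixed-θ procedure can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names are hypothetical, only the standard neutral model is simulated, time is assumed to be measured in units of 2N generations (so that the expected number of mutations on a tree of total length L is θL/2), and the placement of mutations on individual branches is omitted. Under these assumptions, the number of segregating sites varies with the realised tree length when θ is fixed, whereas the FS procedure imposes the same S on every tree.

```python
import random


def coalescent_tree_length(n, rng):
    """Total branch length of a standard neutral coalescent tree for n samples.

    Assumption: time is in units of 2N generations, so the waiting time
    while k lineages remain is exponential with rate k(k-1)/2.
    """
    total = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2.0)
        total += k * t_k  # k lineages each persist for t_k
    return total


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def segregating_sites_fixed_theta(n, theta, rng):
    """Fixed-theta procedure: S is Poisson given the realised tree length."""
    return poisson(theta / 2.0 * coalescent_tree_length(n, rng), rng)


def segregating_sites_fixed_s(n, s, rng):
    """FS procedure: the same S is imposed whatever the tree length."""
    coalescent_tree_length(n, rng)  # a tree is still drawn, but S ignores it
    return s


if __name__ == "__main__":
    rng = random.Random(1)
    n, theta, reps = 10, 5.0, 2000
    draws = [segregating_sites_fixed_theta(n, theta, rng) for _ in range(reps)]
    fixed = [segregating_sites_fixed_s(n, 14, rng) for _ in range(reps)]
    print("fixed theta: min S =", min(draws), "max S =", max(draws))
    print("FS:          min S =", min(fixed), "max S =", max(fixed))
```

In this sketch the expected value of S under the fixed-θ procedure is θ multiplied by the sum of 1/i for i from 1 to n − 1 (Watterson's result), while the FS procedure removes all variation in S across replicates.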