ESTIMATING POPULATION DIVERSITY WITH UNRELIABLE LOW FREQUENCY COUNTS

Open Access

1 December 2011

proceedings article
Published by World Scientific Pub Co Pte Ltd in Biocomputing 2012

p. 203-212
https://doi.org/10.1142/9789814366496_0020

Abstract

We consider the classical population diversity estimation scenario based on frequency count data (the number of classes or taxa represented once, twice, etc. in the sample), but with the proviso that the lowest frequency counts, especially the singletons, may not be reliably observed. This arises especially in data derived from modern high-throughput DNA sequencing, where errors may cause sequences to be incorrectly assigned to new taxa instead of being matched to existing, observed taxa. We look at a spectrum of methods for addressing this issue, focusing in particular on fitting a parametric mixture model and deleting the highest-diversity component; we also consider regarding the data as left-censored and effectively pooling two or more low frequency counts. We find that these purely statistical “downstream” corrections will depend strongly on their underlying assumptions, but that such methods can be useful nonetheless.
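
To make the data format concrete, the following is a minimal Python sketch of the frequency count representation described above, together with one simple way to express "pooling" of the lowest counts when they are regarded as left-censored. It is an illustration only, not the authors' estimation procedure: the function names `frequency_counts` and `pool_low_counts`, the `cutoff` parameter, and the example abundances are all hypothetical.

```python
from collections import Counter


def frequency_counts(abundances):
    """Return {j: f_j}, where f_j is the number of taxa observed exactly j times."""
    counts = Counter(a for a in abundances if a > 0)
    return dict(sorted(counts.items()))


def pool_low_counts(freq, cutoff):
    """Treat counts at or below `cutoff` as left-censored.

    Only the total number of taxa seen between 1 and `cutoff` times is kept;
    the individual low counts (f_1, ..., f_cutoff) are no longer trusted.
    """
    pooled = sum(f for j, f in freq.items() if j <= cutoff)
    kept = {j: f for j, f in freq.items() if j > cutoff}
    return pooled, kept


# Hypothetical sample: one entry per observed taxon, giving its abundance.
abundances = [1, 1, 1, 1, 1, 2, 2, 3, 5, 8]

freq = frequency_counts(abundances)             # f_1 = 5 singletons, f_2 = 2, ...
pooled, kept = pool_low_counts(freq, cutoff=2)  # merge f_1 and f_2 into one total
```

In this toy sample, sequencing errors that create spurious taxa would inflate `freq[1]`; after pooling, a downstream model would see only the combined total for frequencies 1 and 2, which is the sense in which the low counts are treated as censored.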
