ESTIMATING POPULATION DIVERSITY WITH UNRELIABLE LOW FREQUENCY COUNTS
Open Access
- 1 December 2011
- proceedings article
- Published by World Scientific Pub Co Pte Ltd in Biocomputing 2012
Abstract
We consider the classical population diversity estimation scenario based on frequency count data (the number of classes or taxa represented once, twice, etc. in the sample), but with the proviso that the lowest frequency counts, especially the singletons, may not be reliably observed. This arises especially in data derived from modern high-throughput DNA sequencing, where errors may cause sequences to be incorrectly assigned to new taxa instead of being matched to existing, observed taxa. We look at a spectrum of methods for addressing this issue, focusing in particular on fitting a parametric mixture model and deleting the highest-diversity component; we also consider regarding the data as left-censored and effectively pooling two or more low frequency counts. We find that these purely statistical “downstream” corrections will depend strongly on their underlying assumptions, but that such methods can be useful nonetheless.
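The abstract describes "downstream" corrections that stop trusting the lowest frequency counts. The sketch below illustrates that general idea with a simpler technique than the paper's mixture-model or left-censoring approaches: left-truncation. It discards every frequency count below a cutoff `r`, fits a single Poisson distribution truncated at `r` to the remaining counts, and scales the number of retained classes up by the fitted probability P(X >= r). The single-Poisson abundance model (every class equally abundant), the cutoff `r`, and the names `poisson_tail` and `truncated_poisson_richness` are assumptions made for illustration. This is not the authors' method or code.

```python
import math
from typing import Dict, Tuple


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0

    def pmf(x: int) -> float:
        return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

    if lam > k:
        # The tail is not small here, so subtracting the short head sum is accurate.
        return 1.0 - math.fsum(pmf(x) for x in range(k))
    # The tail may be tiny: sum upward from k to avoid cancellation.
    # For x >= k >= lam the terms decrease, so stopping on a negligible term is safe.
    total, term, x = 0.0, pmf(k), k
    while term > 0.0 and (total == 0.0 or term > 1e-17 * total):
        total += term
        x += 1
        term *= lam / x
    return total


def truncated_poisson_richness(
    freq_counts: Dict[int, float], r: int = 2, iters: int = 200
) -> Tuple[float, float]:
    """Estimate total richness N from frequency counts, ignoring counts below r.

    freq_counts maps j (times a class was observed) to f_j (number of such classes).
    Assumption (illustrative only): every class has the same Poisson(lam) count,
    which is far simpler than the parametric mixture discussed in the abstract.
    Returns (lam_hat, n_hat).
    """
    kept = {j: f for j, f in freq_counts.items() if j >= r}
    n_r = sum(kept.values())  # classes observed at least r times
    if n_r <= 0:
        raise ValueError("no classes observed at or above the cutoff r")
    mean = sum(j * f for j, f in kept.items()) / n_r
    if mean <= r:
        raise ValueError("need some classes observed more than r times")

    def cond_mean(lam: float) -> float:
        # E[X | X >= r] = lam * P(X >= r - 1) / P(X >= r) for X ~ Poisson(lam).
        return lam * poisson_tail(r - 1, lam) / poisson_tail(r, lam)

    # The truncated-Poisson MLE solves cond_mean(lam) = sample mean.
    # cond_mean rises from r (as lam -> 0) and cond_mean(lam) >= lam,
    # so the root lies in (0, mean]; locate it by bisection.
    lo, hi = 0.0, mean
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    lam_hat = 0.5 * (lo + hi)

    # Scale up: the n_r retained classes are a fraction P(X >= r) of all N classes.
    n_hat = n_r / poisson_tail(r, lam_hat)
    return lam_hat, n_hat


if __name__ == "__main__":
    # Hypothetical frequency counts; the singletons (j = 1) are treated as unreliable.
    counts = {1: 120, 2: 40, 3: 22, 4: 11, 5: 6, 7: 2}
    lam_hat, n_hat = truncated_poisson_richness(counts, r=2)
    print(f"lambda_hat = {lam_hat:.4f}, estimated richness = {n_hat:.1f}")
```

Because the likelihood equation for a truncated Poisson reduces to matching the conditional mean, plain bisection is enough here. A mixture model of the kind the abstract describes would instead need an iterative fit such as EM, and, as the abstract stresses, its estimate would likewise depend strongly on the model assumptions.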