Accuracy, efficiency and robustness of four algorithms allowing full sibship reconstruction from DNA marker data

Abstract
Three existing algorithms and one new algorithm for reconstructing full-sib pedigrees from DNA marker data are compared in terms of accuracy, efficiency and robustness using real and simulated data sets. An algorithm based on the exclusion principle and another based on maximization of the Simpson index were very accurate at reconstructing data sets comprising a few large families but had problems with data sets with limited family structure, while a Markov chain Monte Carlo (MCMC) algorithm based on maximization of a partition score showed the opposite behaviour. An MCMC algorithm based on maximizing the full joint likelihood performed best in small data sets comprising several medium‐sized families but did not work well under most other conditions. It appears that the likelihood surface may be rough, making it challenging for the MCMC algorithm to find the global maximum. This likelihood algorithm also had problems reconstructing large family groups, possibly because of limits in computational precision. The accuracy of each algorithm improved as the amount of information in the data set increased, and was very high with eight loci of eight alleles each. All four algorithms were quite robust to deviation from an idealized uniform allelic distribution, to departures from idealized Mendelian inheritance in simulated data sets and to the presence of null alleles. In contrast, none of the algorithms was very robust to the probable presence of error/mutation in the data. Depending on the type of mutation or error and the algorithm used, between 70 and 98% of the affected individuals were, on average, classified improperly.
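
As a rough illustration of the quantity one of the compared algorithms maximizes, the Simpson index of a candidate partition into families can be sketched as below. The sketch assumes the common without-replacement form of the index (the probability that two randomly drawn individuals fall in the same group); the exact definition used by the algorithm may differ. The function name and the example labels are hypothetical, and the sketch does not model the genetic compatibility constraints that any real reconstruction would have to respect.

```python
from collections import Counter


def simpson_index(labels):
    """Simpson index of a partition, given one family label per individual.

    Assumed form: sum over groups of n_g * (n_g - 1), divided by N * (N - 1),
    i.e. the chance that two individuals drawn without replacement share a
    label. This definition is an assumption, not taken from the source.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two individuals")
    group_sizes = Counter(labels).values()
    return sum(s * (s - 1) for s in group_sizes) / (n * (n - 1))


# Hypothetical partitions of ten individuals.
few_large = ["A"] * 5 + ["B"] * 5                       # two families of five
many_small = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]  # five pairs

print(simpson_index(few_large))
print(simpson_index(many_small))
```

Under this definition, a partition into a few large groups scores higher than one into many small groups, which is consistent with the index-based algorithm doing well on data sets made up of a few large families.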