A comprehensive evaluation of capture-recapture models for estimating software defect content

1 June 2000

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering

Vol. 26 (6), 518-540
https://doi.org/10.1109/32.852741

Abstract

An important requirement to control the inspection of software artifacts is to be able to decide, based on more objective information, whether the inspection can stop or whether it should continue to achieve a suitable level of artifact quality. A prediction of the number of remaining defects in an inspected artifact can be used for decision making. Several studies in software engineering have considered capture-recapture models, originally proposed by biologists to estimate animal populations, to make such a prediction. However, few studies compare the actual number of remaining defects to the one predicted by a capture-recapture model on real software engineering artifacts. Thus, there is little work looking at the robustness of capture-recapture models under realistic software engineering conditions, where it is expected that some of their assumptions will be violated. Simulations have been performed, but no definitive conclusions can be drawn regarding the degree of accuracy of such models under realistic inspection conditions and the factors affecting this accuracy. Furthermore, the existing studies focused on a subset of the existing capture-recapture models. Thus, a more exhaustive comparison is still missing. In this study, we focus on traditional inspections and estimate, based on actual inspection data, the degree of accuracy of relevant, state-of-the-art capture-recapture models as they have been proposed in biology and for which statistical estimators exist. In order to assess their robustness, we look at the impact of the number of inspectors and the number of actual defects on the estimators' accuracy. Our results show that models are strongly affected by the number of inspectors and, therefore, one must consider this factor before using capture-recapture models. When the number of inspectors is too small, no model is sufficiently accurate and underestimation may be substantial. In addition, some models perform better than others in a large number of conditions and plausible reasons are discussed. Based on our analyses, we recommend using a model taking into account that defects have different probabilities of being detected and the corresponding Jackknife Estimator. Furthermore, we attempt to calibrate the prediction models based on their relative error, as previously computed on other inspections. Although this approach is intuitive and straightforward, we identified theoretical limitations to it, which were then confirmed by the data.
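
To make the recommended estimator concrete, the following is a minimal sketch, not the authors' implementation. It assumes the first-order form of the Jackknife Estimator commonly given in the capture-recapture literature, N = D + ((k - 1) / k) * f1, where D is the number of distinct defects found, k the number of inspectors, and f1 the number of defects found by exactly one inspector. The abstract does not state which jackknife order the study uses, and the function names and sample data below are hypothetical.

```python
from collections import Counter


def jackknife_first_order(detections):
    """Estimate total defect content with a first-order jackknife.

    detections: list of sets, one per inspector, each holding the
    identifiers of the defects that inspector found (assumed format).
    """
    k = len(detections)
    if k == 0:
        raise ValueError("at least one inspector is required")
    # How many inspectors found each defect.
    counts = Counter(d for found in detections for d in found)
    distinct = len(counts)                           # D: distinct defects found
    f1 = sum(1 for c in counts.values() if c == 1)   # found by exactly one inspector
    return distinct + (k - 1) / k * f1


def relative_error(estimate, actual):
    """Signed relative error; negative values indicate underestimation."""
    return (estimate - actual) / actual


# Hypothetical inspection: three inspectors, five distinct defects found.
inspectors = [{"d1", "d2", "d3"}, {"d2", "d4"}, {"d1", "d2", "d5"}]
estimate = jackknife_first_order(inspectors)
remaining = estimate - len(set().union(*inspectors))
```

With a single inspector the correction term vanishes and the estimate equals the number of defects found, which is one way to see why a small number of inspectors leads to underestimation.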
