Shotgun correlations in software measures
- 1 January 1993
- journal article
- Published by Institution of Engineering and Technology (IET) in Software Engineering Journal
- Vol. 8 (1) , 5-13
- https://doi.org/10.1049/sej.1993.0002
Abstract
Many software measures have been put forward on the simple basis of a high linear correlation coefficient with some measurable quantity. The linear correlation coefficient is an unreliable statistic for deciding whether an observed correlation indicates significant association. Several published software-measure experiments collected more than 20 different measurements, or had 14 or fewer observations. With many measurements taken from small samples, the probability of ‘discovering’ a ‘significant’ correlation is high. We present a computer simulation experiment in which the correlation between sets of randomly generated numbers is calculated. We also look at randomly generated numbers in the ranges that would be expected in Halstead's Software Science [1] measures. Our results show that the average maximum linear correlation for randomly generated numbers is 0.70 or higher if the sample size is low compared to the number of variables. Alternative statistical approaches to obtaining meaningful significant results are presented.
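The kind of simulation the abstract describes can be illustrated with a minimal sketch. This is not the authors' experiment: the uniform distribution, the trial count, the seed, and the function names (`pearson`, `max_abs_correlation`, `average_max_correlation`) are all assumptions chosen for illustration. It generates independent random variables, takes the largest absolute pairwise Pearson correlation among them, and averages that maximum over repeated trials.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(n_vars, n_obs, rng):
    """Largest |r| over all pairs of n_vars independent random variables."""
    # Assumption: uniform random numbers in [0, 1); the paper's ranges may differ.
    columns = [[rng.random() for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(a, b)) for a, b in itertools.combinations(columns, 2))


def average_max_correlation(n_vars, n_obs, trials=200, seed=0):
    """Average of the maximum |r| over repeated simulated 'experiments'."""
    rng = random.Random(seed)
    total = sum(max_abs_correlation(n_vars, n_obs, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical settings: 20 unrelated measurements, varying sample size.
    for n_obs in (10, 14, 50, 200):
        print(n_obs, round(average_max_correlation(20, n_obs), 2))
```

Because every variable is generated independently, any large correlation the sketch reports is spurious by construction; comparing the small-sample and large-sample rows shows how strongly the maximum depends on the ratio of observations to variables.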
This publication has 1 reference indexed in Scilit:
- Numerical recipes: the art of scientific computing. Analytica Chimica Acta, 1987