Identifying Productivity Drivers by Modeling Work Units Using Partial Data
- 1 May 2001
- journal article
- Published by Taylor & Francis in Technometrics
- Vol. 43 (2), 168-179
- https://doi.org/10.1198/004017001750386288
Abstract
We describe a new algorithm for estimating a model for an independent variable that is not directly observed but that represents one set of marginal totals of a sparse nonnegative two-way table whose other margin and zero pattern are known. The application that inspired the development of this algorithm arises in software engineering. We seek to identify those factors that affect the effort required for a developer to make a change to the software (for instance, to identify difficult areas of the code, measure changes in the code difficulty through time, and evaluate the effectiveness of development tools). Unfortunately, measurements of effort for changes are not available in historical data. We model change effort using a developer's total monthly effort and information about which changes he/she investigated in each month. We illustrate a few specific applications of our tool, demonstrate that the algorithm is an instance of the EM algorithm, and present a simulation study that speaks well of the reliability of the results our algorithm produces. In short, this algorithm allows analysts to quantify the impact of a promising software engineering tool or practice on coding effort.
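The setup in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, only a simplified illustration under stated assumptions: rows of the table are changes, columns are months, the column totals (monthly effort) and the zero pattern are known, and the row totals (effort per change) are what we want to estimate. The function name `impute_change_effort`, the `change_type` covariate, and the use of a per-type mean as the "model" are all hypothetical choices made here for brevity.

```python
from collections import defaultdict


def impute_change_effort(worked, monthly_effort, change_type, n_iter=50):
    """Illustrative sketch only (not the published method).

    worked[i][j]      -- True if change i was worked on in month j (known zero pattern)
    monthly_effort[j] -- known column total for month j
    change_type[i]    -- hypothetical categorical covariate for change i
    Returns (estimated effort per change, mean effort per change type).
    """
    n_changes, n_months = len(worked), len(monthly_effort)

    # Initial table: split each month's effort evenly among its active changes.
    table = [[0.0] * n_months for _ in range(n_changes)]
    for j in range(n_months):
        active = [i for i in range(n_changes) if worked[i][j]]
        for i in active:
            table[i][j] = monthly_effort[j] / len(active)

    type_mean = {}
    for _ in range(n_iter):
        # Row sums are the current guesses at the unobserved change efforts.
        row_sum = [sum(row) for row in table]

        # Stand-in "model fit": mean effort per change type (an assumption).
        totals, counts = defaultdict(float), defaultdict(int)
        for i, t in enumerate(change_type):
            totals[t] += row_sum[i]
            counts[t] += 1
        type_mean = {t: totals[t] / counts[t] for t in totals}

        # Rescale each row so that it sums to its fitted value.
        for i, t in enumerate(change_type):
            if row_sum[i] > 0:
                scale = type_mean[t] / row_sum[i]
                table[i] = [x * scale for x in table[i]]

        # Rescale each column so that it matches the known monthly total.
        for j in range(n_months):
            col_sum = sum(table[i][j] for i in range(n_changes))
            if col_sum > 0:
                scale = monthly_effort[j] / col_sum
                for i in range(n_changes):
                    table[i][j] *= scale

    effort = [sum(row) for row in table]
    return effort, type_mean


# Toy data: three changes over two months, one unit of effort per month.
worked = [[True, False], [True, True], [False, True]]
effort, type_mean = impute_change_effort(worked, [1.0, 1.0], ["fix", "new", "fix"])
print(effort, type_mean)
```

Because the column rescaling is the last step of each pass, the estimated change efforts always add up to the total recorded monthly effort, and cells that start at zero stay at zero, so the known zero pattern is preserved.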