Extracting concepts from file names; a new file clustering criterion
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 02705257, p. 84-93
- https://doi.org/10.1109/icse.1998.671105
Abstract
Decomposing complex software systems into conceptually independent subsystems is a significant software engineering activity that has received considerable research attention. Most of the research in this domain considers the body of the source code, trying to cluster together files that are conceptually related. We discuss techniques for extracting concepts (abbreviations) from a more informal source of information: file names. The task is difficult because nothing indicates where to split the file names into substrings. In general, finding abbreviations would require domain knowledge to identify the concepts that are referred to in a name and intuition to recognize such concepts in abbreviated forms. We show by experiment that the techniques we propose allow about 90% of the abbreviations to be found automatically.
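To make the splitting problem concrete, here is a minimal sketch of one naive way to generate candidate abbreviations: enumerate every substring of each file name and keep those that occur in more than one name. This is an illustration only and is not taken from the paper; the function names, the thresholds `min_len` and `min_files`, and the sample file names are all assumptions.

```python
from collections import Counter


def candidate_substrings(name, min_len=2):
    """Return every distinct substring of `name` with at least `min_len` characters."""
    return {
        name[i:j]
        for i in range(len(name))
        for j in range(i + min_len, len(name) + 1)
    }


def shared_candidates(file_names, min_len=2, min_files=2):
    """Map each substring to the number of file names containing it,
    keeping only substrings found in at least `min_files` names."""
    counts = Counter()
    for name in file_names:
        # Each name contributes a set, so a substring counts once per file name.
        counts.update(candidate_substrings(name.lower(), min_len))
    return {s: n for s, n in counts.items() if n >= min_files}


# Hypothetical file names (extensions already stripped), invented for illustration.
names = ["listaddr", "listfile", "addrbook"]
shared = shared_candidates(names)
```

With these sample names, `shared` contains plausible abbreviations such as `list` and `addr`, but also fragments such as `is` and `st` that carry no meaning. Separating the two is the part that calls for the domain knowledge the abstract mentions.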