CCFinder: a multilinguistic token-based code clone detection system for large scale source code
- 7 August 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering
- Vol. 28 (7) , 654-670
- https://doi.org/10.1109/tse.2002.1019480
Abstract
A code clone is a portion of source code that is identical or similar to another portion. Because code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique that consists of a transformation of the input source text followed by a token-by-token comparison. We have implemented this technique, together with several useful optimization techniques, in a tool named CCFinder (Code Clone Finder), which extracts code clones from C, C++, Java, COBOL, and other source files. In addition, we have developed metrics for the code clones. To evaluate the usefulness of CCFinder and the metrics, we conducted several case studies in which we applied the tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. CCFinder effectively found clones, and the metrics effectively identified the characteristics of the systems. We also compared the proposed technique with other clone detection techniques.
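The two-stage idea in the abstract (transform the source text, then compare token by token) can be illustrated with a small sketch. It is not CCFinder's implementation: the keyword set, the comment/token regexes, and the function names `normalize` and `find_clones` are hypothetical. CCFinder applies language-specific transformation rules, and the paper's comparison relies on suffix-tree matching. Here, that comparison is replaced by a simpler technique: hash fixed-length token windows, then extend each matching pair to its maximal length.

```python
import re
from collections import defaultdict

# Hypothetical keyword set for a C-like language (not CCFinder's real rule set).
KEYWORDS = {"if", "else", "for", "while", "do", "return", "break", "continue",
            "int", "long", "char", "float", "double", "void", "struct", "static"}

COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.S)
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\S')


def normalize(src):
    """Transformation step: return (normalized_token, line) pairs for one source text."""
    # Blank out comments but keep newlines so line numbers stay correct.
    # (Naive: a "//" inside a string literal is also treated as a comment.)
    src = COMMENT_RE.sub(lambda m: re.sub(r"[^\n]", " ", m.group()), src)
    out = []
    for m in TOKEN_RE.finditer(src):
        tok = m.group()
        line = src.count("\n", 0, m.start()) + 1
        if tok[0].isdigit() or tok[0] == '"':
            tok = "$lit"  # every literal becomes the same placeholder
        elif (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            tok = "$id"   # every user-defined identifier becomes the same placeholder
        out.append((tok, line))
    return out


def find_clones(files, min_tokens=10):
    """Comparison step: report maximal pairs of identical normalized token runs.

    files maps a file name to its source text. Each result is a pair of
    (file, start_line, end_line) tuples.
    """
    seq = []  # (token, file_name, line)
    for idx, (name, src) in enumerate(files.items()):
        seq += [(tok, name, line) for tok, line in normalize(src)]
        seq.append((f"$end{idx}", name, -1))  # unique sentinel: no match crosses files
    toks = [t for t, _, _ in seq]
    n = len(toks)

    # Group start positions by the window of min_tokens tokens that begins there.
    windows = defaultdict(list)
    for i in range(n - min_tokens + 1):
        windows[tuple(toks[i:i + min_tokens])].append(i)

    clones = []
    for positions in windows.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]
                # Skip pairs that are a shifted tail of a longer match starting at (i-1, j-1).
                if i > 0 and toks[i - 1] == toks[j - 1]:
                    continue
                length = min_tokens
                while j + length < n and toks[i + length] == toks[j + length]:
                    length += 1
                clones.append(((seq[i][1], seq[i][2], seq[i + length - 1][2]),
                               (seq[j][1], seq[j][2], seq[j + length - 1][2])))
    return clones


files = {
    "a.c": "int sum(int *xs, int n) {\n  int total = 0;\n  for (int i = 0; i < n; i++)\n"
           "    total += xs[i];\n  return total;\n}\n",
    "b.c": "int add_all(int *v, int len) {\n  int acc = 0;\n  for (int k = 0; k < len; k++)\n"
           "    acc += v[k];\n  return acc;\n}\n",
}
print(find_clones(files))  # the two functions differ only in names, so they match
```

Replacing identifiers and literals with placeholders is what lets the two functions above match even though every name differs. The pairwise loop over each window group is quadratic in the worst case, so this is only suitable for small inputs. A suffix tree or suffix array avoids that cost on large code bases.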