CCFinder: a multilinguistic token-based code clone detection system for large scale source code

7 August 2002

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering

Vol. 28 (7), 654-670
https://doi.org/10.1109/tse.2002.1019480

Abstract

A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
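
The two steps named in the abstract, transformation of the source text and token-by-token comparison, can be illustrated with a minimal Python sketch. This is not CCFinder's implementation: the keyword set, the `$id` and `$num` placeholders, and the function names `normalize` and `find_clone_pairs` are all assumptions made for illustration, and the comparison here is a naive fixed-length window match rather than whatever optimized matching the tool uses.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword set for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}

# Identifiers/keywords, numeric literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def normalize(source):
    """Transformation step: tokenize, then replace identifiers and numeric
    literals with placeholders so that renamed copies look identical."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            tokens.append("$id")
        elif tok[0].isdigit():
            tokens.append("$num")
        else:
            tokens.append(tok)
    return tokens


def find_clone_pairs(tokens, min_len):
    """Comparison step (naive): return sorted pairs (i, j), i < j, where the
    min_len tokens starting at i equal the min_len tokens starting at j."""
    windows = defaultdict(list)
    for i in range(len(tokens) - min_len + 1):
        windows[tuple(tokens[i:i + min_len])].append(i)
    pairs = []
    for starts in windows.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                pairs.append((starts[a], starts[b]))
    return sorted(pairs)


if __name__ == "__main__":
    code = "a = b + 1; c = d + 2;"
    toks = normalize(code)
    print(toks)
    print(find_clone_pairs(toks, 6))
```

Because identifiers and literals are replaced before comparison, two fragments that differ only in variable names or constants yield the same token sequence and are reported as a clone pair. Overlapping windows produce many overlapping pairs on real code, so a practical tool would merge them into maximal matches.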
