Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance
- 1 October 1997
- journal article
- research article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Computing
- Vol. 26 (5) , 1343-1362
- https://doi.org/10.1137/s0097539793246707
Abstract
As an aid in software maintenance, it would be useful to be able to track down duplication in large software systems efficiently. Duplication in code is often in the form of sections of code that are the same except for a systematic change of parameters such as identifiers and constants. To model such parameterized duplication in code, this paper introduces the notions of parameterized strings and parameterized matches of parameterized strings. A data structure called a parameterized suffix tree is defined to aid in searching for parameterized matches. For fixed alphabets, algorithms are given to construct a parameterized suffix tree in linear time and to find all maximal parameterized matches over a threshold length in a parameterized p-string in time linear in the size of the input plus the number of matches reported. The algorithms have been implemented, and experimental results show that they perform well on C code.Keywords
This publication has 10 references indexed in Scilit:
- On-line construction of suffix treesAlgorithmica, 1995
- Sublinear approximate string matching and biological applicationsAlgorithmica, 1994
- Dotplot: A Program for Exploring Self-Similarity in Millions of Lines of Text and CodeJournal of Computational and Graphical Statistics, 1993
- Fast text searchingCommunications of the ACM, 1992
- Algorithms for Finding Patterns in StringsPublished by Elsevier ,1990
- Data structures and algorithms for approximate string matchingJournal of Complexity, 1988
- Detecting Plagiarism in Student Pascale ProgramsThe Computer Journal, 1988
- A Space-Economical Suffix Tree Construction AlgorithmJournal of the ACM, 1976
- Linear pattern matching algorithmsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1973