Shared Information and Program Plagiarism Detection
- 21 June 2004
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 50 (7), 1545-1551
- https://doi.org/10.1109/tit.2004.830793
Abstract
A fundamental question in information theory and in computer science is how to measure similarity, or the amount of shared information, between two sequences. We have proposed a metric, based on Kolmogorov complexity, to answer this question and have proven it to be universal. We apply this metric to measure the amount of shared information between two computer programs, to enable plagiarism detection. We have designed and implemented a practical system, SID (Software Integrity Diagnosis system), that approximates this metric using a heuristic compression algorithm. Experimental results demonstrate that SID has clear advantages over other plagiarism detection systems. The SID system server is online at http://software.bioinformatics.uwaterloo.ca/SID/.
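The idea of approximating a Kolmogorov-complexity-based metric with a compressor can be illustrated with a minimal sketch. This is not SID's algorithm: the abstract does not give the metric's formula or describe SID's heuristic compressor, so the sketch below substitutes the well-known normalized compression distance and uses zlib as a stand-in compressor. The function names and the sample inputs are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data; a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share much information."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample inputs: a small program, a copy with one identifier
# renamed, and an unrelated piece of prose.
original = b"""
def average(values):
    total = 0
    count = 0
    for value in values:
        total = total + value
        count = count + 1
    if count == 0:
        return 0
    return total / count

def largest(values):
    best = values[0]
    for value in values:
        if value > best:
            best = value
    return best
"""
copied = original.replace(b"total", b"acc")
unrelated = (
    b"The quick brown fox jumps over the lazy dog while seven wizards "
    b"quietly juggle boxes of exquisite black quartz. Meanwhile, a dozen "
    b"jovial zebras picnic by the frozen lake, humming waltzes and "
    b"mixing kumquat jam for the big Tuesday banquet in the old harbor."
)

d_copy = ncd(original, copied)
d_unrelated = ncd(original, unrelated)
print(f"original vs copied:    {d_copy:.3f}")
print(f"original vs unrelated: {d_unrelated:.3f}")
```

Under these assumptions, the renamed copy should score a noticeably smaller distance than the unrelated text, since compressing the pair costs little more than compressing one of them alone. A general-purpose compressor on raw bytes is only a rough proxy; a plagiarism detector would need a representation and compressor suited to program source.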
This publication has 18 references indexed in Scilit:
- The Similarity Metric. IEEE Transactions on Information Theory, 2004
- Distance based indexing for string proximity search. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- Winnowing. Published by Association for Computing Machinery (ACM), 2003
- Chain Letters and Evolutionary Histories. Scientific American, 2003
- Information distance. IEEE Transactions on Information Theory, 1998
- A suboptimal lossy data compression based on approximate pattern matching. IEEE Transactions on Information Theory, 1997
- Reversibility and adiabatic computation: trading time and space for energy. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 1996
- Computer algorithms for plagiarism detection. IEEE Transactions on Education, 1989
- An algorithmic approach to the detection and prevention of plagiarism. ACM SIGCSE Bulletin, 1976
- A Mathematical Theory of Communication. Bell System Technical Journal, 1948