Plagiarism Detection in arXiv
- 1 December 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 15504786, p. 1070-1075
- https://doi.org/10.1109/icdm.2006.126
Abstract
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
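The abstract does not say which detection method is used, so the following is only an illustrative sketch of one common text-overlap approach (shared word n-grams between document pairs), not the authors' algorithm. The function names, the n-gram length `n`, and the `threshold` value are all assumptions chosen for the example.

```python
# Illustrative sketch only: flags document pairs that share many word n-grams.
# This is a generic overlap technique, not the method used in the paper.

def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) found in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    """Fraction of the smaller document's n-grams that also occur in the other."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))

def flag_pairs(docs, n=3, threshold=0.5):
    """Compare every pair of documents and return those at or above threshold."""
    ids = sorted(docs)
    flagged = []
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            score = overlap(docs[x], docs[y], n)
            if score >= threshold:
                flagged.append((x, y, score))
    return flagged

docs = {
    "a": "we describe a large scale application of methods",
    "b": "we describe a large scale application of ideas",
    "c": "completely unrelated text about something else entirely",
}
flagged = flag_pairs(docs)
```

The all-pairs loop here is quadratic in the number of documents, so a collection of the size described above would need an index of n-grams (or sampled fingerprints) rather than direct pairwise comparison.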
This publication has 7 references indexed in Scilit:
- Plagiarism Detection in arXiv. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Six Degrees of Reputation: The Use and Abuse of Online Review and Recommendation Systems. SSRN Electronic Journal, 2005
- Authorship verification as a one-class classification problem. Published by Association for Computing Machinery (ACM), 2004
- Winnowing. Published by Association for Computing Machinery (ACM), 2003
- CHECK. Published by Association for Computing Machinery (ACM), 1997
- Copy detection mechanisms for digital documents. Published by Association for Computing Machinery (ACM), 1995
- First Steps Towards Electronic Research Communication. Computers in Physics, 1994