Plagiarism Detection in arXiv

1 December 2006

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 15504786, p. 1070-1075
https://doi.org/10.1109/icdm.2006.126

Abstract

We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents gathered by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.

All Related Versions

Version 1, 2007-02-01, arXiv

This publication has 7 references indexed in Scilit:

Plagiarism Detection in arXiv
Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
Six Degrees of Reputation: The Use and Abuse of Online Review and Recommendation Systems
SSRN Electronic Journal, 2005
Authorship verification as a one-class classification problem
Published by Association for Computing Machinery (ACM), 2004
Winnowing
Published by Association for Computing Machinery (ACM), 2003
CHECK
Published by Association for Computing Machinery (ACM), 1997
Copy detection mechanisms for digital documents
Published by Association for Computing Machinery (ACM), 1995
First Steps Towards Electronic Research Communication
Computers in Physics, 1994