Plagiarism Detection in arXiv

Preprint

1 February 2007

Published by arXiv in arXiv

https://doi.org/10.48550/arXiv.cs/0702012

Abstract

We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
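
The abstract does not say which algorithm is used, so the following is only an illustrative sketch of one common way to find text overlap in a document collection; it is not presented as the authors' method. It assumes word n-gram "shingles" stored in an inverted index, and it assumes one simple false-positive heuristic: shingles that occur in too many documents are treated as boilerplate and ignored. The function names and the parameters `n`, `max_doc_freq`, and `min_shared` are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, n=3):
    """Return the set of overlapping word n-grams in a text (assumed representation)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_pairs(docs, n=3, max_doc_freq=2, min_shared=2):
    """Count shared shingles for each pair of documents.

    docs: dict mapping a document id to its text.
    max_doc_freq: assumed heuristic; shingles found in more documents than
        this are skipped as likely common phrases.
    min_shared: assumed threshold; pairs sharing fewer shingles are dropped.
    """
    # Inverted index: shingle -> ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for s in shingles(text, n):
            index[s].add(doc_id)

    # Only documents that co-occur in an index entry are ever compared,
    # which avoids an all-pairs comparison over the whole collection.
    shared = defaultdict(int)
    for ids in index.values():
        if len(ids) < 2 or len(ids) > max_doc_freq:
            continue
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    return {pair: count for pair, count in shared.items() if count >= min_shared}


if __name__ == "__main__":
    sample = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "a quick brown fox jumps over a sleeping cat",
        "c": "completely unrelated sentence about document collections",
    }
    print(overlap_pairs(sample))
```

With an index of this kind, screening a new submission only requires looking up its own shingles, which is one plausible way a real-time submission screen could be organized.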

Keywords

PLAGIARISM DETECTION
DOCUMENT COLLECTIONS
FINDING PLAGIARISM
SCALE APPLICATION
TIME SUBMISSION
HEURISTICS
ARXIV

All Related Versions

Version 1, 2007-02-01, ArXiv
Published version: (15504786), 1070.
