Estimation of Web video multiplicity

Abstract
With ever more popularity of video web-publishing, many popular contents are being mirrored, reformatted, modified and republished, resulting in excessive content duplication. While such redundancy provides fault tolerance for continuous availability of information, it could potentially create problems for multimedia search engines in that the search results for a given query might become repetitious, and cluttered with a large number of duplicates. As such, developing techniques for detecting similarity and duplication is important to multimedia search engines. In addition, content providers might be interested in identifying duplicates of their content for legal, contractual or other business related reasons. In this paper, we propose an efficient algorithm called video signature to detect similar video sequences for large databases such as the web. The idea is to first form a 'signature' for each video sequence by selection a small number of its frames that are most similar to a number of randomly chosen seed images. Then the similarity between any tow video sequences can be reliably estimated by comparing their respective signatures. Using this method, we achieve 85 percent recall and precision ratios on a test database of 377 video sequences. As a proof of concept, we have applied our proposed algorithm to a collection of 1800 hours of video corresponding to around 45000 clips from the web. Our results indicate that, on average, every video in our collection from the web has around five similar copies.© (1999) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

This publication has 0 references indexed in Scilit: