MagicMatch--cross-referencing sequence identifiers across databases

Open Access

16 June 2005

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 21 (16), 3429-3430
https://doi.org/10.1093/bioinformatics/bti548

Abstract

Motivation: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values.

Summary: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases. The program, called MagicMatch, is able to cross-link any of the major sequence databases within a few seconds on a modest desktop computer.

Availability: MagicMatch is available at the following URL (http://cgg.ebi.ac.uk/services/magicmatch/), including an interactive service for major databases and binary downloads for widely used platforms.

Contact: ouzounis@ebi.ac.uk
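
The fingerprinting idea in the Summary can be illustrated with a minimal Python sketch. It is not the MagicMatch implementation, whose internals the abstract does not describe. The FASTA parsing, the uppercase normalization, and the function names (`read_fasta`, `fingerprint`, `index_by_fingerprint`, `cross_reference`) are all assumptions made for illustration. The sketch only shows the general technique: hash each sequence with MD5, then use the digest as a dictionary key to link identifiers whose sequences are identical.

```python
import hashlib
from collections import defaultdict


def read_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            tokens = line[1:].split()
            identifier, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def fingerprint(sequence):
    """Return the MD5 hex digest of a sequence.

    Uppercasing is an assumed normalization so that case differences
    between databases do not change the fingerprint.
    """
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def index_by_fingerprint(records):
    """Map each fingerprint to the list of identifiers sharing that sequence."""
    index = defaultdict(list)
    for identifier, sequence in records:
        index[fingerprint(sequence)].append(identifier)
    return index


def cross_reference(source_records, target_records):
    """Map each source identifier to target identifiers with an identical sequence."""
    target_index = index_by_fingerprint(target_records)
    return {
        identifier: target_index.get(fingerprint(sequence), [])
        for identifier, sequence in source_records
    }


if __name__ == "__main__":
    # Toy data; the identifiers and sequences are invented for this example.
    db_a = ">A1 first entry\nMKTAYIAK\nQRQISFVK\n>A2\nMSSHHHHH\n"
    db_b = ">B7\nmktayiakqrqisfvk\n>B9\nMAAAA\n"
    print(cross_reference(read_fasta(db_a), read_fasta(db_b)))
```

Because the lookup is by exact digest, this approach links only sequences that are identical after normalization. Sequences that differ by even one residue get unrelated fingerprints, which is the trade-off against the similarity searches mentioned in the Motivation.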
