SIMAP: the similarity matrix of proteins

Open Access

1 January 2006

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 34 (90001), D252-D256
https://doi.org/10.1093/nar/gkj106

Abstract

Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith–Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
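To illustrate the Smith–Waterman algorithm named in the abstract, the following is a minimal sketch of local alignment scoring. It is not the SIMAP implementation: the function name `smith_waterman_score` is hypothetical, and the scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for brevity. A production protein search would typically use a substitution matrix such as BLOSUM and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score of sequences a and b.

    Assumed scoring scheme (illustrative only): identical residues
    score `match`, differing residues score `mismatch`, and each
    gap position costs `gap` (linear gap penalty).
    """
    # Only the previous row of the dynamic-programming matrix is kept,
    # so memory use is proportional to len(b).
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared segment "ACDE" gives 4 matches at +2 each.
    print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))  # prints 8
```

The clamp at zero is what distinguishes local from global alignment: a low-scoring prefix is discarded rather than carried forward, so the score reflects only the best-matching region of the two sequences.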
