ProClass protein family database

Open Access

1 January 2000

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 28 (1), 273-276
https://doi.org/10.1093/nar/28.1.273

Abstract

ProClass is a protein family database that organizes non-redundant sequence entries into families defined collectively by PIR superfamilies and PROSITE patterns. By combining global similarities and functional motifs into a single classification scheme, ProClass helps to reveal domain and family relationships and classify multi-domain proteins. The database currently consists of >155 000 sequence entries retrieved from both PIR-International and SWISS-PROT databases. Approximately 92 000 or 60% of the ProClass entries are classified into ~6000 families, including a large number of new members detected by our GeneFIND family identification system. The ProClass motif collection contains ~72 000 motif sequences and >1300 multiple alignments for all PROSITE patterns, including >21 000 matches not listed in PROSITE and mostly detected from unique PIR sequences. To maximize family information retrieval, the database provides links to various protein family, domain, alignment and structural class databases. With its high classification rate and comprehensive family relationships, ProClass can be used to support full-scale genomic annotation. The database, now being implemented in an object-relational database management system, is available for online sequence search and record retrieval from our WWW server at http://pir.georgetown.edu/gfserver/proclass.html.
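
As an illustration of the motif side of this classification, the sketch below shows how a PROSITE-style pattern can be translated into a regular expression and scanned against a sequence. It is not the ProClass or GeneFIND implementation, which the abstract does not describe. The function names `prosite_to_regex` and `find_motif_matches` and the example sequence are hypothetical, and only the basic pattern syntax is handled (`x`, `[...]`, `{...}`, repetition counts, and the `<` / `>` terminal anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression.

    Hypothetical helper for illustration only; it covers the core syntax
    and not every corner case of the PROSITE pattern language.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchor to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchor to the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (2) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{"):  # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # "[...]" classes and single residue letters are valid regex as-is.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif_matches(pattern: str, sequence: str):
    """Return (0-based start, matched text) pairs, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern scanned against a made-up sequence.
hits = find_motif_matches("N-{P}-[ST]-{P}", "MKNGSAANPTQNLTW")
print(hits)
```

The lookahead wrapper is a design choice that lets overlapping occurrences of a motif be reported, which a plain left-to-right scan would skip.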
