PIRSF: family classification system at the Protein Information Resource
- 1 January 2004
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (Database issue), D112-D114
- https://doi.org/10.1093/nar/gkh097
Abstract
The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions. The system adopts a network structure for protein classification from superfamily to subfamily levels. Protein family members are homologous (sharing common ancestry) and homeomorphic (sharing full-length sequence similarity with common domain architecture). The PIRSF database consists of two data sets, preliminary clusters and curated families. The curated families include family name, protein membership, parent-child relationship, domain architecture, and optional description and bibliography. PIRSF is accessible from the website at http://pir.georgetown.edu/pirsf/ for report retrieval and sequence classification. The report presents family annotation, membership statistics, cross-references to other databases, graphical display of domain architecture, and links to multiple sequence alignments and phylogenetic trees for curated families. PIRSF can be utilized to analyze phylogenetic profiles, to reveal functional convergence and divergence, and to identify interesting relationships between homeomorphic families, domains and structural classes.
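The abstract describes curated families as records with a name, protein membership, parent-child links and a domain architecture, arranged as a network rather than a strict tree, and mentions phylogenetic profiles as one use. The Python sketch below illustrates that shape under stated assumptions. The class and field names, the toy identifiers ("SF1", "F1", ...) and the domain names ("DomA", ...) are hypothetical and do not reflect PIRSF's actual schema or file formats. Homeomorphy is reduced here to an identical ordered domain architecture, whereas PIRSF also requires full-length sequence similarity. The phylogenetic profile is a simple presence/absence vector over a chosen list of taxa.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Protein:
    # Hypothetical minimal protein record: accession, source organism and
    # ordered domain architecture (N- to C-terminus).
    accession: str
    taxon: str
    domains: tuple[str, ...]


@dataclass
class Family:
    # Hypothetical curated-family record with the fields the abstract lists:
    # name, membership, parent-child links and domain architecture.
    family_id: str
    name: str
    architecture: tuple[str, ...]
    members: list[Protein] = field(default_factory=list)
    # Several parents are allowed, so the hierarchy is a network, not a tree.
    parents: list[str] = field(default_factory=list)


def is_homeomorphic(protein: Protein, family: Family) -> bool:
    # Simplification: compares domain architecture only; the full-length
    # sequence-similarity requirement is not modeled.
    return protein.domains == family.architecture


def ancestors(family_id: str, families: dict[str, Family]) -> set[str]:
    # Walk parent links upward; the `seen` set avoids revisiting a node
    # reached by more than one path through the network.
    seen: set[str] = set()
    stack = list(families[family_id].parents)
    while stack:
        fid = stack.pop()
        if fid not in seen:
            seen.add(fid)
            stack.extend(families[fid].parents)
    return seen


def phylogenetic_profile(family: Family, taxa: list[str]) -> list[int]:
    # 1 if any member of the family comes from the taxon, else 0.
    present = {p.taxon for p in family.members}
    return [1 if t in present else 0 for t in taxa]


# Toy network (hypothetical data): F2 has two parents, SUB1 sits under F1.
fams = {
    "SF1": Family("SF1", "DomA superfamily", ("DomA",)),
    "SF2": Family("SF2", "DomC superfamily", ("DomC",)),
    "F1": Family("F1", "DomA-DomB family", ("DomA", "DomB"), parents=["SF1"]),
    "F2": Family("F2", "DomA-DomC family", ("DomA", "DomC"), parents=["SF1", "SF2"]),
    "SUB1": Family("SUB1", "DomA-DomB subfamily", ("DomA", "DomB"), parents=["F1"]),
}
fams["F1"].members = [
    Protein("P1", "Escherichia coli", ("DomA", "DomB")),
    Protein("P2", "Bacillus subtilis", ("DomA", "DomB")),
]

print(sorted(ancestors("SUB1", fams)))  # ['F1', 'SF1']
print(sorted(ancestors("F2", fams)))    # ['SF1', 'SF2']
print(phylogenetic_profile(fams["F1"], ["Escherichia coli", "Bacillus subtilis", "Homo sapiens"]))  # [1, 1, 0]
```

Storing parents as a list on each family, rather than a single parent pointer, is what lets a family such as F2 link to two superfamilies at once; the ancestor walk then has to tolerate reaching the same node by different paths.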