PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification

1 January 2003

journal article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 31 (1), 334-341
https://doi.org/10.1093/nar/gkg115

Abstract

The PANTHER database was designed for high-throughput analysis of protein sequences. One of its key features is a simplified ontology of protein function, which allows browsing of the database by biological function. Biologist curators have associated the ontology terms with groups of protein sequences rather than with individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
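The core idea in the abstract is that curators annotate groups of sequences, a model is built per group, and any new sequence inherits the terms of the group whose model it matches best, first at the family level and then at the subfamily level. The sketch below illustrates that workflow under loudly stated simplifications. It is not PANTHER's method: PANTHER uses profile HMMs (which model insertions and deletions), whereas this sketch swaps in a much simpler ungapped position-specific log-odds profile with a uniform amino-acid background and a pseudocount of 1. The two-step "best family, then best subfamily within it" assignment is an assumption about how the family and subfamily models are combined. All family names, subfamily names, sequences, and term labels (`FAM1`, `SF1b`, `"term B"`, and so on) are hypothetical toy data.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids with a uniform background distribution.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)


def build_profile(aligned_seqs, pseudocount=1.0):
    """Position-specific log-odds profile from equal-length, gap-free aligned sequences.

    Simplified stand-in for a profile HMM: no insert or delete states.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("training sequences must all have the same aligned length")
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        # Pseudocounts keep residues unseen in training from getting log(0).
        profile.append({aa: math.log((counts[aa] + pseudocount) / total / BACKGROUND)
                        for aa in AMINO_ACIDS})
    return profile


def score(profile, seq):
    """Ungapped log-odds score of seq against a profile (sum over positions)."""
    if len(seq) != len(profile):
        raise ValueError("query length must match profile length in this ungapped sketch")
    return sum(column[aa] for column, aa in zip(profile, seq))


def build_library(training):
    """One profile per family (all member sequences) and one per subfamily."""
    library = {}
    for family, subfamilies in training.items():
        members = [s for sub in subfamilies.values() for s in sub["seqs"]]
        library[family] = {
            "profile": build_profile(members),
            "subfamilies": {
                name: {"profile": build_profile(sub["seqs"]), "terms": sub["terms"]}
                for name, sub in subfamilies.items()
            },
        }
    return library


def classify(seq, library):
    """Pick the best-scoring family, then the best-scoring subfamily within it.

    The query inherits the curated terms of its subfamily; it is never
    annotated individually.
    """
    family = max(library, key=lambda f: score(library[f]["profile"], seq))
    subfamilies = library[family]["subfamilies"]
    subfamily = max(subfamilies, key=lambda s: score(subfamilies[s]["profile"], seq))
    return family, subfamily, subfamilies[subfamily]["terms"]


# Hypothetical toy training groups and term labels, for illustration only.
TRAINING = {
    "FAM1": {
        "SF1a": {"seqs": ["ACDEF", "ACDEG", "ACDEF"], "terms": ["term A"]},
        "SF1b": {"seqs": ["ACKEF", "ACKEY", "ACKEF"], "terms": ["term B"]},
    },
    "FAM2": {
        "SF2a": {"seqs": ["WWYYH", "WWYYH", "WWYYK"], "terms": ["term C"]},
    },
}

library = build_library(TRAINING)

if __name__ == "__main__":
    for query in ["ACKEF", "WWYYK"]:
        print(query, classify(query, library))
```

With this toy data, `ACKEF` is assigned to `FAM1` and then to subfamily `SF1b` (because of the `K` at the third position), inheriting `["term B"]`. Building a separate model per subfamily is what lets two closely related sequences in the same family receive different functional terms, which is the motivation the abstract gives for subfamily HMMs.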
