PepBank - a database of peptides based on sequence text mining and public peptide data sources

Open Access

1 August 2007

journal article
database
Published by Springer Nature in BMC Bioinformatics

Vol. 8 (1), 280
https://doi.org/10.1186/1471-2105-8-280

Abstract

Peptides are important molecules with diverse biological functions and biomedical uses. To date, no single, searchable archive exists for peptide sequences or their associated biological data. Rather, peptide sequences still have to be mined from abstracts and full-length articles, and/or obtained from fragmented public sources. We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. Another component of the database is peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full-text articles and text mining results. We show the utility of the database in different examples of affinity ligand discovery. We have created and maintain a database of peptide sequences. The database has biological and medical applications, for example, to predict the binding partners of biologically interesting peptides, to develop peptide-based therapeutic or diagnostic agents, or to predict molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available at http://pepbank.mgh.harvard.edu/, and the text mining source code (Peptide::Pubmed) is freely available there as well as on CPAN (http://www.cpan.org/).
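
To illustrate the Smith-Waterman search mentioned in the abstract, the following is a minimal sketch of local alignment scoring between a query peptide and database entries. It is not PepBank's implementation: the function name, the match/mismatch/gap scores, and the example entries are all assumptions chosen for illustration, and a real search would typically use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and a simple match/mismatch score
    (assumed values, not PepBank's actual parameters).
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical database entries, ranked against a query peptide
entries = ["GRGDS", "YIGSR", "CRGDC"]
query = "RGD"
ranked = sorted(entries, key=lambda s: smith_waterman_score(query, s),
                reverse=True)
```

Because each cell is clamped at zero, the score reflects the best-matching local region rather than the whole sequence, which suits short peptide queries embedded in longer entries.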
