PseqIP: A nonredundant and exhaustive protein sequence data bank generated from 4 major existing collections

1 January 1986

journal article
research article
Published by Wiley in Proteins-Structure Function and Bioinformatics

Vol. 1 (1) , 60-65
https://doi.org/10.1002/prot.340010110

Abstract

Four major protein sequence data collections (NBRF-PIR, PSD-Kyoto, PGtrans, and NEWAT) have been merged into a single nonredundant data bank called PseqIP. The data bank entries were automatically matched by a heuristic computer program relying on the fast computation of the number of tetrapeptides shared by two sequences. PseqIP 1.0 includes 6,068 different protein sequences for a total of 1,357,067 residues, representing most of the available sequence information to date. During the course of this work, we found about 600 occurrences of a protein sequence recorded with a one-amino-acid variation in at least two different data banks. A flat file (ASCII computer-readable format) version of PseqIP 1.0, well-suited for exhaustive homology searches and statistical sequence analysis, is available from our laboratory.
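
The abstract does not describe the matching program in detail, so the following is only a minimal Python sketch of the general idea of counting the tetrapeptides shared by two sequences. The function names, the multiset counting convention, and the `min_fraction` threshold are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def tetrapeptides(seq: str, k: int = 4) -> Counter:
    """Count every overlapping k-residue window (k=4 gives tetrapeptides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_tetrapeptides(a: str, b: str, k: int = 4) -> int:
    """Number of k-residue words common to both sequences, with multiplicity."""
    # Counter intersection keeps the minimum count of each shared word.
    common = tetrapeptides(a, k) & tetrapeptides(b, k)
    return sum(common.values())


def likely_same_protein(a: str, b: str, min_fraction: float = 0.75, k: int = 4) -> bool:
    """Hypothetical decision rule: flag a pair when the shared words cover
    at least min_fraction of the words in the shorter sequence.
    The threshold is an assumed value, not one taken from the paper."""
    words_in_shorter = min(len(a), len(b)) - k + 1
    if words_in_shorter <= 0:
        return False
    return shared_tetrapeptides(a, b, k) / words_in_shorter >= min_fraction


if __name__ == "__main__":
    original = "MKTAYIAKQRQISFVKSHFSRQ"
    variant = "MKTAYIAKQRWISFVKSHFSRQ"  # one residue substituted (Q -> W)
    print(shared_tetrapeptides(original, original))
    print(shared_tetrapeptides(original, variant))
    print(likely_same_protein(original, variant))
```

A single substitution away from the sequence ends changes only the four overlapping tetrapeptides that contain the altered residue, which is why a count of this kind can still pair entries that differ by one amino acid, such as the roughly 600 cases mentioned above.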
