PseqIP: A nonredundant and exhaustive protein sequence data bank generated from 4 major existing collections
- 1 January 1986
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 1 (1), 60-65
- https://doi.org/10.1002/prot.340010110
Abstract
Four major protein sequence data collections (NBRF-PIR, PSD-Kyoto, PGtrans, and NEWAT) have been merged into a single nonredundant data bank called PseqIP. The data bank entries were automatically matched by a heuristic computer program relying on the fast computation of the number of tetrapeptides shared by two sequences. PseqIP 1.0 includes 6,068 different protein sequences for a total of 1,357,067 residues, representing most of the available sequence information to date. During the course of this work, we found about 600 occurrences of a protein sequence recorded with a one-amino-acid variation in at least two different data banks. A flat file (ASCII computer-readable format) version of PseqIP 1.0, well-suited for exhaustive homology searches and statistical sequence analysis, is available from our laboratory.
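The matching criterion named in the abstract, the number of tetrapeptides shared by two sequences, can be illustrated with a minimal sketch. This is not the authors' program: the function names, the choice to count shared tetrapeptides with multiplicity, and the single-substitution check are all assumptions made for illustration, and the abstract does not state how the count was turned into a match decision.

```python
from collections import Counter


def tetrapeptides(seq: str, k: int = 4) -> Counter:
    """Count every overlapping window of k residues (k=4 gives tetrapeptides)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_tetrapeptides(a: str, b: str, k: int = 4) -> int:
    """Number of k-residue words common to both sequences.

    Assumption: words are counted with multiplicity (multiset intersection);
    the original heuristic may have counted them differently.
    """
    return sum((tetrapeptides(a, k) & tetrapeptides(b, k)).values())


def differs_by_one_residue(a: str, b: str) -> bool:
    """True if two equal-length sequences differ at exactly one position.

    Hypothetical helper mirroring the 'one-amino-acid variation' case
    mentioned in the abstract (substitutions only, no insertions/deletions).
    """
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1
```

Two entries that share nearly all of their tetrapeptides are candidates for being the same protein, while a single substitution removes at most four shared tetrapeptides, which is why near-identical entries from different collections remain easy to pair under such a count.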