Low molecular weight proteins: A challenge for post‐genomic research
- 14 April 1998
- journal article
- research article
- Published by Wiley in Electrophoresis
- Vol. 19 (4) , 536-544
- https://doi.org/10.1002/elps.1150190413
Abstract
The EcoGene project involves the examination of Escherichia coli K‐12 DNA sequences and accompanying annotation in the public databases in order to refine the representation and prediction of the entire set of E. coli K‐12 chromosomally encoded protein sequences. The results of this ongoing effort have been deposited in the SWISSPROT protein sequence database as sequencing of the E. coli genome has progressed to completion in recent years. Through this continuing research, we have discovered that the prediction of low molecular weight (small) proteins, arbitrarily defined as protein sequences ≤ 150 amino acids (aa) in length, is problematic and requires special attention. We describe the small protein subset of EcoGene and the approach used to derive this subset from the complete E. coli genome sequence and database annotations. These E. coli proteins have helped to identify new small genes in other organisms and to identify conserved residues (motifs) using database searches and multiple alignments. Two thirds of the E. coli small proteins have not been characterized experimentally. The careful application of computer and laboratory methods to the analysis of small proteins is needed for accurate prediction, verification and characterization. The problem of accurate protein sequence identification is not limited to small proteins or to E. coli; these problems are encountered to varying degrees throughout all sequence databases.