Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements
- 8 December 1998
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 95 (25), 14658-14663
- https://doi.org/10.1073/pnas.95.25.14658
Abstract
The parasitic bacterium Mycoplasma genitalium has a small, reduced genome with close to a basic set of genes. As a first step toward determining the families of protein domains that form the products of these genes, we have used the multiple sequence programs PSI-BLAST and GEANFAMMER to match the sequences of the 467 gene products of M. genitalium to the sequences of the domains that form proteins of known structure [Protein Data Bank (PDB) sequences]. PDB sequences (274) match all of 106 M. genitalium sequences and some parts of another 85; thus, 41% of its total sequences are matched in all or part. The evolutionary relationships of the PDB domains that match M. genitalium are described in the structural classification of proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication. This level of duplication is more than twice that found by using pairwise sequence comparisons. The PDB domain matches also describe the domain structure of the matched sequences: just over a quarter contain one domain and the rest have combinations of two or more domains.
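The 41% coverage figure follows directly from the counts quoted in the abstract. The short Python sketch below reproduces that arithmetic only; the constant and function names are illustrative assumptions, not part of the authors' analysis pipeline.

```python
# Counts taken from the abstract; names are hypothetical, chosen for illustration.
TOTAL_SEQUENCES = 467   # M. genitalium gene products
FULLY_MATCHED = 106     # sequences matched over their whole length
PARTLY_MATCHED = 85     # sequences matched over some parts only


def matched_fraction(full: int, partial: int, total: int) -> float:
    """Fraction of sequences with at least one PDB domain match."""
    return (full + partial) / total


fraction = matched_fraction(FULLY_MATCHED, PARTLY_MATCHED, TOTAL_SEQUENCES)
matched = FULLY_MATCHED + PARTLY_MATCHED
print(f"{matched} of {TOTAL_SEQUENCES} sequences matched ({fraction:.0%})")
# prints "191 of 467 sequences matched (41%)"
```

The abstract does not give the total number of matched domains, so the 58% duplication figure cannot be rederived in the same way from the text alone.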