Seeking an ancient enzyme in Methanococcus jannaschii using orf, a program based on predicted secondary structure comparisons
- 17 March 1998
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 95 (6), pp. 2818-2823
- https://doi.org/10.1073/pnas.95.6.2818
Abstract
We have developed a simple procedure to identify protein homologs in genomic databases. The program, called orf, is based on comparisons of predicted secondary structure. Protein structure is far better conserved than amino acid sequence, and structure-based methods have been effective in exploiting this fact to find homologs, even among proteins with scant sequence identity. orf is a secondary structure-based method that operates solely on predictions from sequence and requires no experimentally determined information about the structure. The approach is illustrated by an example: Thymidylate synthase, a highly conserved enzyme essential to thymidine biosynthesis in both prokaryotes and eukaryotes, is thought to be used by Archaea, but a corresponding gene has yet to be identified. Here, a candidate thymidylate synthase is identified as a previously unassigned open reading frame from the genome of Methanococcus jannaschii, viz., MJ0757. Using primary structure information alone, the optimally aligned sequence identity between MJ0757 and Escherichia coli thymidylate synthase is 7%, well below the threshold of sensitivity for detection by sequence-based methods.
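The abstract does not spell out how orf scores two predicted secondary-structure profiles against each other, so the following is only a minimal sketch of the general idea, not orf's algorithm. It assumes each protein has already been reduced to a string of predicted states (H = helix, E = strand, C = coil) and aligns two such strings with a plain Needleman-Wunsch global alignment. The scores `MATCH`, `MISMATCH` and `GAP` are hypothetical values chosen for illustration. A small `percent_identity` helper is included to show how an identity figure like the 7% quoted above can be computed from an alignment; it assumes identity is counted over columns where neither sequence has a gap, and other conventions (e.g. dividing by the shorter sequence length) give different numbers.

```python
# Hypothetical sketch: global (Needleman-Wunsch) alignment of two predicted
# secondary-structure strings over H (helix), E (strand), C (coil).
# The scoring values are illustrative assumptions, NOT orf's parameters.

MATCH = 2      # same predicted state in both proteins
MISMATCH = -1  # different predicted states
GAP = -2       # linear gap penalty per gap position


def _pair(x: str, y: str) -> int:
    """Substitution score for two secondary-structure states."""
    return MATCH if x == y else MISMATCH


def align_ss(a: str, b: str) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for the best global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]),  # align states
                score[i - 1][j] + GAP,                             # gap in b
                score[i][j - 1] + GAP,                             # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Percent identical symbols over aligned columns where neither has a gap."""
    cols = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not cols:
        return 0.0
    same = sum(1 for p, q in cols if p == q)
    return 100.0 * same / len(cols)


if __name__ == "__main__":
    s, x, y = align_ss("HHCC", "HHC")
    print(s, x, y)
```

For example, `align_ss("HHCC", "HHC")` scores 4 (three matching states and one gap). The design choice worth noting is that the alphabet has only three letters, so two unrelated proteins will share many states by chance; any real secondary-structure comparison needs a scoring scheme and significance estimate calibrated for that, which this sketch does not attempt.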