Design Parameters to Control Synthetic Gene Expression in Escherichia coli
Open Access
- 14 September 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 4 (9) , e7002
- https://doi.org/10.1371/journal.pone.0007002
Abstract
Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles. To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally, we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well. The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.
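The abstract describes gene variants that differ only in synonymous codon usage and relates expression to the codons chosen for each amino acid. As a minimal sketch of what such a per-gene codon-usage parameter could look like, the Python below computes, for one coding sequence, the fraction of each amino acid's occurrences encoded by each codon. The function names (`translate`, `codon_frequencies`) and the two toy sequences are hypothetical illustrations, not taken from the study, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order (assumed; '*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))


def split_codons(seq):
    """Split an in-frame DNA coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def codon_frequencies(seq):
    """Return {codon: fraction of its amino acid's occurrences in seq}."""
    codon_counts = Counter(split_codons(seq))
    per_amino_acid = defaultdict(int)
    for codon, count in codon_counts.items():
        per_amino_acid[CODON_TABLE[codon]] += count
    return {
        codon: count / per_amino_acid[CODON_TABLE[codon]]
        for codon, count in codon_counts.items()
    }


# Two hypothetical variants encoding the same peptide with different codons.
variant_a = "ATGCTGCTGAAA"
variant_b = "ATGTTATTGAAG"
freq_a = codon_frequencies(variant_a)
freq_b = codon_frequencies(variant_b)
```

Frequencies computed this way for each gene variant are the kind of explanatory variables that a regression against measured expression levels could use; the regression step itself is not shown here.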