Sequence-Based Prediction of Type III Secreted Proteins
Open Access
- 24 April 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Pathogens
- Vol. 5 (4), e1000376
- https://doi.org/10.1371/journal.ppat.1000376
Abstract
The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals, including humans. The TTSS acts as a molecular syringe with which bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, the recognition and targeting of type III secreted proteins have until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperone-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of the N-termini of 100 experimentally verified effector proteins. Based on this analysis, we developed a machine-learning approach for the prediction of TTSS effector proteins that takes into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with a sensitivity of ∼71% and a selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could still detect effector proteins when the respective group was excluded from training. Applying our prediction approach to 739 complete bacterial and archaeal genome sequences classified between 0% and 12% of proteins as putative TTSS effectors. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting that convergent evolutionary processes shape the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors.
Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen–host interactions.
Author Summary
Many Gram-negative bacteria live closely associated with humans, animals, or plants. The pathogenic or symbiotic interactions between bacteria and host are often mediated by the secretion of bacterial proteins into the host cells. The type III secretion system (TTSS) is one of the best-studied cellular machineries for this purpose. It specifically recognizes and exports effector proteins, which are injected into eukaryotic cells through a needle-like structure. However, neither the mechanism of transport nor the recognition of proteins to be exported via the TTSS is fully understood so far. In this study, we developed the first general computational model that can identify TTSS effector proteins by analyzing a short part of their amino acid sequences. The features of this signal sequence are universal among human and animal pathogens and plant symbionts. Based on our findings, we developed a computer program for the in silico prediction of TTSS effector candidates, for example in newly sequenced genomes. The TTSS and its effector proteins constitute a central virulence mechanism of several bacterial pathogens responsible for severe and widespread infectious diseases in humans and animals. Our findings will facilitate and improve further investigations of TTSS-mediated pathogenesis and its role in pathogen–host interactions.
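The kind of N-terminal sequence features named in the abstract (frequencies of amino acids, short peptides, and residues with shared physico-chemical properties) can be illustrated with a minimal Python sketch. This is not the EffectiveT3 implementation: the function name `n_terminal_features`, the window length of 25 residues, the peptide length `k=2`, and the three property groups are all assumptions chosen for illustration and are not taken from the text above.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical property groups, chosen for illustration only;
# the abstract does not say which groupings were used.
PROPERTY_GROUPS = {
    "polar": set("STNQ"),
    "hydrophobic": set("AVLIMFWC"),
    "charged": set("DEKRH"),
}


def n_terminal_features(sequence, n_term_len=25, k=2):
    """Return relative frequencies computed on the N-terminal window.

    n_term_len and k are assumed defaults, not values from the source.
    """
    window = sequence.upper()[:n_term_len]
    if not window:
        raise ValueError("sequence is empty")

    counts = Counter(window)
    features = {}

    # Frequency of each of the 20 standard amino acids in the window.
    for aa in AMINO_ACIDS:
        features[f"aa_{aa}"] = counts[aa] / len(window)

    # Frequency of residues belonging to each property group.
    for name, members in PROPERTY_GROUPS.items():
        features[f"prop_{name}"] = sum(counts[aa] for aa in members) / len(window)

    # Frequency of each short peptide (overlapping k-mer) that occurs.
    n_kmers = len(window) - k + 1
    if n_kmers > 0:
        kmer_counts = Counter(window[i:i + k] for i in range(n_kmers))
        for kmer, count in kmer_counts.items():
            features[f"kmer_{kmer}"] = count / n_kmers

    return features


if __name__ == "__main__":
    # Toy sequence, not a real effector protein.
    example = n_terminal_features("MSSTSLNQRPAVEK")
    print(sorted(example.items(), key=lambda item: -item[1])[:5])
```

A feature dictionary of this shape could be passed to a general-purpose classifier after being converted to a fixed-length vector; which classifier and which feature selection the authors used is not stated in the text above.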