A Guild of 45 CRISPR-Associated (Cas) Protein Families and Multiple CRISPR/Cas Subtypes Exist in Prokaryotic Genomes
Open Access
- 11 November 2005
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 1 (6), e60-483
- https://doi.org/10.1371/journal.pcbi.0010060
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21–37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated.
The family of clustered regularly interspaced short palindromic repeats (CRISPRs) describes a class of DNA repeats found in nearly half of all bacterial and archaeal genomes. These DNA repeat regions have a remarkably regular structure: unique sequences of constant size, called spacers, sit between each pair of repeats. The DNA repeats do not encode proteins, but appear to be transcribed and processed into small RNAs that may have any number of functions, including resistance to any phage (i.e., virus of bacteria) whose sequence matches a spacer; spacers change rapidly as microbial strains evolve. This work describes 41 new CRISPR-associated (cas) gene families, which are always found near these repeats, in addition to the four previously known. It shows that CRISPR systems belong to different classes, with different repeat patterns, sets of genes, and species ranges. Most of these seem to come and go rather rapidly from their host genomes. These possibly beneficial mobile genetic elements may play an important role in driving prokaryotic evolution.
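The repeat-spacer layout described above (a direct repeat recurring at regular intervals, with unique spacers in between) can be illustrated with a small Python sketch. This is not the method used in the paper: it matches repeats exactly, whereas real CRISPR repeats may vary between copies. The function names, the toy sequence, and the spacer-length bounds are hypothetical choices for illustration. Only the 21 bp seed length comes from the lower end of the repeat range quoted in the abstract.

```python
def find_repeat_array(seq, repeat_len=21, min_spacer=20, max_spacer=50, min_copies=3):
    """Return (repeat, starts) for the first exact direct-repeat array, else None.

    Assumption: repeats are identical copies. The spacer bounds are
    illustrative defaults, not values taken from the paper.
    """
    for i in range(len(seq) - repeat_len + 1):
        repeat = seq[i:i + repeat_len]
        starts = [i]
        pos = i
        while True:
            # The next copy must begin after a spacer of allowed length.
            lo = pos + repeat_len + min_spacer
            hi = pos + repeat_len + max_spacer
            nxt = seq.find(repeat, lo, hi + repeat_len)
            if nxt == -1:
                break
            starts.append(nxt)
            pos = nxt
        if len(starts) >= min_copies:
            return repeat, starts
    return None


def extract_spacers(seq, starts, repeat_len):
    """Return the sequences lying between consecutive repeat copies."""
    return [seq[a + repeat_len:b] for a, b in zip(starts, starts[1:])]


# Toy locus (made-up sequences): flank, then three repeats around two spacers.
TOY_REPEAT = "GTTCACTGCCGTATAGGCAGC"  # 21 bp, hypothetical
TOY_SPACERS = ["ATC" * 10, "TACCAT" * 5]  # two distinct 30 bp spacers
TOY_LOCUS = ("TTTTT" + TOY_REPEAT + TOY_SPACERS[0] + TOY_REPEAT
             + TOY_SPACERS[1] + TOY_REPEAT + "TTTTT")

result = find_repeat_array(TOY_LOCUS)
if result is not None:
    repeat, starts = result
    print(repeat, starts, extract_spacers(TOY_LOCUS, starts, len(repeat)))
```

Seeding with a fixed-length exact match keeps the sketch short. A realistic detector would tolerate mismatches between repeat copies and extend the seed to the full repeat length.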