Identification and interrogation of highly informative single nucleotide polymorphism sets defined by bacterial multilocus sequence typing databases

Abstract
A unified, bioinformatics-driven, single nucleotide polymorphism (SNP)-based approach to microbial genotyping has been developed. Multilocus sequence typing (MLST) databases consist of known variants of standardized housekeeping gene fragments. Normally, seven fragments are defined for a species, and a sequence type (ST) is the combination of variants of these fragments found in a particular isolate. A computer program that can identify highly informative sets of SNPs in entire MLST databases has been constructed. The SNP sets either define a particular user-specified ST or provide a high value for Simpson's index of diversity (D), and may thus be generally applicable to that species. SNP sets that are diagnostic for Neisseria meningitidis ST-11 and ST-42, and high-D SNP sets for N. meningitidis and Staphylococcus aureus, were identified, and real-time PCR methods to interrogate these SNPs were demonstrated. High-D SNP sets were also identified in other MLST databases. This widely applicable approach allows rapid genetic fingerprinting of infectious agents.
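
To make the notion of a high-D SNP set concrete, the following is a minimal sketch rather than the program described above. It assumes each ST is represented by one aligned, concatenated sequence of equal length, computes D in the commonly used Hunter-Gaston form, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), and picks SNP positions with a simple greedy search. The search strategy, the function names, and the toy sequences are all illustrative assumptions; the abstract does not specify how the program searches.

```python
from collections import Counter


def simpsons_d(profiles):
    """Simpson's index of diversity (Hunter-Gaston form) for a list of profiles.

    D is the probability that two STs drawn at random without replacement
    have different profiles at the chosen SNP positions.
    """
    n = len(profiles)
    if n < 2:
        return 0.0
    counts = Counter(profiles)
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def snp_profile(seq, positions):
    """Bases of one sequence at the selected positions, as a hashable tuple."""
    return tuple(seq[p] for p in positions)


def greedy_high_d(sequences, k):
    """Greedily pick up to k positions that maximise D (assumed strategy).

    sequences: dict mapping an ST label to its concatenated aligned sequence.
    Returns (chosen_positions, D_of_chosen_set).
    """
    seqs = list(sequences.values())
    length = len(seqs[0])
    chosen, best_d = [], 0.0
    for _ in range(min(k, length)):
        best_pos, best_d = None, -1.0
        for pos in range(length):
            if pos in chosen:
                continue
            d = simpsons_d([snp_profile(s, chosen + [pos]) for s in seqs])
            if d > best_d:  # ties keep the lowest position
                best_pos, best_d = pos, d
        chosen.append(best_pos)
    return chosen, best_d


# Hypothetical toy data: four STs, four aligned positions.
toy = {"ST-A": "AACG", "ST-B": "AATG", "ST-C": "GACG", "ST-D": "GATG"}
print(greedy_high_d(toy, 2))
```

On the toy data, positions 1 and 3 are invariant and contribute nothing, while positions 0 and 2 together separate all four STs. A set diagnostic for one user-specified ST would need a different score, for example the fraction of other STs whose profile differs from the target's.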