Mixture Models as a Method to Find Present and Divergent Genes in Comparative Genomic Hybridization Studies on Bacteria

Abstract
Comparative genomic hybridization (CGH) using microarrays is performed on bacteria in order to test for genomic diversity within various bacterial species. The microarrays used for CGH are based on the genome of a fully sequenced bacterial strain, referred to as the reference strain. Labelled DNA fragments from a sample strain of interest and from the reference strain are hybridized to the array. Based on the obtained ratio intensities and the total intensities of the signals, each gene is classified as either present (one copy or multiple copies) or divergent (zero copies). In this paper, mixture models with different numbers of components are fitted to different combinations of variables and compared with each other. The study shows that, for several of the data sets tested, mixture models fitted to both the ratio intensities and the total intensities, including the replicates for each gene, improve on the results of previously published methods. Some summaries of the data sets are proposed as a guide for the choice of model and the choice of number of components. The models are applied to data from CGH experiments with the bacteria Staphylococcus aureus and Streptococcus pneumoniae.
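To make the classification idea concrete, the following is a minimal sketch of a two-component Gaussian mixture fitted by expectation-maximization to log-ratio intensities only. It is not the model specification used in the paper, which also considers total intensities, replicates, and other numbers of components. The data are simulated, and the function names (`fit_two_component_mixture`, `posterior_present`), the simulated means and spreads, and the 0.5 posterior cut-off are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200):
    """Fit a two-component univariate Gaussian mixture by EM.

    Returns (weights, means, sds), ordered so that component 0 has the lower mean.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    # Crude initialisation (an assumption): means at the quarter points of the range.
    means = [lo + 0.25 * spread, lo + 0.75 * spread]
    sds = [spread / 4.0, spread / 4.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [sds[k] for k in order])


def posterior_present(x, weights, means, sds):
    """Posterior probability that a log-ratio belongs to the higher-mean component."""
    dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    total = sum(dens) or 1e-300
    return dens[1] / total


# Simulated log2(sample / reference) ratios; these numbers are illustrative only.
rng = random.Random(0)
log_ratios = ([rng.gauss(0.0, 0.4) for _ in range(300)]      # genes present in the sample
              + [rng.gauss(-3.0, 0.6) for _ in range(60)])   # divergent genes

weights, means, sds = fit_two_component_mixture(log_ratios)

# Call a gene "present" when the higher-mean component is the more probable one.
calls = ["present" if posterior_present(x, weights, means, sds) >= 0.5 else "divergent"
         for x in log_ratios]
print(calls.count("present"), "present;", calls.count("divergent"), "divergent")
```

In this sketch the component with the higher mean log-ratio is interpreted as "present" and the lower one as "divergent". Extending it along the lines the abstract describes would mean fitting a multivariate mixture that also includes the total intensity and the replicate measurements for each gene, and comparing fits with different numbers of components.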
