Conserved Clusters of Functionally Related Genes in Two Bacterial Genomes

Abstract
We present an approach to genome comparison that combines functional classification of gene products with sequence comparison. The genomes of Haemophilus influenzae and Escherichia coli are analyzed, and all genes are classified into nine major functional classes that correspond to important cellular processes. To study gene order and genome organization in the two bacteria, we computed statistics on pairs of neighboring genes. To estimate the significance of the observations, we developed a statistical model based on binomial distributions. Significant patterns of gene order are observed both within and between the two bacterial genomes: functionally related genes tend to be neighbors more often than unrelated genes do. Some of these groups represent well-known operons, but additional gene clusters are also identified. These clusters correspond to genomic elements that have been conserved during bacterial evolution. Beyond nearest-neighbor relationships, the method can be used to study the relative direction of transcription in genomes, which is likewise highly conserved between homologous gene pairs. This new approach combines a high-level description of molecular function with pair statistics that capture genome organization. It is expected to complement traditional methods of sequence analysis in the study of genomic structure, function, and evolution.
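The abstract does not give the details of the binomial model. What follows is therefore only a minimal sketch under stated assumptions and is not the authors' method. It asks whether adjacent genes share a label more often than chance. Two assumptions drive it. First, the null probability that two randomly placed genes share a label is taken as the sum of squared genome-wide label frequencies, a with-replacement approximation. Second, the neighboring pairs are treated as independent binomial trials, which is itself an approximation because consecutive pairs overlap. The function names `binomial_upper_tail` and `neighbor_pair_test` are hypothetical. The same test can be run on functional classes or on strand labels ("+"/"-") to address the transcription-direction question.

```python
from collections import Counter
from math import exp, lgamma, log


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i)
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)


def neighbor_pair_test(labels, circular=False):
    """Hypothetical test: do adjacent genes share a label more often than chance?

    labels: one label per gene in genome order, e.g. a functional class,
            or a strand ("+"/"-") for the transcription-direction question.
    circular: if True, also pair the last gene with the first
              (bacterial chromosomes are usually circular).
    Returns (observed same-label pairs, number of pairs, null p, P-value).
    """
    pairs = list(zip(labels, labels[1:]))
    if circular and len(labels) > 2:
        pairs.append((labels[-1], labels[0]))
    observed = sum(1 for a, b in pairs if a == b)
    # Assumed null model: two randomly placed genes share a label with
    # probability sum_c f_c**2, where f_c is the genome-wide label frequency.
    counts = Counter(labels)
    n_genes = len(labels)
    p_null = sum((c / n_genes) ** 2 for c in counts.values())
    p_value = binomial_upper_tail(observed, len(pairs), p_null)
    return observed, len(pairs), p_null, p_value


# Toy example (invented labels, not data from the study):
obs, n, p, pval = neighbor_pair_test(["A", "A", "A", "B", "B", "B"])
print(obs, n, p, round(pval, 4))  # prints: 4 5 0.5 0.1875
```

The tail sum is computed in log space because, for genomes with thousands of genes, the binomial coefficients are too large to convert to floating point directly. A library routine such as a binomial survival function would serve equally well.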