A comparative study of duplications in bacteria and eukaryotes: the importance of telomeres

Abstract
The genomes of three bacteria (Haemophilus influenzae, Mycoplasma genitalium, and Escherichia coli) and two eukaryotes (Saccharomyces cerevisiae and Caenorhabditis elegans) were compared. The distribution of their putative open reading frames (ORFs) was studied, and several conclusions were drawn: (1) All of these genomes, even the smallest, exhibit a significant proportion (7%-30%) of duplicated ORFs. This proportion is a function of genome size and appears unrelated to the bacteria/eukaryote division. (2) Some of these ORFs constitute families of up to 20 or more members. (3) The levels of sequence similarity within these families are highly variable, and their distribution differs between bacteria and eukaryotes. (4) In yeast, there are topological relationships between members of the same family. The paired ORFs are frequently in the same orientation with regard to their respective telomeres and located at comparable distances from them.
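The topological relationship in point (4) can be made concrete with a minimal sketch. The abstract does not say how orientation or distance were measured, so everything here is an assumption made for illustration: the `Orf` record, the helpers `telomere_context` and `comparable_pair`, the 1-based inclusive coordinates, the choice of the nearest chromosome end as the "respective" telomere, and the `tolerance` threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """Hypothetical ORF record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str        # '+' or '-'
    chrom_length: int  # length of the chromosome carrying the ORF


def telomere_context(orf):
    """Return (distance to nearest chromosome end, transcribed toward it?).

    Assumption: the 'respective telomere' is the nearest chromosome end.
    """
    left_dist = orf.start - 1
    right_dist = orf.chrom_length - orf.end
    if left_dist <= right_dist:
        # '-' strand ORFs are transcribed toward lower coordinates,
        # i.e. toward the left end.
        return left_dist, orf.strand == '-'
    # '+' strand ORFs are transcribed toward the right end.
    return right_dist, orf.strand == '+'


def comparable_pair(a, b, tolerance=0.5):
    """Return (same orientation?, similar distance?) for two paired ORFs.

    `tolerance` is an illustrative threshold: distances count as
    comparable when they differ by at most that fraction of the larger.
    """
    dist_a, toward_a = telomere_context(a)
    dist_b, toward_b = telomere_context(b)
    same_orientation = toward_a == toward_b
    longer = max(dist_a, dist_b)
    similar_distance = longer == 0 or abs(dist_a - dist_b) <= tolerance * longer
    return same_orientation, similar_distance


# Made-up coordinates, not data from the study.
orf_a = Orf("orfA", 10_001, 11_000, '-', 500_000)
orf_b = Orf("orfB", 288_001, 289_000, '+', 300_000)
print(comparable_pair(orf_a, orf_b))
```

In this made-up example the two ORFs lie on opposite strands in raw coordinates, yet both point toward their nearest chromosome end. That is why orientation is expressed relative to the telomere rather than as a plain strand comparison.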