Comparing Genomes with Duplications: A Computational Complexity Point of View

12 November 2007

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE/ACM Transactions on Computational Biology and Bioinformatics

Vol. 4 (4) , 523-534
https://doi.org/10.1109/tcbb.2007.1069

Abstract

In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods [1], [2] have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G₁ and G₂: First, one establishes a one-to-one correspondence between the genes of G₂; and the genes of G₂; second, once this correspondence is established, it explicitly defines a permutation and it is then possible to quantify their similarity using classical measures defined for permutations like the number of breakpoints. Hence, these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy is deleted in both genomes for each gene family, and the matching model, which computes a maximal correspondence for each gene family. We show that, for these two models and for three (dis)similarity measures on permutations, namely, the number of common intervals, the maximum adjacency disruption (MAD) number, and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete and even APX-hard for the MAD number and the SAD number.

Keywords

This publication has 18 references indexed in Scilit:

Assignment of Orthologous Genes via Genome Rearrangement
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005
Genome Rearrangement Distances and Gene Order Phylogeny in γ-Proteobacteria
Molecular Biology and Evolution, 2005
Divide-and-conquer approach for the exemplar breakpoint distance
Bioinformatics, 2005
Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages
Genome Research, 2004
Genomic Distances under Deletions and Insertions
Theoretical Computer Science, 2004
SHOT: a web server for the construction of genome phylogenies
Trends in Genetics, 2002
Gene and genome duplication
Current Opinion in Genetics & Development, 2001
The Complexity of Calculating Exemplar Distances
Published by Springer Nature ,2000
A threshold of ln n for approximating set cover
Journal of the ACM, 1998
Optimization, approximation, and complexity classes
Journal of Computer and System Sciences, 1991