Assignment of Orthologous Genes via Genome Rearrangement
- 21 November 2005
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Vol. 2 (4) , 302-315
- https://doi.org/10.1109/tcbb.2005.48
Abstract
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.Keywords
This publication has 21 references indexed in Scilit:
- Genome Rearrangements in Mammalian Evolution: Lessons From Human and Mouse GenomesGenome Research, 2002
- Reconstructing an ancestral genome using minimum segments duplications and reversalsJournal of Computer and System Sciences, 2002
- Guidelines for Human Gene NomenclatureGenomics, 2002
- Automatic clustering of orthologs and in-paralogs from pairwise species comparisonsJournal of Molecular Biology, 2001
- A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental StudyJournal of Computational Biology, 2001
- Sorting Strings by Reversals and by TranspositionsSIAM Journal on Discrete Mathematics, 2001
- Transforming cabbage into turnipJournal of the ACM, 1999
- A Genomic Perspective on Protein FamiliesScience, 1997
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- Linear-space algorithms that build local alignments from fragmentsAlgorithmica, 1995