OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Open Access
- 2 September 2003
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (9) , 2178-2189
- https://doi.org/10.1101/gr.1224503
Abstract
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
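The clustering step named in the abstract can be illustrated with a minimal sketch of the generic Markov Cluster (MCL) algorithm: build a column-stochastic matrix from a similarity graph, then alternate expansion (matrix squaring) with inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. This is not OrthoMCL's implementation. The input matrix is hypothetical, and the self-loop rule, `inflation=1.5`, pruning threshold, and convergence tolerance are illustrative assumptions. The steps OrthoMCL performs before clustering (detecting putative ortholog and paralog pairs and weighting them) are not shown, and all function names are hypothetical.

```python
from typing import List, Sequence


def normalize_columns(m: List[List[float]]) -> List[List[float]]:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Plain O(n^3) square-matrix product; adequate for toy inputs."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_clusters(weights: Sequence[Sequence[float]], inflation: float = 1.5,
                 max_iter: int = 100, tol: float = 1e-6,
                 prune: float = 1e-5) -> List[List[int]]:
    """Cluster a symmetric similarity matrix with generic MCL (hypothetical helper)."""
    n = len(weights)
    m = [[float(weights[i][j]) for j in range(n)] for i in range(n)]
    # Assumed self-loop rule: strongest edge in the column, or 1.0 if isolated.
    for j in range(n):
        col_max = max((m[i][j] for i in range(n) if i != j), default=0.0)
        m[j][j] = col_max if col_max > 0 else 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)                                           # expansion (power 2)
        m = [[v ** inflation for v in row] for row in m]           # inflation
        m = [[v if v > prune else 0.0 for v in row] for row in m]  # prune tiny entries
        m = normalize_columns(m)
        delta = max(abs(m[i][j] - prev[i][j])
                    for i in range(n) for j in range(n))
        if delta < tol:
            break
    # Rows with a non-zero diagonal are attractors; the non-zero
    # columns of an attractor row form one cluster.
    clusters: List[List[int]] = []
    seen = set()
    for i in range(n):
        if m[i][i] > 0:
            members = frozenset(j for j in range(n) if m[i][j] > 0)
            if members not in seen:
                seen.add(members)
                clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical proteins: 0-2 are mutually similar, 3-4 form a pair, 5 is isolated.
    w = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(mcl_clusters(w))  # [[0, 1, 2], [3, 4], [5]]
```

Disconnected components always remain separate, because expansion and inflation preserve the block structure of the matrix. Within a connected component, the inflation parameter controls granularity: higher values tend to split the graph into more, smaller clusters.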
This publication has 34 references indexed in Scilit, including:
- Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature, 2002
- Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 2002
- The Plasmodium genome database. Nature, 2002
- An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research, 2002
- Cross-Referencing Eukaryotic Genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Research, 2002
- Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology, 2001
- Homology. Trends in Genetics, 2000
- A Genomic Perspective on Protein Families. Science, 1997
- A Plastid of Probable Green Algal Origin in Apicomplexan Parasites. Science, 1997
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994