Classification of Transmembrane Protein Families in the Caenorhabditis elegans Genome and Identification of Human Orthologs
Open Access
- 1 November 2000
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 10 (11), 1679-1689
- https://doi.org/10.1101/gr.gr-1491r
Abstract
The complete genome sequence of the nematode Caenorhabditis elegans provides an excellent basis for studying the distribution and evolution of protein families in higher eukaryotes. Three fundamental questions are as follows: How many paralog clusters exist in one species, how many of these are shared with other species, and how many proteins can be assigned a functional counterpart in other species? We have addressed these questions in a detailed study of predicted membrane proteins in C. elegans and their mammalian homologs. All worm proteins predicted to contain at least two transmembrane segments were clustered on the basis of sequence similarity. This resulted in 189 groups with two or more sequences, containing, in total, 2647 worm proteins. Hidden Markov models (HMMs) were created for each family, and were used to retrieve mammalian homologs from the SWISSPROT, TREMBL, and VTS databases. About one-half of these clusters had mammalian homologs. Putative worm-mammalian orthologs were extracted by use of nine different phylogenetic methods and BLAST. Eight clusters initially thought to be worm-specific were assigned mammalian homologs after searching EST and genomic sequences. A compilation of 174 orthology assignments made with high confidence is presented. [Tables describing transmembrane protein families and orthology assignments are available from ftp.cgr.ki.se/pub/data/worm.]
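The clustering step described in the abstract (keep proteins with at least two predicted transmembrane segments, group them by sequence similarity, and retain groups of two or more) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or similarity threshold was used, so single-linkage clustering over a precomputed list of similar pairs is an assumption made here for illustration. The function name `cluster_membrane_proteins`, its parameters, and the toy protein identifiers are all hypothetical.

```python
from collections import defaultdict


def cluster_membrane_proteins(tm_counts, similar_pairs, min_tm=2, min_size=2):
    """Group proteins into families by single-linkage clustering.

    tm_counts:     dict mapping protein id -> predicted number of TM segments
    similar_pairs: iterable of (id_a, id_b) pairs already judged similar
                   (the similarity criterion itself is outside this sketch)
    """
    # Keep only proteins predicted to have enough transmembrane segments.
    kept = {p for p, n in tm_counts.items() if n >= min_tm}

    # Union-find table: every kept protein starts as its own cluster.
    parent = {p: p for p in kept}

    def find(p):
        # Follow parent links to the cluster root, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge the clusters of every similar pair in which both proteins were kept.
    for a, b in similar_pairs:
        if a in kept and b in kept:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect cluster members under their root.
    groups = defaultdict(set)
    for p in kept:
        groups[find(p)].add(p)

    # Report only groups with at least `min_size` sequences, largest first.
    clusters = [sorted(g) for g in groups.values() if len(g) >= min_size]
    return sorted(clusters, key=lambda c: (-len(c), c))


if __name__ == "__main__":
    # Toy data: hypothetical protein ids and TM-segment counts.
    tm_counts = {"A": 7, "B": 7, "C": 6, "D": 1, "E": 4, "F": 2}
    similar_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "D")]
    for cluster in cluster_membrane_proteins(tm_counts, similar_pairs):
        print(cluster)
```

In the toy data, protein D has only one predicted segment and is filtered out before clustering, so it cannot link E to the A/B/C family. E and F each remain singletons and are dropped by the size cutoff.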