Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions
Open Access
- 13 June 2007
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 17 (6) , 746-759
- https://doi.org/10.1101/gr.5660607
Abstract
The synaptic cell adhesion molecules encoded by the protocadherin gene cluster are hypothesized to provide a molecular code involved in the generation of synaptic complexity in the developing brain. Variation in copy number and sequence content of protocadherin cluster genes among vertebrate species could reflect adaptive differences in protocadherin function. We have completed an analysis of zebrafish protocadherin cluster genes. Zebrafish have two unlinked protocadherin clusters, DrPcdh1 and DrPcdh2. Like mammalian protocadherin clusters, DrPcdh1 has both α and γ variable and constant region exons. A consensus protocadherin promoter motif sequence identified in mammals is also conserved in zebrafish. Few orthologous relationships, however, are apparent between zebrafish and mammalian protocadherin proteins. Here we show that protocadherin cluster genes in human, mouse, rat, and zebrafish are subject to striking gene conversion events. These events are restricted to regions of the coding sequence, particularly the coding sequences of ectodomain 6 and the cytoplasmic domain. Diversity among paralogs is restricted to particular ectodomains that are excluded from conversion events. Conversion events are also strongly correlated with an increase in third-position GC content. We propose that the combination of lineage-specific duplication, restricted gene conversion, and adaptive variation in diversified ectodomains drives vertebrate protocadherin cluster evolution.
This publication has 39 references indexed in Scilit:
- Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 2007
- Structured RNAs in the ENCODE selected regions of the human genome. Genome Research, 2007
- Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Research, 2007
- Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nature Genetics, 2006
- Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expression. Genes & Development, 2006
- Diversification of transcriptional modulation: Large-scale identification and characterization of putative alternative promoters of human genes. Genome Research, 2005
- Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nature Methods, 2005
- Finishing the euchromatic sequence of the human genome. Nature, 2004
- C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nature Genetics, 2003
- Initial sequencing and analysis of the human genome. Nature, 2001