Comparative Genomics Search for Losses of Long-Established Genes on the Human Lineage

Abstract
Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis is able to differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. It therefore enables us to find genes that were inactivated long after their birth and that are likely to have evolved nonredundant biological functions before being inactivated. The method was used to look for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome, each of which occurred at least 50 My after the gene's birth. Many of them were previously characterized pseudogenes in the human genome, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, allowing us to find a few well-known pseudogenes in the human genome missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, an ancient enzyme present in archaea, bacteria, and eukaryotes, but lost approximately 6 to 8 Mya in the ancestor of humans and chimpanzees. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, because any genetic evidence of the selective sweep indicative of such an event has since been erased.
One of the most important goals in biology is to identify the genetic changes underlying evolution, especially those along the lineage leading to modern humans. Counterintuitive as it may seem, losing a gene can bring a selective advantage to the organism. This type of gene loss is called adaptive gene loss. Although a few cases have been characterized in the literature, this is the first study to address adaptive gene losses on the scale of the whole human genome and over a time period of up to 75 million years. The difficulty of identifying adaptive gene losses stems in part from the large number of pseudogenes in the human genome. To circumvent this problem, we used two methods to enrich for adaptive candidates. First, we used a novel approach to pseudogene detection that is highly sensitive in identifying single-copy pseudogenes that bear no apparent sequence homology to any functional human gene. Second, we used the length of time a gene was functional before its loss as a proxy for biological importance, which allows us to differentiate losses of long-established genes from mere losses due to functional redundancy after gene duplication.
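The second criterion can be illustrated with a minimal sketch. It assumes that each candidate loss already carries an estimated time of gene birth and an estimated time of inactivation, both in millions of years ago (Mya). The 50 My threshold and the 75 My lineage window are taken from the text above. The record type, field names, function name, and the two example entries are hypothetical placeholders, not data or code from this study.

```python
from dataclasses import dataclass

MIN_FUNCTIONAL_MY = 50.0  # minimum time functional before loss (from the text)
LINEAGE_WINDOW_MY = 75.0  # approximate age of the primate-rodent ancestor (from the text)


@dataclass(frozen=True)
class CandidateLoss:
    """Hypothetical record for one candidate pseudogene."""
    name: str
    birth_mya: float  # assumed estimate of when the gene arose
    loss_mya: float   # assumed estimate of when it was inactivated

    @property
    def functional_my(self) -> float:
        # Time the gene was functional before being lost.
        return self.birth_mya - self.loss_mya


def is_long_established_loss(c: CandidateLoss) -> bool:
    """Keep losses inside the lineage window that follow a long functional period."""
    return c.loss_mya <= LINEAGE_WINDOW_MY and c.functional_my >= MIN_FUNCTIONAL_MY


# Illustrative placeholder entries only; these are not results from the study.
candidates = [
    CandidateLoss("hypothetical_ancient_gene", birth_mya=500.0, loss_mya=7.0),
    CandidateLoss("hypothetical_recent_duplicate", birth_mya=30.0, loss_mya=25.0),
]

kept = [c.name for c in candidates if is_long_established_loss(c)]
print(kept)
```

In this sketch, a copy inactivated only 5 My after a duplication is filtered out, while a gene that was functional for hundreds of millions of years before its loss is retained. Estimating the birth and loss times themselves, for example from syntenic mapping, is outside the scope of the sketch.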