Evidence of Influence of Genomic DNA Sequence on Human X Chromosome Inactivation

Abstract
A significant number of human X-linked genes escape X chromosome inactivation and are thus expressed from both the active and inactive X chromosomes. The basis for escape from inactivation and the potential role of the X chromosome primary DNA sequence in determining a gene's X inactivation status are unclear. Combining the X chromosome sequence with a comprehensive X inactivation profile of more than 600 genes, we used two independent yet complementary approaches to investigate systematically the relationship between X inactivation and DNA sequence features. First, statistical analyses revealed that a number of repeat features, including long interspersed nuclear elements (LINEs) and mammalian-wide interspersed repeats, are significantly enriched in regions surrounding transcription start sites of genes that are subject to inactivation, while Alu repetitive elements and short motifs containing ACG/CGT are significantly enriched in those that escape inactivation. Second, linear support vector machine classifiers constructed using primary DNA sequence features correctly predicted the X inactivation status for >80% of all X-linked genes. We further identified a small set of features that are important for accurate classification, among which LINE-1 and LINE-2 content show the greatest individual discriminatory power. Finally, as few as 12 features suffice for accurate support vector machine classification. Taken together, these results suggest that features of the underlying primary DNA sequence of the human X chromosome may influence the spreading and/or maintenance of X inactivation.

Female mammals have two X chromosomes while males have one X and one Y chromosome. To equalize dosage of X chromosome genes in males and females, one X in female cells is inactivated, repressing the expression of most genes on the chromosome. Despite the chromosome-wide nature of X inactivation, at least 10%–15% of genes “escape” this inactivation in human females and are still expressed from the inactive X. Whether a gene escapes or is subject to inactivation is thought to be determined epigenetically, and it is unknown to what extent, if at all, the underlying genomic DNA sequence of the chromosome plays a role. In this work, the authors show that the DNA sequence surrounding genes that escape inactivation is significantly different from the sequence surrounding genes that are subject to inactivation. In fact, a small number of DNA sequence features can be used to predict with high accuracy whether a gene will escape or be subject to this silencing. This provides strong evidence that epigenetic regulation is, at least in part, dependent on genomic sequence and organization, and it yields a list of candidate sequence features whose role(s) in X inactivation can now be explored.
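
To make the idea of a sequence feature concrete, the following minimal sketch computes the density of ACG/CGT motifs in a window of sequence and compares two groups of genes with a one-sided permutation test. It is an illustration only: the function names, the per-kilobase normalization, and the choice of a permutation test are assumptions made here, not the statistical procedure used in the study, and any input sequences would have to be supplied by the reader.

```python
import random
from statistics import mean
from typing import Sequence


def motif_density(seq: str, motifs: Sequence[str] = ("ACG", "CGT")) -> float:
    """Return overlapping motif occurrences per kilobase of sequence.

    The per-kilobase scaling is an illustrative choice, not taken from the paper.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = 0
    for motif in motifs:
        k = len(motif)
        # Slide a window of length k and count exact matches (overlaps allowed).
        hits += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return 1000.0 * hits / len(seq)


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Hypothetical stand-in for the paper's (unspecified here) statistical analysis.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Randomly reassign group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

In use, one would compute `motif_density` for a window around each gene's transcription start site, collect the values for escape genes and for inactivated genes, and pass the two lists to `permutation_p_value`. The same per-gene values could also serve as one column of the feature matrix given to a linear classifier.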