THAP proteins target specific DNA sites through bipartite recognition of adjacent major and minor grooves

Abstract
THAP family proteins bind DNA and are involved in diverse DNA processes. The structure of the DNA-bound Drosophila P element transposase THAP domain now shows that the conserved β-sheet docks into the major groove, while a C-terminal basic loop binds the minor groove, a mode that is likely conserved and indicates the basis of bipartite sequence recognition.

THAP-family C2CH zinc-coordinating DNA-binding proteins function in diverse eukaryotic cellular processes, such as transposition, transcriptional repression, stem-cell pluripotency, angiogenesis and neurological function. To determine the molecular basis for sequence-specific DNA recognition by THAP proteins, we solved the crystal structure of the Drosophila melanogaster P element transposase THAP domain (DmTHAP) in complex with a natural 10-base-pair site. In contrast to C2H2 zinc fingers, DmTHAP docks a conserved β-sheet into the major groove and a basic C-terminal loop into the adjacent minor groove. We confirmed specific protein-DNA interactions by mutagenesis and DNA-binding assays. Sequence analysis of natural and in vitro–selected binding sites suggests that several THAPs (DmTHAP and human THAP1 and THAP9) recognize a bipartite TXXGGGX(A/T) consensus motif; homology suggests THAP proteins bind DNA through a bipartite interaction. These findings reveal the conserved mechanisms by which THAP-family proteins engage specific chromosomal target elements.
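As an illustration of the TXXGGGX(A/T) consensus, the following minimal Python sketch scans a DNA sequence for candidate sites on both strands. It assumes that "X" denotes any nucleotide and that "(A/T)" denotes either A or T at the final position; the function names and the example sequences are hypothetical and are not taken from the study. A consensus match only flags a candidate site and does not by itself indicate binding.

```python
import re

# Assumption: "X" in TXXGGGX(A/T) means any of A, C, G or T,
# and "(A/T)" means A or T at the last position.
# The lookahead wrapper lets overlapping candidate sites be reported.
MOTIF = re.compile(r"(?=(T[ACGT]{2}GGG[ACGT][AT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_thap_sites(seq):
    """Return sorted (start, strand, site) tuples for consensus matches.

    start is a 0-based coordinate on the forward strand; for "-" hits it
    is the leftmost forward-strand position covered by the site.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in MOTIF.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    for m in MOTIF.finditer(reverse_complement(seq)):
        site = m.group(1)
        # Map the reverse-strand start back to forward-strand coordinates.
        hits.append((n - (m.start() + len(site)), "-", site))
    return sorted(hits)


# Hypothetical example sequences, made up for illustration only.
print(find_thap_sites("AATAAGGGCTCC"))  # [(2, '+', 'TAAGGGCT')]
print(find_thap_sites("AGCCCTTA"))      # [(0, '-', 'TAAGGGCT')]
```

Scanning both strands reflects that a site can lie in either orientation relative to the sequence being searched; a position weight matrix built from aligned sites would be a natural refinement over this strict pattern match.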