Comparative genomics of the restriction-modification systems in Helicobacter pylori
- 13 February 2001
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 98 (5), 2740-2745
- https://doi.org/10.1073/pnas.051612298
Abstract
Graph-based representations are considered to be the future for reference genomes, as they allow integrated representation of the steadily increasing data on individual variation. Currently available tools allow de novo assembly of graph-based reference genomes, alignment of new read sets to the graph representation, and certain analyses such as variant calling and haplotyping. Here we present a first method for calling ChIP-seq peaks on read data aligned to a graph-based reference genome. The method is a graph generalization of the peak caller MACS2 and is implemented in an open-source tool, Graph Peak Caller. Using the existing tool vg to build a pan-genome of Arabidopsis thaliana, we validate our approach by showing that Graph Peak Caller with a pan-genome reference graph can trace variants within peaks that are not part of the linear reference genome, and that it finds peaks that are in general more motif-enriched than those found by MACS2.
Author summary
The expression of genes is a tightly regulated process. A key regulatory mechanism is the modulation of transcription by a class of proteins called transcription factors, which bind to DNA in the spatial proximity of regulated genes. Determining the binding locations of transcription factors for specific cell types and settings is thus a key step in understanding the dynamics of normal cells as well as disease states. Binding sites for a given transcription factor are typically obtained through an experimental technique called ChIP-seq, in which DNA binding locations are obtained by sequencing DNA fragments attached to the transcription factor and aligning these sequences to a reference genome. A computational technique known as peak calling is then used to separate signal from noise and predict where the protein binds. Current peak callers are based on linear reference genomes that do not contain known genetic variants from the population. They thus potentially miss cases where proteins bind to such alternative genome sequences. Recently, a new type of reference genome based on graph representations has become popular, as it can also incorporate alternative genome sequences. Here we present Graph Peak Caller, the first peak caller able to exploit such graph representations for the detection of transcription factor binding locations. Using a graph-based reference genome for Arabidopsis thaliana, we show that our peak caller can lead to better detection of transcription factor binding locations than a similar existing peak caller that uses a linear reference genome representation.