Annotating non-coding regions of the genome
- 13 July 2010
- journal article
- research article
- Published by Springer Nature in Nature Reviews Genetics
- Vol. 11 (8) , 559-571
- https://doi.org/10.1038/nrg2814
Abstract
Most of the human genome consists of DNA that does not code for proteins. Annotating functional regions in the non-coding genome involves two complementary analysis techniques: comparative analysis, which involves examining DNA sequences, and functional analysis, which involves examining the output of functional genomics experiments. With the exponential increase in DNA sequence data, it is now possible to compare sequences within a single human haplotype, between cell types in a single person, across the human population and between species. Integrating the analysis across all these scales is useful. There are two main methods of sequence comparison: scanning for regions of high sequence similarity above some operational threshold, and building statistical models of sequence families. Model-based sequence analysis can incorporate more biological knowledge than sequence similarity scans and provide more refined results. The output of most high-throughput functional genomics experiments can be treated as a continuous signal mapped onto the genome and analysed with a standardized signal processing approach. Signal processing involves smoothing the raw signal, then thresholding and segmenting the signal into discrete annotated blocks. Integration of multiple types of signals generates a progression of more and more complex annotations; these smaller annotations are clustered into groups and then into functional networks that begin to represent the state of biological knowledge about the genome. A chronic problem with annotation based on functional genomics data is the lack of sufficient validation by more low-throughput methods. Techniques such as paired-end sequencing and chromosome conformation capture (and its descendants) enable annotation of connectivity between elements and necessitate a move beyond the one-dimensional signal approach to annotation.
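The smooth, threshold and segment steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the moving-average smoother, the window size, the threshold value and the function names `smooth`, `segment` and `annotate` are all assumptions chosen for illustration, and the input is a made-up per-position signal rather than real experimental data.

```python
# Illustrative sketch only: a moving average stands in for whatever
# smoother a real analysis would use; all names and values are assumed.

def smooth(signal, window=3):
    """Centered moving average; positions near the edges use a shorter window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def segment(signal, threshold):
    """Return half-open (start, end) blocks where signal >= threshold."""
    blocks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                      # a block opens here
        elif value < threshold and start is not None:
            blocks.append((start, i))      # the block closes before i
            start = None
    if start is not None:                  # block runs to the end of the signal
        blocks.append((start, len(signal)))
    return blocks


def annotate(raw, window=3, threshold=2.0):
    """Smooth the raw signal, then threshold and segment it into blocks."""
    return segment(smooth(raw, window), threshold)


# Hypothetical per-position signal (for example, read depth along a region).
raw = [0, 0, 5, 6, 5, 0, 0, 0, 4, 4, 4, 0, 0]
print(annotate(raw))  # [(2, 5), (8, 11)]
```

Each returned pair is a half-open interval of positions, which is the simplest form of the "discrete annotated blocks" the abstract refers to. Integrating several signal types would then amount to comparing or intersecting such block lists.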