Statistical methods for DNA sequence segmentation

Open Access

1 May 1998

journal article
Published by Institute of Mathematical Statistics in Statistical Science

Vol. 13 (2) , 142-162
https://doi.org/10.1214/ss/1028905933

Abstract

This article examines methods, issues and controversies that have arisen over the last decade in the effort to organize sequences of DNA base information into homogeneous segments. An array of different models and techniques have been considered and applied. We demonstrate that most approaches can be embedded into a suitable version of the multiple change-point problem, and we review the various methods in this light. We also propose and discuss a promising local segmentation method, namely, the application of split local polynomial fitting. The genome of bacteriophage $\lambda$ serves as an example sequence throughout the paper.
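As a rough illustration of the change-point view mentioned in the abstract, the following minimal sketch locates a single change point in base composition by maximum likelihood. It is not the article's method: the recoding of bases into a strong (G/C) versus weak (A/T) indicator, the Bernoulli segment model, the function names `bernoulli_loglik` and `best_single_changepoint`, and the toy sequence are all assumptions made here for illustration.

```python
import math


def bernoulli_loglik(ones, n):
    """Maximised Bernoulli log-likelihood for a segment with `ones` successes in n trials."""
    if n == 0 or ones == 0 or ones == n:
        return 0.0  # degenerate segment: estimated p is 0 or 1, so the log-likelihood is 0
    p = ones / n
    return ones * math.log(p) + (n - ones) * math.log(1 - p)


def best_single_changepoint(seq, strong="GC"):
    """Return (k, gain) for the split seq[:k] | seq[k:] with the largest
    log-likelihood gain over the no-split model; k is None if no split helps."""
    # Assumed recoding: 1 for a strong base (G or C), 0 otherwise.
    x = [1 if base in strong else 0 for base in seq]
    n = len(x)
    total = sum(x)
    base_ll = bernoulli_loglik(total, n)

    best_k, best_gain = None, 0.0
    left = 0  # number of strong bases in seq[:k]
    for k in range(1, n):
        left += x[k - 1]
        gain = (
            bernoulli_loglik(left, k)
            + bernoulli_loglik(total - left, n - k)
            - base_ll
        )
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


# Toy sequence (made up): an AT-rich block followed by a GC-rich block.
toy = "ATATTAATAT" + "GCGGCCGCGC"
k, gain = best_single_changepoint(toy)
print(k, round(gain, 3))  # prints: 10 13.863
```

A multiple change-point version could, for instance, apply the same search recursively to each resulting segment with a stopping rule on the gain; that extension is likewise only a sketch of the general idea, not a procedure taken from the article.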
