Modeling recurrent DNA copy number alterations in array CGH data
Open Access
- 1 July 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (13) , i450-i458
- https://doi.org/10.1093/bioinformatics/btm221
Abstract
Motivation: Recurrent DNA copy number alterations (CNAs) measured with array comparative genomic hybridization (aCGH) reveal important molecular features of human genetics and disease. Studying aCGH profiles from a phenotypic group of individuals can identify recurrent CNA patterns that suggest a strong correlation with the phenotype. Computational approaches to detecting recurrent CNAs in a set of aCGH experiments have typically relied on first discretizing the noisy log ratios and then inferring patterns. We demonstrate that this can filter out important signals present in the raw data. In this article we develop statistical models that jointly infer CNA patterns and the discrete labels by borrowing statistical strength across samples.

Results: We propose extending single-sample aCGH hidden Markov models (HMMs) to the multiple-sample case in order to infer shared CNAs. We model recurrent CNAs as a profile encoded by a master sequence of states that generates the samples. We show how to improve on two basic models by performing joint inference of the discrete labels and by providing sparsity in the output. On synthetic ground-truth data and on real data from lung cancer cell lines, we demonstrate how these two features of our model improve results over baseline models. We report standard quantitative metrics and a qualitative assessment on which to base our conclusions.

Availability: http://www.cs.ubc.ca/~sshah/acgh

Contact: sshah@cs.ubc.ca
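The key idea of decoding one shared ("master") state sequence from several samples is that weak but recurrent signal can accumulate across samples. The following is a deliberately simplified sketch of that idea, not the authors' model. It assumes three copy-number states (loss, neutral, gain), Gaussian emissions on log ratios with hypothetical means and standard deviation, a hypothetical "sticky" transition matrix, and, most importantly, that every sample follows the master path exactly. The paper's models let individual samples deviate from the master sequence and add sparsity, both of which are omitted here. The function name `viterbi_master` and all parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical states and parameters (assumptions for illustration only).
STATES = ["loss", "neutral", "gain"]
MEANS = np.array([-0.5, 0.0, 0.5])  # assumed mean log2 ratio per state
SD = 0.2                             # assumed shared emission standard deviation
STAY = 0.98                          # assumed self-transition probability


def log_gauss(x, mu, sd):
    """Elementwise Gaussian log density."""
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu) ** 2 / (2 * sd**2)


def viterbi_master(log_ratios):
    """Decode a single shared state path from an (n_samples, n_probes) array.

    Simplification: all samples are assumed to share the master path exactly,
    so per-probe emission log-likelihoods are summed across samples.
    """
    x = np.atleast_2d(np.asarray(log_ratios, dtype=float))
    n_probes = x.shape[1]
    k = len(MEANS)

    # emit[t, j]: log-likelihood of all samples at probe t under state j
    emit = log_gauss(x[:, :, None], MEANS[None, None, :], SD).sum(axis=0)

    # Sticky transitions: stay with prob STAY, switch uniformly otherwise
    trans = np.full((k, k), (1 - STAY) / (k - 1))
    np.fill_diagonal(trans, STAY)
    log_trans = np.log(trans)

    # Forward pass: best score ending in each state, plus back-pointers
    score = np.log(np.full(k, 1.0 / k)) + emit[0]
    back = np.zeros((n_probes, k), dtype=int)
    for t in range(1, n_probes):
        cand = score[:, None] + log_trans  # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emit[t]

    # Backtrace the most probable master path
    path = np.empty(n_probes, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_probes - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return [STATES[s] for s in path]


# Usage: three noisy samples that share a gain at probes 10-19.
rng = np.random.default_rng(0)
truth = np.zeros(30)
truth[10:20] = 0.5
samples = truth + rng.normal(0.0, 0.3, size=(3, 30))
print(viterbi_master(samples))
```

Summing log-likelihoods across samples is the simplest way to "borrow statistical strength": a gain too faint to call in any single profile can still dominate the pooled score. By contrast, a discretize-then-combine baseline makes a hard call per sample before pooling, which is the kind of signal loss the abstract describes.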