CNAseg—a novel framework for identification of copy number changes in cancer from second-generation sequencing data
Open Access
- 21 October 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (24) , 3051-3058
- https://doi.org/10.1093/bioinformatics/btq587
Abstract
Motivation: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue.
Results: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.
Availability: The CNAseg package and test data are available at http://www.compbio.group.cam.ac.uk/software.html.
Contact: Sergii.Ivakhno@cancer.org.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
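The depth-of-coverage idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the CNAseg implementation: it only bins read start positions into fixed windows and compares tumour and normal counts as log2 ratios, and it does not model flowcell-to-flowcell variability, GC bias, or segmentation. The function names, the window size, and the pseudocount are hypothetical choices made for illustration.

```python
import math


def count_reads_per_window(read_starts, chrom_length, window_size):
    """Count reads whose start position falls in each fixed-size window."""
    n_windows = (chrom_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts


def log2_ratios(tumour_counts, normal_counts, pseudocount=1.0):
    """Per-window log2(tumour / normal), rescaled by total read count.

    The pseudocount (an assumption of this sketch) avoids division by
    zero and log of zero in windows with no reads.
    """
    if len(tumour_counts) != len(normal_counts):
        raise ValueError("count vectors must have the same length")
    tumour_total = sum(tumour_counts)
    normal_total = sum(normal_counts)
    if tumour_total == 0 or normal_total == 0:
        raise ValueError("both samples need at least one read")
    # Rescale so that samples sequenced to different depths are comparable.
    scale = normal_total / tumour_total
    return [
        math.log2(((t + pseudocount) / (n + pseudocount)) * scale)
        for t, n in zip(tumour_counts, normal_counts)
    ]


# Toy data: the third window has twice as many tumour reads as the others.
normal = [10, 10, 10, 10]
tumour = [10, 10, 20, 10]
ratios = log2_ratios(tumour, normal)
gained_window = max(range(len(ratios)), key=lambda i: ratios[i])
```

In this toy example the window with doubled tumour coverage gets the highest log2 ratio. A real caller would additionally need to separate such signals from sampling noise, which is the false positive problem the abstract describes.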