PCAP: A Whole-Genome Assembly Program
Open Access
- 2 September 2003
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (9), 2164-2170
- https://doi.org/10.1101/gr.1390403
Abstract
We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features that address efficiency and accuracy issues in assembly. Multiple processors are used to perform the most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of the reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.
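To make the last step concrete, here is a minimal sketch of quality-weighted consensus calling over a read alignment. It is not PCAP's actual rule. It assumes a simple hypothetical scheme in which each read votes for its base with its Phred quality value, and the consensus quality grows with agreeing coverage and shrinks with disagreeing reads. The names `call_column`, `consensus`, and `MAX_QUAL`, the cap of 90, and the use of `' '` (read does not cover the column) and `'-'` (gap) are all illustrative assumptions.

```python
from collections import defaultdict

MAX_QUAL = 90  # hypothetical cap on the consensus quality value


def call_column(column):
    """Call one consensus base from (base, quality) pairs in one alignment column.

    Hypothetical scheme, not PCAP's: each read votes with its Phred quality.
    """
    if not column:
        return "N", 0  # no read covers this column
    totals = defaultdict(int)
    for base, qual in column:
        totals[base] += qual  # sum quality values per base
    best = max(sorted(totals), key=lambda b: totals[b])  # ties -> first in sort order
    against = sum(q for b, q in totals.items() if b != best)
    # More agreeing coverage raises the score; disagreeing reads lower it.
    return best, max(0, min(MAX_QUAL, totals[best] - against))


def consensus(rows):
    """rows: (aligned_read, qualities) pairs; ' ' = not covered, '-' = gap."""
    length = max(len(seq) for seq, _ in rows)
    bases, quals = [], []
    for i in range(length):
        column = [(seq[i], q[i]) for seq, q in rows
                  if i < len(seq) and seq[i] != " "]
        base, qual = call_column(column)
        if base != "-":  # a column won by a gap is dropped from the consensus
            bases.append(base)
            quals.append(qual)
    return "".join(bases), quals


rows = [
    ("ACGTA", [30, 30, 30, 30, 30]),
    ("ACCTA", [30, 30, 10, 30, 30]),   # low-quality disagreement at column 2
    ("  GTAC", [0, 0, 30, 30, 30, 20]),  # starts two columns in
]
print(consensus(rows))  # ('ACGTAC', [60, 60, 50, 90, 90, 20])
```

Summing quality values, rather than counting reads, lets a few high-quality reads outvote a low-quality disagreement. Subtracting the opposing support is one simple way to fold coverage into the confidence of each consensus base.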