PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data
Open Access
- 25 August 2006
- journal article
- software
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (1) , 390
- https://doi.org/10.1186/1471-2105-7-390
Abstract
Background: We recently developed the Paired-End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads and how to map PETs to reference genome sequences correctly yet efficiently. To accommodate and streamline analysis of the large volume of PET sequences generated by each PET experiment, an automated PET data-processing pipeline is desirable.

Results: We designed an integrated software package, PET-Tool, to automatically process PET sequences and map them to genome sequences. The tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the ProjectManager module for data organization. The performance of PET-Tool was evaluated through the analysis of 2.7 million PET sequences. PET-Tool proved accurate and efficient in extracting PET sequences and removing artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. On a 2.4 GHz Linux machine, it takes approximately six hours to process one million PETs from extraction to mapping.

Conclusion: Its speed, accuracy, and comprehensiveness demonstrate that PET-Tool is an important and useful component of PET experiments, and it can be extended to accommodate other related analyses of paired-end sequences. The tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.
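To make the extraction step more concrete, the following is a minimal sketch of how paired tags might be pulled out of a raw read that contains several ditags joined by a spacer. It is not PET-Tool's actual algorithm: the spacer sequence, the accepted length range, the even split into two tags, and the function name `extract_pets` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical values for illustration only; the abstract does not state them.
SPACER = "GATC"   # assumed separator between concatenated ditags
MIN_LEN = 34      # assumed shortest acceptable ditag length
MAX_LEN = 40      # assumed longest acceptable ditag length


def extract_pets(read: str, spacer: str = SPACER) -> List[Tuple[str, str]]:
    """Split a raw read on the spacer and return (5' tag, 3' tag) pairs."""
    pets = []
    for fragment in read.upper().split(spacer):
        # Discard fragments whose length falls outside the accepted range.
        if not (MIN_LEN <= len(fragment) <= MAX_LEN):
            continue
        # Discard fragments containing ambiguous base calls such as N.
        if set(fragment) - set("ACGT"):
            continue
        # Assumption: the two tags are the two halves of the ditag.
        half = len(fragment) // 2
        pets.append((fragment[:half], fragment[half:]))
    return pets


if __name__ == "__main__":
    ditag = "AAAAACCCCCGGGGGTTT" + "TTTTTGGGGGCCCCCAAA"
    raw_read = ditag + SPACER + "ACGT" + SPACER + ditag
    for five_prime, three_prime in extract_pets(raw_read):
        print(five_prime, three_prime)
```

A naive split like this would also cut any ditag that happens to contain the spacer internally, so a real extractor needs stricter rules for locating ditag boundaries. Each tag of a returned pair would then be passed on to a mapping step.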