HTSeq – A Python framework to work with high-throughput sequencing data
Preprint
- 20 February 2014
- Published by Cold Spring Harbor Laboratory in bioRxiv
- p. 002824
- https://doi.org/10.1101/002824
Abstract
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard work flows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information, variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
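The idea of "counting the overlap of reads with genes" can be illustrated with a small, library-free sketch. It uses only the Python standard library and does not call HTSeq. The gene table, the function names `overlapping_genes` and `count_reads`, and the labels `no_feature` and `ambiguous` are assumptions made for this illustration. The rule that a read overlapping more than one gene is set aside as ambiguous is a simplification, not a description of how htseq-count itself resolves overlaps.

```python
from collections import Counter

# Hypothetical gene models: name -> (chromosome, start, end),
# using 0-based, half-open coordinates [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of gene names whose interval overlaps [start, end)."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene.

    Reads overlapping no gene are tallied under "no_feature"; reads
    overlapping more than one gene are tallied under "ambiguous".
    """
    counts = Counter()
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if not hits:
            counts["no_feature"] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
        else:
            counts[next(iter(hits))] += 1
    return counts


# Toy aligned reads as (chromosome, start, end) tuples.
reads = [
    ("chr1", 110, 160),  # overlaps geneA only
    ("chr1", 170, 210),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # overlaps geneB only
    ("chr2", 10, 40),    # overlaps no gene
    ("chr2", 140, 190),  # overlaps geneC only
]
counts = count_reads(reads)
for label, n in sorted(counts.items()):
    print(label, n)
```

The linear scan over all genes keeps the sketch short. A data structure that can be queried by genomic coordinates, as the abstract describes, would avoid checking every gene for every read.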
All Related Versions
- Published version: Bioinformatics, 31 (2), 166.
This publication has 12 references indexed in Scilit:
- RNA-seq gene profiling - a systematic empirical comparison. bioRxiv, 2014
- Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. bioRxiv, 2014
- featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 2013
- Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics, 2011
- The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 2011
- Cython: The Best of Both Worlds. Computing in Science & Engineering, 2010
- BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010
- edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2009
- Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 2009