HTSeq—a Python framework to work with high-throughput sequencing data
Open Access
- 25 September 2014
- journal article
- conference paper
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 31 (2) , 166-169
- https://doi.org/10.1093/bioinformatics/btu638
Abstract
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls. It also provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
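To make the idea of "counting the overlap of reads with genes" concrete, here is a minimal sketch in plain Python. It does not use HTSeq or reproduce htseq-count's actual implementation. The `Interval` class, the `count_reads` function, the `__ambiguous` and `__no_feature` labels, and the toy coordinates are all assumptions made for illustration. A read is counted for a gene only when it overlaps exactly one gene.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open intervals overlap if each starts before the other ends.
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def count_reads(genes: dict[str, Interval], reads: list[Interval]) -> Counter:
    """Assign each read to the single gene it overlaps, if there is exactly one.

    Illustrative only: a linear scan over all genes per read, which a real
    tool would replace with an index keyed by genomic coordinates.
    """
    counts = Counter()
    for read in reads:
        hits = [name for name, iv in genes.items() if iv.overlaps(read)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1   # read overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # read overlaps no gene
    return counts


# Toy data (made up for this example): two genes that overlap at 180-200.
genes = {
    "geneA": Interval("chr1", 100, 200),
    "geneB": Interval("chr1", 180, 300),
}
reads = [
    Interval("chr1", 110, 150),  # inside geneA only
    Interval("chr1", 190, 195),  # inside the geneA/geneB overlap
    Interval("chr1", 250, 290),  # inside geneB only
    Interval("chr2", 110, 150),  # different chromosome, no gene
]
counts = count_reads(genes, reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan keeps the sketch short. The coordinate-queryable data structures mentioned in the abstract exist to avoid that per-read scan on genome-scale data.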