fjoin: Simple and Efficient Computation of Feature Overlaps
- 1 October 2006
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 13 (8), 1457-1464
- https://doi.org/10.1089/cmb.2006.13.1457
Abstract
Sets of biological features with genome coordinates (e.g., genes and promoters) are a particularly common form of data in bioinformatics today. Accordingly, an increasingly important processing step involves comparing coordinates from large sets of features to find overlapping feature pairs. This paper presents fjoin, an efficient, robust, and simple algorithm for finding these pairs, and a downloadable implementation. For typical bioinformatics feature sets, fjoin requires O(n log(n)) time (O(n) if the inputs are sorted) and uses O(1) space. The reference implementation is a stand-alone Python program; it implements the basic algorithm and a number of useful extensions, which are also discussed in this paper.
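To make the complexity claims concrete, the following is a minimal sketch of a sort-then-sweep overlap join in Python. It is not the fjoin reference implementation and does not reproduce its extensions. The function name `overlap_pairs`, the `(start, end, name)` tuple layout, the use of closed intervals, and the restriction to a single chromosome are all assumptions made for illustration. Sorting accounts for the O(n log(n)) term. The sweep itself is linear in the input plus the number of reported pairs, provided the two windows stay small, which is presumably what the abstract's "typical" qualification refers to.

```python
from typing import Iterator, List, Tuple

# Hypothetical feature layout: (start, end, name), closed interval,
# all features assumed to lie on the same chromosome.
Feature = Tuple[int, int, str]


def overlap_pairs(xs: List[Feature], ys: List[Feature]) -> Iterator[Tuple[Feature, Feature]]:
    """Yield every (x, y) pair whose coordinate ranges overlap."""
    xs = sorted(xs)  # sort by start; skip this step if inputs are pre-sorted
    ys = sorted(ys)
    wx: List[Feature] = []  # x features that may still overlap a later y
    wy: List[Feature] = []  # y features that may still overlap a later x
    i = j = 0
    while i < len(xs) or j < len(ys):
        # Consume whichever input has the feature with the smaller start.
        take_x = j >= len(ys) or (i < len(xs) and xs[i][0] <= ys[j][0])
        if take_x:
            f = xs[i]
            i += 1
            # Everything in wy starts at or before f, so it overlaps f
            # exactly when it ends at or after f's start. Anything that
            # ends earlier can never overlap a later feature: drop it.
            wy[:] = [g for g in wy if g[1] >= f[0]]
            for g in wy:
                yield (f, g)
            wx.append(f)
        else:
            f = ys[j]
            j += 1
            wx[:] = [g for g in wx if g[1] >= f[0]]
            for g in wx:
                yield (g, f)
            wy.append(f)


if __name__ == "__main__":
    genes = [(1, 10, "geneA"), (20, 30, "geneB")]
    promoters = [(5, 25, "promC"), (40, 50, "promD")]
    for x, y in overlap_pairs(genes, promoters):
        print(x[2], "overlaps", y[2])
```

Each pair is reported once, at the moment the later-starting member of the pair is consumed. Extra memory is limited to the two windows, which hold only features that can still overlap something further along the sweep.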