A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly
- 13 June 2007
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 17 (6) , 960-964
- https://doi.org/10.1101/gr.5578007
Abstract
The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2ENCODE, that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2ENCODE allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2ENCODE to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2ENCODE with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2ENCODE and the screencasts can be accessed at http://g2.bx.psu.edu.
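As an illustration of the kind of "operations on genomic intervals" the abstract mentions, the following is a minimal sketch of one such operation: keeping transcript intervals that overlap no annotated interval. It is not Galaxy2ENCODE's implementation. The function names, the tuple layout, the 0-based half-open coordinate convention, and the example coordinates are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based, half-open.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def unannotated(transcripts: List[Interval], annotations: List[Interval]) -> List[Interval]:
    """Keep transcripts that overlap none of the annotated intervals.

    This is a simple quadratic scan, adequate for small inputs only.
    """
    return [t for t in transcripts if not any(overlaps(t, g) for g in annotations)]


# Made-up coordinates, for illustration only.
transcripts = [("chr7", 100, 200), ("chr7", 500, 650), ("chr21", 100, 200)]
annotations = [("chr7", 150, 300), ("chr21", 400, 900)]

novel = unannotated(transcripts, annotations)
print(novel)
```

For realistically sized data sets, sorting both lists or using an interval tree would avoid comparing every transcript against every annotation.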