Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release
Open Access
- 22 March 2005
- journal article
- research article
- Published by Springer Nature in BMC Biology
- Vol. 3 (1), 7
- https://doi.org/10.1186/1741-7007-3-7
Abstract
Background: Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.
Results: Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).
Conclusion: Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined, yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.