PlantTribes: a gene and gene family resource for comparative genomics in plants
Open Access
- 10 December 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 36 (Database), D970-D976
- https://doi.org/10.1093/nar/gkm972
Abstract
The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study.
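The MCL step mentioned in the abstract can be illustrated with a minimal sketch. This is not the PlantTribes pipeline or the reference `mcl` program; it is a toy NumPy version of the expansion and inflation loop, assuming a small dense similarity matrix as input. The function name `mcl_sketch`, its parameters, and the idea that a higher `inflation` value corresponds to a higher clustering stringency are assumptions made for illustration.

```python
import numpy as np


def mcl_sketch(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Toy Markov clustering on a dense, symmetric similarity matrix.

    Hypothetical helper for illustration only; returns clusters as
    sorted lists of node indices.
    """
    m = np.asarray(similarity, dtype=float)
    # Add self-loops so every column has a non-zero sum.
    m = m + np.eye(len(m))
    # Normalise columns so the matrix is column-stochastic.
    m = m / m.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        previous = m
        m = m @ m                    # expansion: two-step random walks
        m = np.power(m, inflation)   # inflation: strengthen strong flows
        m = m / m.sum(axis=0, keepdims=True)
        if np.allclose(m, previous, atol=tol):
            break

    # Rows that retain flow act as attractors; the columns with non-zero
    # entries in such a row form one cluster.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > tol).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

For example, a six-node graph made of two disconnected triangles (nodes 0 to 2 and nodes 3 to 5, all edge weights 1) is split by `mcl_sketch` into the two groups `[0, 1, 2]` and `[3, 4, 5]`. In a real setting the similarity matrix would be sparse and derived from pairwise protein comparisons, which this sketch does not attempt.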
This publication has 30 references indexed in Scilit:
- The Plant Structure Ontology, a Unified Vocabulary of Anatomy and Morphology of a Flowering Plant. Plant Physiology, 2006
- The TIGR Plant Transcript Assemblies database. Nucleic Acids Research, 2006
- Whole-Plant Growth Stage Ontology for Angiosperms and Its Application in Plant Biology. Plant Physiology, 2006
- Widespread genome duplications throughout the history of flowering plants. Genome Research, 2006
- Phylogeny and Domain Evolution in the APETALA2-like Gene Family. Molecular Biology and Evolution, 2005
- MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004
- NASCArrays: a repository for microarray data generated by NASC's transcriptomics service. Nucleic Acids Research, 2004
- An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research, 2002
- Basic Local Alignment Search Tool. Journal of Molecular Biology, 1990