Genome Cluster Database. A Sequence Family Analysis Platform for Arabidopsis and Rice
Open Access
- 1 May 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Plant Physiology
- Vol. 138 (1) , 47-54
- https://doi.org/10.1104/pp.104.059048
Abstract
The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa ssp. japonica) were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each method's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD), which offers interactive, user-friendly sequence family mining tools that help the plant science community analyze any given family of interest. An automated clustering pipeline keeps the information current as the annotations of the two genomes are updated and the clustering is improved. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.
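The family-naming step described in the abstract (taking the most common molecular function gene ontology description within each cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `name_cluster`, the input layout, the placeholder protein identifiers, the once-per-protein counting, the alphabetical tie-break, and the "unknown function" fallback are all assumptions made for illustration.

```python
from collections import Counter


def name_cluster(go_terms_by_protein):
    """Pick a functional name for one cluster (hypothetical helper).

    go_terms_by_protein maps a protein ID to the list of molecular-function
    GO descriptions annotated to that protein (assumed input layout).
    """
    counts = Counter()
    for terms in go_terms_by_protein.values():
        # Assumption: count each description once per protein, so a protein
        # with duplicated annotations does not dominate the vote.
        counts.update(set(terms))
    if not counts:
        # Assumption: clusters without any GO annotation get a fallback name.
        return "unknown function"
    best = max(counts.values())
    # Assumption: break ties alphabetically to keep the result deterministic.
    return min(term for term, n in counts.items() if n == best)


# Placeholder data, not taken from the database.
cluster = {
    "protein_A": ["kinase activity", "ATP binding"],
    "protein_B": ["kinase activity"],
    "protein_C": ["kinase activity", "ATP binding"],
    "protein_D": [],
}
print(name_cluster(cluster))
```

With this placeholder cluster, "kinase activity" is annotated to three proteins and "ATP binding" to two, so the cluster is named after the former. A production pipeline would likely also need to decide how to treat parent and child nodes in the ontology, which this sketch ignores.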