Mendel-GFDb and Mendel-ESTS: databases of plant gene families and ESTs annotated with gene family numbers and gene family names

1 January 2001

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 29 (1), 120-122
https://doi.org/10.1093/nar/29.1.120

Abstract

There is no control over the information provided with sequences when they are deposited in the sequence databases. Consequently mistakes can seed the incorrect annotation of other sequences. Grouping genes into families and applying controlled annotation overcomes the problems of incorrect annotation associated with individual sequences. Two databases (http://www.mendel.ac.uk) were created to apply controlled annotation to plant genes and plant ESTs: Mendel-GFDb is a database of plant protein (gene) families based on gapped-BLAST analysis of all sequences in the SWISS-PROT family of databases. Sequences are aligned (ClustalW) and identical and similar residues shaded. The families are visually curated to ensure that one or more criteria, for example overall relatedness and/or domain similarity, relate all sequences within a family. Sequence families are assigned a 'Gene Family Number' and a unified description is developed which best describes the family and its members. If authority exists the gene family is assigned a 'Gene Family Name'. This information is placed in Mendel-GFDb. Mendel-ESTS is primarily a database of plant ESTs, which have been compared to Mendel-GFDb, completely sequenced genomes and domain databases. This approach associated ESTs with individual sequences and the controlled annotation of gene families and protein domains; the information being placed in Mendel-ESTS. The controlled annotation applied to genes and ESTs provides a basis from which a plant transcription database can be developed.
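
The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the clustering rule or any significance threshold, and the families are additionally curated by eye, so everything below is an assumption made for illustration: the function name `group_into_families`, the single-linkage rule, the `max_evalue` cutoff, and the toy hit list are all hypothetical and are not taken from Mendel-GFDb.

```python
from collections import defaultdict


def group_into_families(seq_ids, hits, max_evalue=1e-10):
    """Group sequences linked by significant pairwise hits (single linkage).

    seq_ids: list of sequence identifiers.
    hits: iterable of (query, subject, evalue) tuples, e.g. parsed from
          pairwise similarity search output.
    max_evalue: hypothetical cutoff; the source gives no threshold.
    Returns a dict mapping an illustrative family number to its members.
    """
    # Union-find structure: every sequence starts as its own family.
    parent = {s: s for s in seq_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the families of any two sequences joined by a significant hit.
    for query, subject, evalue in hits:
        if evalue <= max_evalue and query in parent and subject in parent:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q

    # Collect members per family root, in input order.
    members = defaultdict(list)
    for s in seq_ids:
        members[find(s)].append(s)

    # Assign consecutive numbers in order of first appearance.
    return {
        number: sorted(group)
        for number, group in enumerate(members.values(), start=1)
    }


# Toy data: P1-P2-P3 are linked by strong hits; the P4-P5 hit is too weak.
seq_ids = ["P1", "P2", "P3", "P4", "P5"]
hits = [("P1", "P2", 1e-50), ("P2", "P3", 1e-30), ("P4", "P5", 1e-3)]
families = group_into_families(seq_ids, hits)
print(families)
```

Single linkage is used here only because it is the simplest rule to show. It can chain unrelated sequences together through a shared domain, which is one reason a manual curation step such as the one the abstract describes would still be needed.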
