More than 1,000 putative new human signalling proteins revealed by EST data mining
- 1 June 2000
- journal article
- research article
- Published by Springer Nature in Nature Genetics
- Vol. 25 (2), 201-204
- https://doi.org/10.1038/76069
Abstract
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes [1], but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based [2] searches with a domain identification protocol [3,4], that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
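The abstract describes the method only at a high level: a similarity search proposes candidate homologues, and a domain identification step filters them. The following is a minimal sketch of that two-stage idea, not the authors' implementation. The record layout (`EstHit`), the function names, and the `max_evalue` cutoff are all hypothetical; the abstract gives no thresholds or data formats, and the search and domain-detection results are assumed to have been computed beforehand by external tools.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EstHit:
    """Hypothetical record for one EST returned by a similarity search."""
    est_id: str
    family: str          # domain family of the query that produced the hit
    evalue: float        # E-value reported by the similarity search
    domains: frozenset   # domain families detected in the EST's translation


def filter_candidates(hits, max_evalue=1.0):
    """Keep hits that pass a permissive similarity cutoff AND whose
    translation contains the domain family that was searched for.

    The cutoff value is an illustrative assumption, not a published setting.
    """
    kept = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # stage 1: too dissimilar to be a candidate
        if hit.family not in hit.domains:
            continue  # stage 2: domain identification did not confirm it
        kept.append(hit)
    return kept


def count_by_family(hits):
    """Tally confirmed candidates per domain family."""
    return dict(Counter(hit.family for hit in hits))


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = [
        EstHit("est_a", "RAS", 1e-5, frozenset({"RAS"})),
        EstHit("est_b", "RAS", 0.5, frozenset()),
        EstHit("est_c", "SH2", 50.0, frozenset({"SH2"})),
    ]
    confirmed = filter_candidates(example)
    print([hit.est_id for hit in confirmed])
    print(count_by_family(confirmed))
```

The point of the design is that the first stage can afford a loose cutoff, which helps with distant homologues and error-prone EST reads, because the second stage discards candidates that lack the expected domain.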
This publication has 10 references indexed in Scilit:
- Individual variation in protein-coding sequences of human genome. Advances in Protein Chemistry, 2000
- SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Research, 2000
- Nucleotide sequence databases: a gold mine for biologists. Trends in Biochemical Sciences, 1999
- The open reading frame YAL048c affects the secretion of proteinase A in S. cerevisiae. Yeast, 1999
- Human Genome Project aims to finish ‘working draft’ next year. Nature, 1999
- SMART, a simple modular architecture research tool: Identification of signaling domains. Proceedings of the National Academy of Sciences, 1998
- Pieces of the puzzle: expressed sequence tags and the catalog of human genes. Journal of Molecular Medicine, 1997
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- [11] Applying motif and profile searches. Published by Elsevier, 1996