Shotgun sequencing of the human transcriptome with ORF expressed sequence tags
Open Access
- 28 March 2000
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 97 (7), 3491-3496
- https://doi.org/10.1073/pnas.97.7.3491
Abstract
Theoretical considerations predict that amplification of expressed gene transcripts by reverse transcription–PCR using arbitrarily chosen primers will result in the preferential amplification of the central portion of the transcript. Systematic, high-throughput sequencing of such products would result in an expressed sequence tag (EST) database consisting of central, generally coding regions of expressed genes. Such a database would add significant value to existing public EST databases, which consist mostly of sequences derived from the extremities of cDNAs, and facilitate the construction of contigs of transcript sequences. We tested our predictions, creating a database of 10,000 sequences from human breast tumors. The data confirmed the central distribution of the sequences, the significant normalization of the sequence population, the frequent extension of contigs composed of existing human ESTs, and the identification of a series of potentially important homologues of known genes. This approach should make a significant contribution to the early identification of important human genes, the deciphering of the draft human genome sequence currently being compiled, and the shotgun sequencing of the human transcriptome.
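The prediction of central bias can be illustrated with a toy simulation. The sketch below is not the authors' model; it assumes, purely for illustration, that a forward and a reverse primer each anneal at an independent, uniformly random position on a transcript and that the amplicon spans the interval between them. Under that assumption the chance that a position at fractional distance u along the transcript is covered is 2u(1 - u), which peaks at the midpoint. The function name, transcript length, and amplicon count are hypothetical choices.

```python
import random


def simulate_coverage(transcript_len=1000, n_amplicons=20000, seed=0):
    """Estimate per-base coverage under a toy arbitrary-priming model.

    Assumption (not from the paper): both primer sites are drawn
    uniformly and independently along the transcript, and the amplicon
    covers every base between them, inclusive.
    """
    rng = random.Random(seed)
    # Difference array: +1 where an amplicon starts, -1 just past its end.
    diff = [0] * (transcript_len + 1)
    for _ in range(n_amplicons):
        a = rng.randrange(transcript_len)
        b = rng.randrange(transcript_len)
        start, end = min(a, b), max(a, b)
        diff[start] += 1
        diff[end + 1] -= 1

    # Prefix sum turns the difference array into per-base amplicon counts,
    # normalised to the fraction of amplicons covering each base.
    coverage = []
    running = 0
    for i in range(transcript_len):
        running += diff[i]
        coverage.append(running / n_amplicons)
    return coverage


coverage = simulate_coverage()
center = coverage[len(coverage) // 2]
five_prime_end = coverage[0]
three_prime_end = coverage[-1]
print(f"5' end: {five_prime_end:.3f}  center: {center:.3f}  3' end: {three_prime_end:.3f}")
```

In this model the transcript ends are rarely covered while roughly half of all amplicons span the midpoint, which is the qualitative pattern the abstract describes. Real priming depends on sequence composition and amplicon size limits, which this sketch ignores.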