Complete sequencing and characterization of 21,243 full-length human cDNAs
Open Access
- 21 December 2003
- journal article
- research article
- Published by Springer Nature in Nature Genetics
- Vol. 36 (1) , 40-45
- https://doi.org/10.1038/ng1285
Abstract
As a base for human transcriptome and functional genomics, we created the “full-length long Japan” (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with a peak at ∼42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ∼42%, relatively low compared with that of protein-coding cDNAs.
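The abstract compares GC-content distributions by the position of their peaks (∼58% versus ∼42%). As a rough illustration of what such a comparison involves, the following minimal sketch computes the GC percentage of each sequence and bins the values into a histogram whose fullest bin is taken as the peak. It is not the authors' pipeline: the function names (`gc_content`, `gc_histogram`, `peak_bin`), the 2% bin width, and the decision to ignore non-ACGT symbols are all assumptions made for this example.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: symbols other than A, C, G, T (e.g. N) are ignored.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc_histogram(seqs, bin_width: int = 2) -> dict:
    """Count sequences per GC-percentage bin.

    Keys are the lower edge of each bin; a value of exactly 100%
    is placed in the last bin. The bin width is an assumed parameter.
    """
    bins = Counter()
    for s in seqs:
        pct = gc_content(s)
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        bins[lower] += 1
    return dict(sorted(bins.items()))


def peak_bin(hist: dict) -> int:
    """Return the lower edge of the bin holding the most sequences."""
    return max(hist, key=hist.get)


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    toy = ["ATGC", "ATGG", "AATT", "GGCCAT"]
    hist = gc_histogram(toy)
    print(hist)
    print("peak bin starts at", peak_bin(hist), "% GC")
```

Applied separately to two sets of cDNA sequences (for example, predicted and nonpredicted clusters), the two `peak_bin` values would give the kind of peak-to-peak comparison the abstract describes, although the result depends on the chosen bin width.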