Identifying protein-coding genes in genomic sequences
Open Access
- 30 January 2009
- journal article
- review article
- Published by Springer Nature in Genome Biology
- Vol. 10 (1) , 1-8
- https://doi.org/10.1186/gb-2009-10-1-201
Abstract
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.