Comparing Vertebrate Whole-Genome Shotgun Reads to the Human Genome
- 1 November 2001
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 11 (11), 1807-1816
- https://doi.org/10.1101/gr.203601
Abstract
Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. Methods for using cross-species whole-genome shotgun sequence (WGS) for genome annotation are described in this paper. About one-half million high-quality rat WGS reads (covering 7.5% of the rat genome) generated at the Baylor College of Medicine Human Genome Sequencing Center were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted ∼42,500 genes in the human, slightly more than reported previously.
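The negative-control step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the random reads were generated or which parameters were tuned, so everything here is an assumption made for illustration: `shuffled_control` builds a control read by shuffling the bases of a real read, and `score_cutoff` picks the lowest alignment score at which the fraction of control hits passing falls to a chosen rate. Both function names, the shuffling strategy, and the `max_false_positive_rate` parameter are hypothetical and are not taken from the paper.

```python
import random


def shuffled_control(read: str, rng: random.Random) -> str:
    """Return a control read with the same length and base composition.

    Assumption: shuffling is used here as one simple way to produce a
    random read; the paper's actual generator is not described.
    """
    bases = list(read)
    rng.shuffle(bases)
    return "".join(bases)


def score_cutoff(control_scores, max_false_positive_rate=0.01):
    """Return the lowest score cutoff that keeps control hits rare.

    A control hit "passes" when its score is >= the cutoff. The function
    returns the smallest observed score for which the passing fraction is
    at most max_false_positive_rate, or one above the best control score
    if no observed score is strict enough.
    """
    if not control_scores:
        raise ValueError("need at least one control score")
    total = len(control_scores)
    candidates = sorted(set(control_scores))
    for cutoff in candidates:
        passing = sum(1 for score in control_scores if score >= cutoff)
        if passing / total <= max_false_positive_rate:
            return cutoff
    return candidates[-1] + 1


if __name__ == "__main__":
    rng = random.Random(0)
    control = shuffled_control("ACGTACGTTTGACCA", rng)
    print(control)
    # Hypothetical best-hit scores for ten control reads.
    print(score_cutoff([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 0.2))
```

In practice the scores would come from searching the control reads against the same database with the same settings as the real reads, so that the cutoff reflects the background rate of chance matches.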