Abstract
The human genome sequences compiled in DNA databases now total about 10 megabase pairs (Mb), several times the average size of a chromosome band at high resolution. Surveying this large body of data may make it possible to clarify global characteristics of the human genome, that is, to correlate gene sequence data (kb level) with cytogenetic data (Mb level). By extensively searching the GenBank database, we calculated codon usage in about 2000 human sequences. The highest G + C percentage at the third codon position (G + C%) was 97%, and about 250 sequences had values of 80% or more. The lowest G + C% was 27%, and about 150 sequences had values of 40% or less. A major portion of the GC-rich genes was found on special subsets of R-bands (T-bands and/or terminal R-bands). AT-rich genes, however, were mainly on G-bands or non-T-type internal R-bands. The average G + C% at the third position differed among chromosomes and was related to T-band density, quinacrine dullness, and mitotic chiasmata density in the respective chromosomes.
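
The statistic reported above, G + C% at the third codon position, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes the input is an in-frame coding sequence that starts at the first base of a codon. The function names `gc3_percent` and `classify_gc3` are hypothetical, and the 80% and 40% cut-offs are simply the values quoted in the abstract, used here as illustrative class boundaries.

```python
def gc3_percent(cds: str) -> float:
    """Return the G + C percentage at third codon positions.

    Assumption: `cds` is an in-frame coding sequence (first base = codon
    position 1). Trailing bases that do not complete a codon are ignored,
    as are ambiguous bases such as N.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3        # drop an incomplete final codon
    third_bases = seq[2:usable:3]           # every third base, starting at index 2
    counted = [b for b in third_bases if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous third-position bases found")
    gc = sum(b in "GC" for b in counted)
    return 100.0 * gc / len(counted)


def classify_gc3(value: float) -> str:
    """Label a sequence using the cut-offs quoted in the abstract (illustrative)."""
    if value >= 80.0:
        return "GC-rich"
    if value <= 40.0:
        return "AT-rich"
    return "intermediate"


if __name__ == "__main__":
    example = "ATGGCCAAGTAA"                # toy sequence, not real data
    value = gc3_percent(example)
    print(value, classify_gc3(value))
```

Applied to each coding sequence retrieved from a database, a function of this kind would yield the per-sequence values whose distribution is summarized above. Averaging by chromosome would additionally require a gene-to-chromosome assignment, which this sketch does not cover.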