Abstract
Three pro-α1 collagen complementary DNA (cDNA) clones, pCg1, pCg26, and pCg54, and two pro-α2 collagen cDNA clones, pCg13 and pCg45, were subjected to extensive DNA sequence determination. The combined sequences specify the amino acid sequences of chicken pro-α1 and pro-α2 type I collagens, starting at residue 814 of the collagen triple-helical region and continuing to the procollagen C-terminus, as defined by the first in-phase termination codon. Thus, the sequences of 201 helical and 272 C-terminal amino acids of pro-α1, and of 201 helical and 260 C-terminal amino acids of pro-α2, were established. In addition, the sequences of several hundred nucleotides corresponding to noncoding regions of both procollagen mRNAs were determined. In total, 1589 pro-α1 base pairs and 1691 pro-α2 base pairs were sequenced, corresponding to approximately one-third of the total length of each mRNA. Both procollagen mRNA sequences have a high G + C content: the pro-α1 mRNA is 75% G + C in the sequenced helical coding region and 61% G + C in the C-terminal coding region, whereas the pro-α2 mRNA is 60% and 48% G + C, respectively, in these regions. The dinucleotide CpG occurs in both sequences at a higher frequency than is normally found in vertebrate DNA and is approximately 5 times more frequent in the pro-α1 sequence than in the pro-α2 sequence. Nucleotide homology between the helical coding regions is very limited, even though both sequences encode the repeating Gly-X-Y tripeptide in a region where the X and Y residues are 50% conserved. These differences are clearly reflected in the preferred codon usage of the two mRNAs.
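The G + C content, CpG frequency, and codon-usage comparisons summarized above can be illustrated with a short Python sketch. It is not the authors' analysis: it assumes the commonly used observed/expected CpG ratio (CpG count × length / (C count × G count)) and a simple ungapped percent identity, and the two input sequences are hypothetical toy repeats that encode the same Gly-Pro-Pro tripeptide with different codons, not the chicken pro-α1 or pro-α2 cDNA data. All function names are illustrative.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G + C bases in a DNA/cDNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count).

    A value near 1 means CpG occurs about as often as base composition
    predicts; bulk vertebrate DNA is typically well below 1 (CpG depletion).
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # Count CG dinucleotides by scanning every adjacent pair of bases.
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)


def codon_usage(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity of two equal-length sequences (assumes they are already aligned)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


if __name__ == "__main__":
    # Hypothetical toy sequences, both encoding (Gly-Pro-Pro)x4:
    # one uses G+C-rich codons, the other A+T-richer synonymous codons.
    gc_rich = "GGCCCCCCG" * 4  # Gly=GGC, Pro=CCC, Pro=CCG
    at_rich = "GGTCCTCCA" * 4  # Gly=GGT, Pro=CCT, Pro=CCA
    for name, s in [("gc_rich", gc_rich), ("at_rich", at_rich)]:
        print(name, round(gc_content(s), 3), round(cpg_obs_exp(s), 3), dict(codon_usage(s)))
    print("identity", round(percent_identity(gc_rich, at_rich), 3))
```

In this toy case the two sequences encode an identical peptide, yet their G + C content, CpG ratio, and codon counts differ, and their nucleotide identity falls below 100% purely through synonymous codon choice. That is the kind of effect the abstract attributes to differing codon preferences.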