The Core Protein of Alphaviruses

Abstract
The primary structure of the core protein of Semliki Forest virus has been established by protein chemical characterization of 102 peptides, generated by digestion with trypsin, pepsin and thermolysin, and by partial acid cleavage of the protein. Apart from a difference at one position, the sequence established by these experiments agrees with the sequence predicted from the nucleotide sequence of the mRNA [Garoff et al. (1980) Proc. Natl Acad. Sci. USA, 77, 6376–6380]. The core protein has a blocked N terminus, consists of 267 amino acid residues, and has the following amino acid composition: Asp12, Asn9, Thr16, Ser10, Glu11, Gln15, Pro23, Gly20, Ala23, Val19, Met8, Ile11, Leu9, Tyr7, Phe6, His7, Lys37, Arg15, Trp5, Cys4, and an Mr of 29919. It contains 22.1% basic amino acids, mainly lysines, compared with a total of 8.6% acidic residues. The resulting surplus of positive charge is located in the N‐terminal half of the protein (predominantly arginines at positions 12–21 and lysines at positions 66–114). Other amino acids are also unevenly distributed: proline and glutamine accumulate in the N‐terminal half of the sequence, whereas histidine, glycine and the acidic residues are mainly present in the C‐terminal part. This distribution suggests that the virus core protein consists of two or more structural domains.
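
The residue total and the basic and acidic percentages quoted above can be checked directly from the listed composition. The following is a minimal sketch, not part of the original analysis. It assumes that "basic" covers Lys, Arg and His, and that "acidic" covers Asp and Glu; the abstract does not spell out these groupings, but they reproduce the stated figures. The names `composition`, `BASIC`, `ACIDIC` and `percent` are illustrative only.

```python
# Residue counts exactly as listed in the abstract.
composition = {
    "Asp": 12, "Asn": 9, "Thr": 16, "Ser": 10, "Glu": 11, "Gln": 15,
    "Pro": 23, "Gly": 20, "Ala": 23, "Val": 19, "Met": 8, "Ile": 11,
    "Leu": 9, "Tyr": 7, "Phe": 6, "His": 7, "Lys": 37, "Arg": 15,
    "Trp": 5, "Cys": 4,
}

# Assumed groupings (not stated explicitly in the abstract).
BASIC = ("Lys", "Arg", "His")
ACIDIC = ("Asp", "Glu")


def percent(residues, comp):
    """Percentage of all residues in `comp` that belong to `residues`."""
    total = sum(comp.values())
    return 100.0 * sum(comp[r] for r in residues) / total


total_residues = sum(composition.values())
basic_pct = percent(BASIC, composition)
acidic_pct = percent(ACIDIC, composition)

print(total_residues)        # 267
print(round(basic_pct, 1))   # 22.1
print(round(acidic_pct, 1))  # 8.6
```

Under these assumptions the counts sum to 267 residues, with 59 basic (22.1%) and 23 acidic (8.6%) residues, matching the values given in the abstract. Counting only Lys and Arg as basic would give 19.5% instead, which is why His is included in the assumed basic group.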