Abstract
The Human Genome Project will reveal the entire complement of human protease and protease inhibitor sequences, which will be of great importance to both academic and pharmaceutical research. Although the finishing phase is not yet complete, a selection of secondary annotation sources and comparisons with completed model organism genomes already allow useful estimates to be made. Conservative extrapolation suggests that proteases account for approximately 1.8% of human genes. This is close to the figures for yeast (1.7%) and worm (1.8%) but lower than that for the fly (3.4%), which has a large trypsin-like protease content. Applying this proportion to estimates of between 40,000 and 60,000 genes for the human proteome would extrapolate to 700-1,100 proteases, compared with approximately 360 currently represented as GenBank mRNAs. Preliminary comparisons between domain annotations for predicted human gene products and completed proteins suggest that the genomic protease family and mechanistic class distributions will broadly reflect those in the current transcript data. The protease:inhibitor ratio at the mRNA level is currently approximately 9:1, but genome annotation data suggest that inhibitory domains are more widespread than this ratio would imply.
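
The extrapolation above is simple proportional arithmetic, and a minimal sketch makes the rounding explicit. The function name `extrapolate_protease_count` and the variable names are hypothetical, chosen only for illustration; the 1.8% fraction, the 40,000 and 60,000 gene estimates, and the figure of approximately 360 GenBank mRNAs are taken from the text.

```python
# Illustrative arithmetic only: the names here are hypothetical, not from the paper.
PROTEASE_FRACTION = 0.018        # approximately 1.8% of human genes (from the text)
GENBANK_PROTEASE_MRNAS = 360     # approximate current GenBank mRNA count (from the text)


def extrapolate_protease_count(gene_count, fraction=PROTEASE_FRACTION):
    """Return the expected number of proteases for a given total gene count."""
    return round(gene_count * fraction)


low = extrapolate_protease_count(40_000)
high = extrapolate_protease_count(60_000)

print(f"Estimated proteases: {low} to {high}")
print(f"Fold increase over GenBank mRNAs: "
      f"{low / GENBANK_PROTEASE_MRNAS:.1f}x to {high / GENBANK_PROTEASE_MRNAS:.1f}x")
```

The unrounded results, 720 and 1,080, correspond to the 700-1,100 range quoted above once rounded to the nearest hundred, which is roughly two to three times the number of proteases currently represented as GenBank mRNAs.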