Selective amplification of protein-coding regions of large sets of genes using statistically designed primer sets

Abstract
We describe a novel approach to design a set of primers selective for large groups of genes. This method is based on the distribution frequency of all nucleotide combinations (octa- to decanucleotides), and the combined ability of primer pairs, based on these oligonucleotides, to detect genes. By analyzing 1000 human mRNAs, we found that a surprisingly small subset of octanucleotides is shared by a high proportion of human protein-coding region sense strands. By computer simulation of polymerase chain reactions, a set based on only 30 primers was able to detect approximately 75% of known (and presumably unknown) human protein-coding regions. To validate the method and provide experimental support for the feasibility of the more ambitious goal of targeting human protein-coding regions, we sought to apply the technique to a large protein family: G-protein coupled receptors (GPCRs). Our results indicate that there is sufficient low level homology among human coding regions to allow design of a limited set of primer pairs that can selectively target coding regions in general, as well as genomic subsets (e.g., GPCRs). The approach should be generally applicable to human coding regions, and thus provide an efficient method for analyzing much of the transcriptionally active human genome.