Abstract
The advent of the human and model organism genome projects has provided an increasingly complete list of genes that code for the building blocks of life on Earth. Deciphering the functions of all these genes has proven to be no easy task. The availability of mountains of transcriptional profiling data from modern large-scale gene-expression technologies such as serial analysis of gene expression (SAGE) (1), oligonucleotide arrays (2), and cDNA microarrays (3) represents a tremendous windfall for computational biologists, many of whom have migrated from other fields. One article appearing in this issue of PNAS (4) introduces a novel computational approach, shortest path (SP) analysis, to assign gene functions in a transitive fashion along a correlation linkage path terminated by two known genes belonging to the same functional category. A major goal of microarray data analysis is to identify genes that interact with each other even when not every player has a similar transcriptional profile. Currently the most popular way to identify interesting genes and their functions is to perform cluster analysis on the relative expression pattern changes (Fig. 1A) in typical microarray experiments that survey a range of conditions (reviewed in ref. 5). The fundamental premise of the clustering approach is that genes with similar expression profiles across a set of conditions (cellular processes, responses, phenotypes, etc.) may share similar functions (6). Obviously, the word “function” is too general to be precise and quantitative, and too broad to be specific and meaningful. Genes whose products have the same function (say, phosphorylating other proteins) do not necessarily share a similar transcriptional pattern. Conversely, genes with different functions can have similar expression profiles simply by chance or through stochastic fluctuations.
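The transitive idea behind SP analysis can be illustrated with a minimal Python sketch. It is not the implementation described in ref. 4. The Pearson correlation as the similarity measure, the edge distance 1 - |r|, the cutoff `min_abs_r`, and the toy gene names and profiles are all assumptions chosen for illustration. Two genes of known, shared function (`gene_a` and `gene_c`) are uncorrelated with each other, yet both correlate with an intermediate gene, which therefore lies on the shortest path between them.

```python
import heapq
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def build_graph(profiles, min_abs_r=0.6):
    """Link genes whose |r| passes the (assumed) cutoff; distance = 1 - |r|."""
    genes = list(profiles)
    graph = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if abs(r) >= min_abs_r:
                graph[g][h] = graph[h][g] = 1.0 - abs(r)
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the gene path, or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, g = heapq.heappop(heap)
        if g in done:
            continue
        done.add(g)
        if g == target:
            break
        for h, w in graph[g].items():
            nd = d + w
            if nd < dist.get(h, math.inf):
                dist[h] = nd
                prev[h] = g
                heapq.heappush(heap, (nd, h))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical log-ratio profiles over four conditions (toy data).
profiles = {
    "gene_a": [1, -1, 0, 0],    # known function
    "gene_b": [1, -1, 1, -1],   # unknown; correlated with both known genes
    "gene_c": [0, 0, 1, -1],    # known, same functional category as gene_a
    "gene_d": [1, 1, -1, -1],   # unrelated profile
}
graph = build_graph(profiles)
path = shortest_path(graph, "gene_a", "gene_c")
print(path)
```

With this toy data the path runs from `gene_a` through `gene_b` to `gene_c`, so `gene_b` becomes a candidate for the shared functional category even though the two known genes would never fall into the same cluster. `gene_d` has no link that passes the cutoff and stays unreachable.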
Although many potential caveats exist, large numbers of functionally related genes do show very similar expression patterns under a relevant set …