Abstract
SUMMARY: The principal route to understanding the biological significance of the genome sequence is the discovery and characterization of the portion of the genome that is transcribed into RNA products. We now know that this 'transcriptome' is unexpectedly complex, and that defining it precisely in any one species requires multiple technical approaches and the ability to work on a very large scale. A key step is the development of technologies able to capture snapshots of the complexity of the various kinds of RNA generated by the genome. As the human, mouse and other model genome sequencing projects approach completion, considerable effort has been focused on identifying and annotating the protein-coding genes as the principal output of the genome. In pursuing this aim, several key technologies have been developed to generate large and highly diverse sets of full-length cDNAs and their variants. However, the search has also revealed a hidden transcriptional universe comprising a wide variety of non-protein-coding RNA transcripts. Despite initial scepticism, various experiments and complementary technologies have demonstrated that these RNAs are dynamically transcribed and that a subset of them can act as sense–antisense RNAs, which influence the transcriptional output of the genome. Recent experimental evidence suggests that the list of non-protein-coding RNAs is still largely incomplete and that transcription is substantially more complex than currently thought.