An algorithm for comparing RNA secondary structures and searching for similar substructures

Abstract
To access the functional information carried by RNA molecules at the level of their secondary structure interactions, we propose a comparison method based on a tree edit algorithm that takes into account the tree structure of RNA foldings. Each secondary structure is translated into a tree involving all its elementary substructures; a shorter, condensed tree is then built in which any unbranched helix interspersed with bulges and interior loops is taken as a single node. The method includes several parameters: a comparison matrix between structural units, gap penalties, and the scoring between nodes of the condensed trees. Their effects have been analysed using as a model a rapidly divergent domain of the large ribosomal RNA, for which structural variation during evolution is well known. The method allows one to recognize precisely, in large target molecules, definite substructures that share with the query molecules only a limited set of closely related secondary structure features; it remains efficient when such structural elements are separated by intervening features, which can correspond to insertion/deletion of entire stem regions. When coupled with a hierarchical clustering algorithm, this method is suitable for classifying RNA molecules according to their secondary structure homologies.
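The two-stage tree construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-bracket input notation, the names `Node`, `Helix`, `parse_dot_bracket`, and `condense`, and the exact rule used to delimit an unbranched helix are all assumptions made for illustration. The tree edit comparison itself, with its comparison matrix and gap penalties, is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Node of the full tree: one per base pair or unpaired base (assumed encoding)."""
    kind: str  # "root", "pair" or "unpaired"
    children: List["Node"] = field(default_factory=list)


@dataclass
class Helix:
    """Node of the condensed tree: one unbranched helix (hypothetical layout)."""
    pairs: int              # base pairs stacked in the helix
    internal_unpaired: int  # unpaired bases in bulges and interior loops
    closing_unpaired: int   # unpaired bases in the loop closed by the last pair
    children: List["Helix"] = field(default_factory=list)


def parse_dot_bracket(structure: str) -> Node:
    """Build the full tree from a dot-bracket string such as '((...))'."""
    root = Node("root")
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = Node("pair")
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == ".":
            stack[-1].children.append(Node("unpaired"))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


def _collapse(pair: Node) -> Helix:
    """Follow a chain of pairs while each encloses exactly one further pair."""
    pairs, internal = 1, 0
    cur = pair
    while True:
        inner = [c for c in cur.children if c.kind == "pair"]
        if len(inner) != 1:
            break  # hairpin loop (0 inner pairs) or branching loop (2 or more)
        internal += sum(1 for c in cur.children if c.kind == "unpaired")
        cur = inner[0]
        pairs += 1
    closing = sum(1 for c in cur.children if c.kind == "unpaired")
    return Helix(pairs, internal, closing, condense(cur))


def condense(node: Node) -> List[Helix]:
    """Return the condensed subtrees rooted at the pairs directly under `node`."""
    return [_collapse(c) for c in node.children if c.kind == "pair"]


if __name__ == "__main__":
    for helix in condense(parse_dot_bracket("((((...))..((...))))")):
        print(helix)
```

In this sketch a helix with an interior loop, such as `((..((...))..))`, condenses to a single node, whereas a branching loop yields one child node per emerging helix. A tree edit comparison would then operate on these condensed nodes.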
