An algorithm for finding the largest approximately common substructures of two trees

1 August 1998

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 20 (8) , 889-895
https://doi.org/10.1109/34.709622

Abstract

Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology and natural language processing. We consider a substructure of an ordered labeled tree T to be a connected subgraph of T. Given two ordered labeled trees T/sub 1/ and T/sub 2/ and an integer d, the largest approximately common substructure problem is to find a substructure U/sub 1/ of T/sub 1/ and a substructure U/sub 2/ of T/sub 2/ such that U/sub 1/ is within edit distance d of U/sub 2/ and where there does not exist any other substructure V/sub 1/ of T/sub 1/ and V/sub 2/ of T/sub 2/ such that V/sub 1/ and V/sub 2/ satisfy the distance constraint and the sum of the sizes of V/sub 1/ and V/sub 2/ is greater than the sum of the sizes of U/sub 1/ and U/sub 2/. We present a dynamic programming algorithm to solve this problem, which runs as fast as the fastest known algorithm for computing the edit distance of two trees when the distance allowed in the common substructures is a constant independent of the input trees. To demonstrate the utility of our algorithm, we discuss its application to discovering motifs in multiple RNA secondary structures (which are ordered labeled trees).

Keywords

This publication has 19 references indexed in Scilit:

Secondary structure computer prediction of the poliovirus 5' non-coding region is improved by a genetic algorithm
Bioinformatics, 1997
A metric between unrooted and unordered trees and its bottom-up computing method
Published by Institute of Electrical and Electronics Engineers (IEEE) ,1994
Exact and approximate algorithms for unordered tree matching
IEEE Transactions on Systems, Man, and Cybernetics, 1994
Approximate Tree Matching in the Presence of Variable Length Don′t Cares
Journal of Algorithms, 1994
An algorithm for graph optimal monomorphism
IEEE Transactions on Systems, Man, and Cybernetics, 1990
Comparing multiple RNA secondary structures using tree comparisons
Bioinformatics, 1990
Simple Fast Algorithms for the Editing Distance between Trees and Related Problems
SIAM Journal on Computing, 1989
RNA secondary structures: comparison and determination of frequently recurring substructures by consensus
Bioinformatics, 1989
An algorithm for comparing multiple RNA secondary structures
Bioinformatics, 1988
The Tree-to-Tree Correction Problem
Journal of the ACM, 1979