Finite State Automata from Regular Expression Trees

Open Access

1 July 1993

journal article
Published by Oxford University Press (OUP) in The Computer Journal

Vol. 36 (7) , 623-630
https://doi.org/10.1093/comjnl/36.7.623

Abstract

Thompson [13] introduced an innovative method for obtaining non-deterministic finite state automata (nfa) from regular expressions. His formulation of nfas makes use of [egr]-transitions (null symbol input) and requires in the worst case 2σ+2 OPS states, where σ is the number of occurrences of alphabet symbols and OPS is the number of operands in the original regular expression. We modify this algorithm to obtain a nfa M without ε-transitions that has in the worst case σ+1 states. Using multi-branch expression trees to store the regular expressions efficiently, the algorithm presented here is directly parallelizable. The algorithm necessitates that we maintain a finite state automata which has no ε-transitions and has a starting node of zero in-degree. The role of ε-transitions in finite state automata is examined and, based on the technique of bypassing [11], two alternative approaches are suggested. Three advantages manifest themselves during the improved construction: many fewer nodes are needed in the equivalent finite state automata; ε-transitions are not needed; the construction is highly parallelizable. With the advent of parallel processing, nfas can be simulated by multiple processors or in VLSI design [14]. The algorithm presented here results in the creation of a nfa which requires less hardware resources to simulate.

Keywords

This publication has 0 references indexed in Scilit: