Abstract
Protein topology can be described at different levels. At the most fundamental level, it is a sequence of secondary structure elements (a 'primary topology string'). Searching predicted primary topology strings against a library of strings from known protein structures is the basis of some protein fold recognition methods. Here a method, TOPSCAN, is presented for rapid comparison of protein structures. Rather than a simple two-letter alphabet (encoding strand and helix), it uses more complex alphabets that encode the direction, proximity, accessibility and length of secondary structure elements and loops in addition to secondary structure. The structural information content of primary topology strings is compared with that of encodings which contain this additional information ('secondary topology strings'). The algorithm is fast: a scan of a large domain against a library of more than 2000 secondary structure strings completes in ∼30 s. An analysis of protein fold similarity using TOPSCAN at the primary and secondary topology levels is presented.
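To make the idea of scanning a primary topology string against a library concrete, the following is a minimal sketch in Python. It assumes a two-letter alphabet (`H` for helix, `E` for strand) and uses a generic Needleman-Wunsch global alignment score. The scoring values, the function names `alignment_score` and `scan`, and the library entries are all hypothetical and chosen for illustration. The abstract does not specify TOPSCAN's actual alphabet encoding or scoring scheme, so this should not be read as the published method.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score of two topology strings.

    The match/mismatch/gap values are illustrative assumptions,
    not parameters taken from TOPSCAN.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[-1]


def scan(query, library):
    """Score a query string against every library string, best hit first."""
    hits = [(alignment_score(query, s), name) for name, s in library.items()]
    # Sort by descending score, breaking ties by name
    return sorted(hits, key=lambda h: (-h[0], h[1]))


# Hypothetical library of primary topology strings (H = helix, E = strand)
library = {"fold_a": "EHEHEHE", "fold_b": "HHHH", "fold_c": "EEEEHH"}
ranked = scan("EHEHE", library)
```

A richer alphabet of the kind described above (direction, proximity, accessibility, length) would replace the binary match/mismatch test with a substitution table over the larger symbol set; the dynamic programming recurrence itself would stay the same.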