Finding flexible patterns in unaligned protein sequences
Open Access
- 1 August 1995
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 4 (8) , 1587-1595
- https://doi.org/10.1002/pro.5560040817
Abstract
We present a new method for the identification of conserved patterns in a set of unaligned related protein sequences. It is able to discover patterns of a quite general form, allowing for both ambiguous positions and for variable length wildcard regions. It allows the user to define a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class scoring highest according to a significance measure defined. Identified patterns may be refined using one of two new algorithms. We present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.
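To make the notion of a flexible pattern concrete, the sketch below shows how a pattern with ambiguous positions and a variable-length wildcard region can be matched against unaligned sequences. It is an illustration only, not the discovery algorithm or the significance measure from the paper: it assumes a simplified PROSITE-style notation (`x` for any residue, `[DE]` for an ambiguous position, `x(2,4)` for a wildcard region of 2 to 4 residues), and the function names `prosite_to_regex` and `support`, as well as the example sequences, are hypothetical.

```python
import re

# One pattern element: a wildcard "x", a single residue, or an ambiguous
# position such as "[DE]", optionally followed by "(n)" or "(n,m)".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        core = "." if symbol == "x" else symbol  # "x" matches any residue
        if low is None:
            parts.append(core)                      # exactly one position
        elif high is None:
            parts.append(f"{core}{{{low}}}")        # fixed-length repeat
        else:
            parts.append(f"{core}{{{low},{high}}}")  # variable-length region
    return "".join(parts)


def support(pattern: str, sequences: list[str]) -> int:
    """Count how many sequences contain at least one match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


# Hypothetical toy sequences, invented for illustration.
sequences = ["AACKLDHGG", "MCPQRSEHT", "CCDH", "GGCAADHWW"]
print(prosite_to_regex("C-x(2,4)-[DE]-H"))
print(support("C-x(2,4)-[DE]-H", sequences))
```

Counting the sequences that match a candidate pattern is the kind of check any pattern-discovery method needs; how candidates are enumerated and scored is where the method described in the abstract does its real work, and that part is not reproduced here.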