Embedding strategies for effective use of information from multiple sequence alignments
- 1 March 1997
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 6 (3), 698–705
- https://doi.org/10.1002/pro.5560060319
Abstract
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single-sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single-sequence information where alignment is uncertain.
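The two embedding strategies can be illustrated with a small sketch. It is not the authors' implementation, nor their special version of Smith-Waterman. It assumes the conserved region is an ungapped block of aligned family members placed at a known offset `start` in the representative sequence; it also assumes uniform background frequencies, a pseudocount of 1, a toy +5/−1 identity score standing in for a real substitution matrix such as BLOSUM62, and a linear (not affine) gap penalty. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def sequence_column(residue, match=5.0, mismatch=-1.0):
    """Score column for one query residue (hypothetical stand-in for a BLOSUM-style row)."""
    return {aa: (match if aa == residue else mismatch) for aa in AMINO_ACIDS}


def block_pssm(block, pseudocount=1.0):
    """Log-odds score columns (half-bit units) for an ungapped alignment block."""
    pssm = []
    for column in zip(*block):
        counts = Counter(column)
        n = len(column)
        scores = {}
        for aa in AMINO_ACIDS:
            # Observed frequency, smoothed toward the background by a pseudocount.
            q = (counts[aa] + pseudocount * BACKGROUND) / (n + pseudocount)
            scores[aa] = 2.0 * math.log2(q / BACKGROUND)
        pssm.append(scores)
    return pssm


def block_consensus(block):
    """Most frequent residue per column; ties go to the residue seen first."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*block))


def embed_pssm(representative, block, start):
    """Query as score columns: PSSM inside the block, single-sequence columns elsewhere."""
    query = [sequence_column(r) for r in representative]
    pssm = block_pssm(block)
    query[start:start + len(pssm)] = pssm  # assumes the block fits inside the sequence
    return query


def embed_consensus(representative, block, start):
    """Plain query string with the block region replaced by its consensus residues."""
    consensus = block_consensus(block)
    return representative[:start] + consensus + representative[start + len(consensus):]


def smith_waterman(query_columns, target, gap=-4.0, unknown=-4.0):
    """Best local alignment score of a column-scored query vs. a target (linear gaps)."""
    prev = [0.0] * (len(target) + 1)
    best = 0.0
    for column in query_columns:
        cur = [0.0] * (len(target) + 1)
        for j, residue in enumerate(target, start=1):
            # Position-specific score: each query position has its own score column.
            diag = prev[j - 1] + column.get(residue, unknown)
            cur[j] = max(0.0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    representative = "MKTAYIAKQR"
    block = ["GHWCK", "GHWCR", "GHYCK", "GAWCK"]  # toy block aligned at offset 3
    pssm_query = embed_pssm(representative, block, 3)
    cons_query = embed_consensus(representative, block, 3)
    print(cons_query)
    for target in ("MKTGHWCKQR", "MKTAYIAKQR"):
        print(target, round(smith_waterman(pssm_query, target), 2))
```

The consensus-embedded query is an ordinary string, so any single-sequence search tool can use it. The sketch scores it by passing `[sequence_column(r) for r in cons_query]` to the same aligner. The PSSM-embedded query, by contrast, needs an aligner that scores each query position against its own column. This mirrors the abstract's distinction between the special Smith-Waterman search and programs such as BLAST and FASTA. Outside the block, both queries keep the representative's own residues.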
Funding Information
- NIH (GM29009)