PROMALS: towards accurate multiple sequence alignments of distantly related proteins
- 31 January 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (7) , 802-808
- https://doi.org/10.1093/bioinformatics/btm017
Abstract
Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task.
Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with the improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
Availability: The PROMALS web server is available at: http://prodata.swmed.edu/promals/
Contact: jpei@chop.swmed.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
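Point (iv) of the abstract refers to probabilistic consistency-based scoring. As a rough illustration only, the sketch below shows the generic consistency transformation popularized by ProbCons, in which the match probability for a residue pair from sequences x and y is re-estimated by averaging over paths through every sequence z. It is not the PROMALS implementation, which the abstract says operates on profiles and combines amino acid and secondary structure scores. The function names and the layout of the `post` dictionary are assumptions made for this example.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def consistency_transform(x, y, post, n_seqs):
    """Re-estimate match probabilities between sequences x and y.

    post[(a, b)] with a < b is a hypothetical posterior match matrix:
    rows index residues of sequence a, columns index residues of b.
    """
    def get(a, b):
        return post[(a, b)] if a < b else transpose(post[(b, a)])

    # The cases z == x and z == y each contribute the original matrix,
    # because a sequence matches itself with probability 1.
    total = [[2.0 * v for v in row] for row in get(x, y)]
    for z in range(n_seqs):
        if z in (x, y):
            continue
        # Sum over residues k of z: P(x_i ~ z_k) * P(z_k ~ y_j).
        via = matmul(get(x, z), get(z, y))
        total = [[t + v for t, v in zip(t_row, v_row)]
                 for t_row, v_row in zip(total, via)]
    return [[v / n_seqs for v in row] for row in total]


# Toy input: three sequences of two residues each. Sequences 0 and 1 are
# only weakly matched directly, but both match sequence 2 confidently.
identity = [[1.0, 0.0], [0.0, 1.0]]
post = {
    (0, 1): [[0.5, 0.0], [0.0, 0.5]],
    (0, 2): identity,
    (1, 2): identity,
}
refined = consistency_transform(0, 1, post, n_seqs=3)
```

In this toy case the diagonal entries rise from 0.5 to 2/3, because the third sequence supports the same residue pairing. That is the intuition behind consistency scoring: evidence from intermediate sequences strengthens matches that are weak in a direct pairwise comparison.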
This publication has 37 references indexed in Scilit:
- Multiple sequence alignment. Current Opinion in Structural Biology, 2006
- ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 2005
- MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004
- Detection of reliable and unexpected protein fold predictions using 3D-Jury. Nucleic Acids Research, 2003
- Dictionary of recurrent domains in protein structures. Proteins: Structure, Function, and Bioinformatics, 1998
- Biological Sequence Analysis. Cambridge University Press (CUP), 1998
- Touring protein fold space with Dali/FSSP. Nucleic Acids Research, 1998
- Profile hidden Markov models. Bioinformatics, 1998
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, 1992