SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures
Open Access
- 14 July 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (18) , 3615-3621
- https://doi.org/10.1093/bioinformatics/bti582
Abstract
Motivation: Multiple sequence alignment is an essential part of bioinformatics tools for genome-scale studies of genes and their evolutionary relationships. However, accurately aligning remote homologs is challenging. Here, we develop a method, called SPEM, that aligns multiple sequences using pre-processed sequence profiles and predicted secondary structures for pairwise alignment, consistency-based scoring for refinement of the pairwise alignment and a progressive algorithm for the final multiple alignment.
Results: The alignment accuracy of SPEM is compared with that of established methods such as ClustalW, T-Coffee, MUSCLE, ProbCons and PRALINEPSI in easy (homologs) and hard (remote homologs) benchmarks. Results indicate that the average sum of pairwise alignment scores given by SPEM is 7–15% higher than those of the methods compared in aligning remote homologs (sequence identity <30%). Its accuracy for aligning homologs (sequence identity >30%) is statistically indistinguishable from that of state-of-the-art techniques such as ProbCons or MUSCLE 6.0.
Availability: The SPEM server and its executables are available at http://theory.med.buffalo.edu
Contact: yqzhou@buffalo.edu
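The abstract reports accuracy as a sum of pairwise alignment scores but does not define it. As a minimal sketch, assuming the measure resembles the sum-of-pairs score commonly used in alignment benchmarks (the fraction of residue pairs aligned in a reference alignment that a test alignment reproduces), it could be computed as below. The function names `aligned_pairs` and `sum_of_pairs_score` and the toy sequences are hypothetical and are not taken from SPEM.

```python
from itertools import combinations


def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs that share a column."""
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            pairs.add((i, j))
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs


def sum_of_pairs_score(test, reference, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Both arguments map sequence names to gapped rows of equal length.
    """
    matched = total = 0
    for a, b in combinations(sorted(reference), 2):
        ref_pairs = aligned_pairs(reference[a], reference[b], gap)
        test_pairs = aligned_pairs(test[a], test[b], gap)
        matched += len(ref_pairs & test_pairs)
        total += len(ref_pairs)
    return matched / total if total else 0.0


# Toy data (made up for illustration): same three sequences, two alignments.
reference = {"s1": "ACD-E", "s2": "AC-FE", "s3": "A-DFE"}
test = {"s1": "ACDE-", "s2": "ACF-E", "s3": "ADF-E"}

print(sum_of_pairs_score(test, reference))
```

A score of 1.0 means every residue pair aligned in the reference is also aligned in the test alignment. Averaging this value over all benchmark cases gives one per-method number of the kind compared in the Results.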