Homology-extended sequence alignment
Open Access
- 18 February 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (3) , 816-824
- https://doi.org/10.1093/nar/gki233
Abstract
We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
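The pre-alignment step described in the abstract can be illustrated with a minimal sketch. It assumes that database hits have already been retrieved and projected onto the coordinates of the query sequence; the names `build_profile`, `max_evalue` and `column_score`, the E-value cut-off, and the toy column score are all hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter


def build_profile(query, hits, max_evalue=0.01):
    """Return one residue-frequency dict per query position.

    hits: list of (evalue, aligned) pairs, where `aligned` is the hit
    projected onto the query (same length as the query, '-' where the
    hit has no residue). The E-value threshold is an assumed stand-in
    for the "pre-defined threshold" mentioned in the abstract.
    """
    # The query residue always contributes one count to its own column.
    columns = [Counter({res: 1}) for res in query]
    for evalue, aligned in hits:
        if evalue > max_evalue or len(aligned) != len(query):
            continue  # hit fails the threshold or is malformed
        for column, res in zip(columns, aligned):
            if res != "-":
                column[res] += 1
    profile = []
    for column in columns:
        total = sum(column.values())
        profile.append({res: n / total for res, n in column.items()})
    return profile


def column_score(col_a, col_b):
    """Toy profile-profile score (an assumption, not the paper's scheme):
    the probability that both columns emit the same residue."""
    return sum(freq * col_b.get(res, 0.0) for res, freq in col_a.items())


query = "ACD"
hits = [(1e-5, "ACE"), (5.0, "WWW"), (1e-3, "A-D")]
profile = build_profile(query, hits)
```

In this sketch the hit with E-value 5.0 is discarded, so the third column holds D and E with frequencies 2/3 and 1/3. A progressive aligner would then compare such columns between sequences (here with `column_score`) instead of comparing single residues.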