Multiple sequence alignment with hierarchical clustering
- 25 November 1988
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 16 (22), 10881-10890
- https://doi.org/10.1093/nar/16.22.10881
Abstract
An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned, creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c.
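The first two steps of the abstract (pairwise dynamic-programming scores, then a hierarchical clustering of those scores) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the function names `nw_score` and `cluster_order`, the match/mismatch/gap values, the linear gap penalty, and the choice of average linkage are all assumptions made for the example, since the abstract does not specify them. The sketch stops at the merge order and does not perform the group-to-group alignment or the iteration step.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment score by dynamic programming.

    Scoring values and the linear gap penalty are assumed for illustration.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def cluster_order(seqs):
    """Return the order in which groups of sequence indices are merged.

    Uses average linkage on the pairwise score matrix (an assumed choice);
    the highest-scoring pair of groups is merged at each step.
    """
    score = {
        (i, j): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def link(g, h):
        # Mean pairwise score between the members of two groups.
        total = sum(score[min(i, j), max(i, j)] for i in g for j in h)
        return total / (len(g) * len(h))

    groups = [(i,) for i in range(len(seqs))]
    merges = []
    while len(groups) > 1:
        g, h = max(combinations(groups, 2), key=lambda pair: link(*pair))
        merges.append((g, h))
        groups = [x for x in groups if x not in (g, h)] + [g + h]
    return merges


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TTTT"]  # made-up sequences, not data from the paper
    for left, right in cluster_order(toy):
        print("merge", left, "with", right)
```

In a full progressive alignment, each merge in the returned list would correspond to aligning the two groups to each other, so the closest sequences are aligned first and the most distant groups last.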