A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis
- 1 January 1992
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 20 (14), 3631-3637
- https://doi.org/10.1093/nar/20.14.3631
Abstract
The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of two sequences; and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These seven criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA.
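As a rough illustration of the shuffling-based Z-score behind criteria iv and v, the sketch below scores a pair of sequences, rescores the pair after repeatedly shuffling one of them, and expresses the observed score in standard deviations above the shuffled mean. It is not the BESTFIT or FASTA procedure: the scoring function is a toy local alignment with arbitrary match, mismatch and gap values, and the names `local_alignment_score` and `shuffle_z_score`, the number of shuffles, and the choice to shuffle only the second sequence are all assumptions made for illustration.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Toy Smith-Waterman score with a linear gap penalty.

    Hypothetical stand-in for an alignment program's score; the
    match/mismatch/gap values are arbitrary, not a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b.

    Shuffling keeps the length and composition of b but destroys residue order.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z-score undefined")
    return (observed - mean) / sd
```

Because the Z-score is measured against a background with the same length and composition, it separates genuine similarity from what those two properties alone would produce, which a raw alignment score cannot do.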