Comparison of sequences as a method for evaluation of the molecular similarity
- 1 April 1986
- journal article
- research article
- Published by Wiley in Journal of Computational Chemistry
- Vol. 7 (2) , 176-188
- https://doi.org/10.1002/jcc.540070211
Abstract
String comparison techniques were developed and applied for measuring the molecular similarity of chemical structures. The molecular structures were encoded as a sequence of numbers representing counts of paths of different lengths. The similarity index between two compounds was calculated as the difference between the gains of information derived through comparison of the corresponding molecular path sequences. Ranks between the structures of the studied database, obtained according to this similarity, were used as basic data for deriving correspondences between the elements of the set of compounds. The method was applied to a group of 41 barbiturates. Correlation equations were calculated for different groups of compounds grouped according to the displayed similarity. The correlation equations and the corresponding statistics were obtained using standard computer programs. A special algorithm for computing the similarity index and the correlation matrix (outlined very briefly) was developed and implemented on a VAX 11/750.
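The encoding step described in the abstract, a sequence of counts of paths of different lengths, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a hydrogen-suppressed molecular graph given as an adjacency mapping, measures path length in bonds, and counts each undirected path once. The function name `path_sequence` and the example graphs are hypothetical, and whether the paper's sequences also include a length-0 term (the atom count) is not stated in the abstract. The information-gain similarity index is not reproduced here because the abstract does not define it.

```python
def path_sequence(adjacency, max_length):
    """Count simple (non-repeating) paths of each length in an undirected graph.

    adjacency:  dict mapping each atom label to a list of bonded atom labels
                (assumed hydrogen-suppressed molecular graph).
    max_length: longest path length, in bonds, to count.
    Returns a list [p1, p2, ..., p_max_length].
    """
    counts = [0] * (max_length + 1)

    def extend(node, visited, length):
        # Every partial walk with at least one bond is itself a path.
        if length > 0:
            counts[length] += 1
        if length == max_length:
            return
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                extend(neighbor, visited, length + 1)
                visited.remove(neighbor)

    for start in adjacency:
        extend(start, {start}, 0)

    # Each undirected path was found twice, once from each end.
    return [c // 2 for c in counts[1:]]


# Hypothetical carbon skeletons used only for illustration.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(path_sequence(n_butane, 3))   # [3, 2, 1]
print(path_sequence(isobutane, 3))  # [3, 3, 0]
```

The two isomers have the same number of bonds but different counts of longer paths, which is the kind of difference a sequence comparison can pick up.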
This publication has 30 references indexed in Scilit:
- Pattern recognition in nucleic acid sequences. II. An efficient method for finding locally stable secondary structures. Nucleic Acids Research, 1982
- New Stratigraphic Correlation Techniques. The Journal of Geology, 1980
- Botulism: A Pyrolysis-Gas-Liquid Chromatographic Study. Journal of Chromatographic Science, 1978
- UNIX Time-Sharing System: Document Preparation. Bell System Technical Journal, 1978
- A Comparison of the Performance of Some Similarity and Dissimilarity Measures in the Automatic Classification of Chemical Structures. Journal of Chemical Information and Computer Sciences, 1975
- A practical application of a real-time isolated-word recognition system using syntactic constraints. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1974
- An application of information theory to genetic mutations and the matching of polypeptide sequences. Journal of Theoretical Biology, 1973
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970
- Spelling correction in systems programs. Communications of the ACM, 1970
- Relations between chemical structure and biological activity in peptides. Journal of Theoretical Biology, 1966