Self‐organized neural maps of human protein sequences
- 1 March 1994
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 3 (3) , 507-521
- https://doi.org/10.1002/pro.5560030316
Abstract
We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large‐scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0), whose lengths are greater than 50 amino acids. In the final 2‐dimensional topologically ordered map of 15 × 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time‐consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU‐hours [CPU‐h]), and another one of 30 epochs (6.7 CPU‐h). A further reduction of learning‐computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 × 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU‐seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis.
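The pipeline summarized in the abstract (dipeptide composition as input, Kohonen training, then fast placement of a new sequence on the ordered map) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the normalization, learning-rate schedule, or neighborhood function, so the linear decay, the Gaussian neighborhood, the 4 × 4 map size, and the toy sequences are all assumptions made here for brevity. The function names are likewise hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def dipeptide_vector(seq):
    """Flattened 20 x 20 dipeptide frequency matrix (400 values)."""
    counts = [0.0] * 400
    total = 0
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip non-standard residues
            counts[INDEX[a] * 20 + INDEX[b]] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weights are closest to x."""
    best, best_d = None, float("inf")
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = pos, d
    return best

def train_som(vectors, rows=4, cols=4, epochs=30, lr0=0.5, seed=0):
    """Train a small Kohonen map; the decay schedules are assumptions."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() / dim for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # linearly decaying rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighborhood
        for x in vectors:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical toy sequences, for illustration only (not SwissProt entries).
toy_sequences = [
    "ACDEFGHIKLACDEFGHIKL",
    "ACDEFGHIKLACDEFGHIKM",
    "WYWYWYVVVVWYWYWYVVVV",
    "WYWYWYVVVVWYWYWYVVVT",
]
toy_vectors = [dipeptide_vector(s) for s in toy_sequences]
som = train_som(toy_vectors)

# Placing a sequence on the trained map is a single nearest-neuron search.
for s in toy_sequences:
    print(s, best_matching_unit(som, dipeptide_vector(s)))
```

Training loops over every neuron for every input in every epoch, which is consistent with the abstract's observation that learning is the expensive step, while classifying a new protein needs only one pass over the neurons. Shrinking the input from a 20 × 20 to an 11 × 11 matrix would cut the per-update cost from 400 to 121 components, in line with the reported speed-up of about 3.3; how the paper reduces the matrix is not stated in the abstract and is not reproduced here.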