Classification of G‐protein coupled receptors by alignment‐independent extraction of principal chemical properties of primary amino acid sequences
- 13 April 2002
- journal article
- Published by Wiley in Protein Science
- Vol. 11 (4), 795-805
- https://doi.org/10.1110/ps.2500102
Abstract
We have developed an alignment-independent method for classifying G-protein coupled receptors (GPCRs) according to the principal chemical properties of their amino acid sequences. The method is multivariate. First, the primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids. Then a modified auto-cross-covariance transform converts these data into a uniform matrix. Principal component analysis of a data set of 929 class A GPCRs showed a clear separation of the major GPCR classes. Partial least squares projection to latent structures produced a highly valid model (cross-validated correlation coefficient, Q² = 0.895) that unambiguously classified the GPCRs in the training set according to their ligand binding class. The model was further validated by external prediction of 535 novel GPCRs not included in the training set. Of these, only 14 sequences, all confined to rapidly expanding GPCR classes, were mispredicted. Moreover, 90 of 165 orphan GPCRs were tentatively assigned to a GPCR ligand binding class. The alignment-independent method could also be used to assess how much the principal chemical properties of every single amino acid in the protein sequences contribute to explaining GPCR family membership. This analysis revealed that all amino acids in the unaligned sequences contributed to the classifications, albeit to varying extents. The most important amino acids were those that traditional alignment-based methods also identify as conserved.
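The encoding and transform steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions. It uses the standard lag-based auto- and cross-covariance formulation, not the authors' modified transform, because the abstract does not describe the modification. `TOY_SCALES` is a hypothetical three-descriptor table with placeholder numbers, not the published amino acid scales. The helper names `encode`, `acc_transform` and `q_squared` are likewise illustrative. `q_squared` uses the common chemometric definition Q² = 1 - PRESS/SS for cross-validated predictions.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL descriptor table: three placeholder values per residue.
# These numbers are for illustration only and are NOT the published scales.
TOY_SCALES: Dict[str, Sequence[float]] = {
    "A": (0.1, -1.7, 0.1),
    "G": (2.2, -5.4, 0.3),
    "K": (2.8, 1.4, -3.1),
    "L": (-4.2, -1.0, -1.0),
    "S": (2.0, -1.6, 0.6),
}


def encode(seq: str, scales: Dict[str, Sequence[float]]) -> List[Sequence[float]]:
    """Translate a sequence into per-residue descriptor vectors (KeyError on unknown residues)."""
    return [scales[aa] for aa in seq]


def acc_transform(seq: str, scales: Dict[str, Sequence[float]], max_lag: int) -> List[float]:
    """Plain (unmodified) auto- and cross-covariance terms for lags 1..max_lag.

    For each lag l and descriptor pair (j, k), this averages z_j(i) * z_k(i + l)
    over all valid positions i. The result has d * d * max_lag values regardless
    of sequence length, so sequences of different lengths map to equal-length rows.
    """
    z = encode(seq, scales)
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(z[0])
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


def q_squared(y_true: Sequence[float], y_cv_pred: Sequence[float]) -> float:
    """Cross-validated Q^2 = 1 - PRESS / SS, with SS taken about the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    ss = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - press / ss


if __name__ == "__main__":
    shorter = acc_transform("GALKS", TOY_SCALES, max_lag=2)
    longer = acc_transform("GALKSLLAGSK", TOY_SCALES, max_lag=2)
    print(len(shorter), len(longer))  # both rows have 3 * 3 * 2 = 18 columns
```

Every sequence yields the same number of columns (here 3 * 3 * 2 = 18). The rows can therefore be stacked into the uniform matrix that PCA or PLS requires, with no alignment step.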