Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information
Open Access
- 3 March 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (10) , 2185-2190
- https://doi.org/10.1093/bioinformatics/bti365
Abstract
Motivation: There has been great expectation that the knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs) that lead to an amino acid change in the protein product are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs from a large number of neutral nsSNPs, it is important to develop computational tools to predict the phenotypic effects of nsSNPs.
Results: We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the 3D homologous structure as well as evolutionary information derived from the multiple sequence alignment were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (with not
Availability: The codes and curated dataset are available at http://compbio.utmem.edu/snp/dataset/
Contact: ycui2@utmem.edu
Supplementary information: The curated dataset is available at http://compbio.utmem.edu/snp/dataset/
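The abstract does not spell out which evolutionary features were derived from the multiple sequence alignment. As a rough illustration only, the sketch below computes three commonly used per-position quantities for a single variant: the Shannon entropy of the alignment column and the observed frequencies of the wild-type and mutant residues. The function names, the toy alignment, and the choice of features are assumptions made for this example, not the authors' actual predictors.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # assumed gap symbols; alignment formats differ


def column_residues(alignment, position):
    """Return the non-gap residues found at one alignment column."""
    return [seq[position].upper() for seq in alignment
            if seq[position] not in GAP_CHARS]


def column_entropy(residues):
    """Shannon entropy (in bits) of the residue distribution in a column."""
    total = len(residues)
    if total == 0:
        return 0.0
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def variant_features(alignment, position, wild_type, mutant):
    """Hypothetical evolutionary features for one amino acid substitution."""
    residues = column_residues(alignment, position)
    total = len(residues)
    counts = Counter(residues)
    return {
        "entropy": column_entropy(residues),
        "wild_type_freq": counts[wild_type] / total if total else 0.0,
        "mutant_freq": counts[mutant] / total if total else 0.0,
        "n_residues": total,
    }


# Toy alignment of four equal-length homologous sequences (made up).
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MKV-A"]
features = variant_features(toy_alignment, position=2, wild_type="V", mutant="I")
```

A low entropy with a low mutant frequency suggests a conserved position where the substitution is rarely tolerated. Feature dictionaries of this kind could then be combined with structural descriptors and passed to a classifier such as a support vector machine or random forest.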