Proteome-Wide Prediction of Novel DNA/RNA-Binding Proteins Using Amino Acid Composition and Periodicity in the Hyperthermophilic Archaeon Pyrococcus furiosus
Open Access
- 15 June 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in DNA Research
- Vol. 14 (3) , 91-102
- https://doi.org/10.1093/dnares/dsm011
Abstract
Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches is therefore becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we applied the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, with amino acid composition and periodicities as feature vectors. We defined the resulting values as the composition (CO) score and the periodicity (PD) score. The P. furiosus proteins were classified into three classes (I–III) on the basis of a two-dimensional correlation analysis of the CO score and the PD score. Approximately 87% of the functionally known proteins categorized as class I (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. When the two-dimensional correlation analysis was applied to the 994 hypothetical proteins in P. furiosus, a total of 151 proteins were predicted to be novel DNA/RNA-binding protein candidates. The DNA/RNA-binding activities of randomly chosen hypothetical proteins were then verified experimentally. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activity, supporting the efficacy of our method.
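As a rough illustration of two ingredients named in the abstract, the Python sketch below computes an amino acid composition vector for a sequence and applies the stated class I criterion (CO score + PD score > 0.6). It is a minimal sketch and not the authors' implementation: the function names `composition` and `is_class_one` are hypothetical, the SVM training and the periodicity features are not reproduced, and the criteria for classes II and III are omitted because the abstract does not state them.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Hypothetical helper: the paper's exact feature encoding is not given
    in the abstract. Non-standard characters are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def is_class_one(co_score: float, pd_score: float, threshold: float = 0.6) -> bool:
    """Apply the class I criterion from the abstract: CO + PD > 0.6.

    The scores themselves are assumed to come from SVMs trained
    elsewhere; this function only applies the threshold.
    """
    return co_score + pd_score > threshold


if __name__ == "__main__":
    vec = composition("MKKAAL")
    print(len(vec))                # one entry per standard residue
    print(is_class_one(0.4, 0.3))  # sum exceeds the 0.6 threshold
```

In practice the composition vector would be one input to a trained classifier, and the CO and PD scores passed to `is_class_one` would be that classifier's outputs.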