In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome
Open Access
- 1 April 2003
- journal article
- research article
- Published by American Society for Microbiology in Journal of Virology
- Vol. 77 (7) , 4326-4344
- https://doi.org/10.1128/jvi.77.7.4326-4344.2003
Abstract
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
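The general idea of annotating a protein by looking up amino acid patterns can be pictured with a small sketch. This is not the Bio-Dictionary method itself, whose details are not given in the abstract. It assumes a toy dictionary of patterns made of literal residues and `.` wildcards, each tagged with a hypothetical label, and reports where each pattern occurs in a query sequence. The pattern strings, the labels, the sequence, and the function name `annotate` are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary (pattern -> label); not taken from the Bio-Dictionary.
# '.' marks a wildcard position that matches any single amino acid.
TOY_PATTERNS = {
    "N.S": "toy_motif_1",
    "L..G": "toy_motif_2",
}


def annotate(sequence, patterns):
    """Return {label: [0-based start positions]} for each pattern found in sequence."""
    hits = defaultdict(list)
    for pattern, label in patterns.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        regex = re.compile("(?=" + pattern + ")")
        for match in regex.finditer(sequence):
            hits[label].append(match.start())
    return dict(hits)


if __name__ == "__main__":
    # Invented query sequence, used only to exercise the function.
    query = "MKNASLLNGTW"
    for label, positions in annotate(query, TOY_PATTERNS).items():
        print(label, positions)
```

A real annotation pipeline would weigh how many patterns cover each region and what annotations those patterns carry in known proteins. The sketch shows only the lookup step.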