Evolutionary profiles from the QR factorization of multiple sequence alignments
- 15 March 2005
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 102 (11), pp. 4045-4050
- https://doi.org/10.1073/pnas.0409715102
Abstract
We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend: in database searches, these smaller, more evolutionarily balanced profiles perform comparably to, and in many cases better than, conventional profiles that contain hundreds of sequences and are built by an iterative, computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments based purely on the geometry of the protein structures provide better alignments than purely sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that had eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group, and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS.
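The idea of ordering sequences "by increasing linear dependence" can be illustrated with ordinary QR factorization with column pivoting (Businger-Golub style greedy pivoting): each pivot is the sequence least explained by those already chosen, and the remaining sequences are orthogonalized against it. The sketch below is only an illustration under stated assumptions, not the paper's method. The one-hot encoding (gap = zero vector), the flattening of each sequence into a single column (the paper uses a multidimensional, block-wise QR instead), the cutoff `tol`, and the names `encode` and `pivoted_qr_order` are all hypothetical choices made for this example.

```python
import math

# Hypothetical encoding: one-hot over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq):
    """One-hot encode an aligned sequence; gap characters become zero blocks."""
    vec = []
    for ch in seq.upper():
        block = [0.0] * len(AMINO_ACIDS)
        if ch in AMINO_ACIDS:
            block[AMINO_ACIDS.index(ch)] = 1.0
        vec.extend(block)
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pivoted_qr_order(alignment, tol=1e-3):
    """Return sequence indices in pivot order (most independent first).

    Stops at the first pivot whose residual norm is at most
    tol * (largest initial column norm); sequences not returned are
    treated as (nearly) linearly dependent on the chosen basis.
    Assumption: ordinary column-pivoted QR on flattened sequences,
    not the block/multidimensional QR used in the paper.
    """
    if not alignment:
        return []
    cols = [encode(s) for s in alignment]
    remaining = list(range(len(cols)))
    order = []
    norm0 = max(math.sqrt(dot(c, c)) for c in cols)
    while remaining:
        # Pivot: the column with the largest residual norm.
        j = max(remaining, key=lambda k: dot(cols[k], cols[k]))
        r = math.sqrt(dot(cols[j], cols[j]))
        if r <= tol * norm0:
            break  # everything left is (nearly) in the span of the basis
        q = [x / r for x in cols[j]]
        order.append(j)
        remaining.remove(j)
        # Modified Gram-Schmidt: remove the new basis direction from the rest.
        for k in remaining:
            proj = dot(q, cols[k])
            cols[k] = [x - proj * y for x, y in zip(cols[k], q)]
    return order
```

For the toy alignment `["ACDE", "ACDE", "ACDF", "WWWW"]`, the sketch returns `[0, 3, 2]`: the exact duplicate (index 1) is dropped as redundant, and the near-duplicate `ACDF` is chosen last because most of it is already explained by `ACDE`.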