Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification
- 1 December 2001
- journal article
- review article
- Published by Springer Nature in Applied Microbiology and Biotechnology
- Vol. 57 (5-6) , 579-592
- https://doi.org/10.1007/s00253-001-0844-0
Abstract
The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.