What is Bioinformatics? A Proposed Definition and Overview of the Field
- 1 January 2001
- journal article
- review article
- Published by Georg Thieme Verlag KG in Methods of Information in Medicine
- Vol. 40 (04), 346-358
- https://doi.org/10.1055/s-0038-1634431
Abstract
Summary:
Background: The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science.
Objectives: Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems.
Methods: Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large scale.
Results and Conclusions: Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g., expression data). Additional information includes the text of scientific papers and “relationship data” from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques, including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches that integrate a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.