Mining the bibliome: searching for a needle in a haystack?
Open Access
- 1 March 2002
- journal article
- research article
- Published by Springer Nature in EMBO Reports
- Vol. 3 (3), 200-203
- https://doi.org/10.1093/embo-reports/kvf059
Abstract
Writing in 1985 in a committee report for the US National Academy of Sciences, Harold J. Morowitz (George Mason University, VA) argued that biological research had reached a point where ‘new generalizations and higher order biological laws are being approached, but may be obscured by the simple mass of data’ (Morowitz, 1985). Now, 16 years later, his warning has proven to be no exaggeration. In 1985, the total number of sequence entries in the EBI nucleotide database was around 5000. In 2001, the number of entries added to the database per day was around five times this number. And the increasingly wide application of data‐intensive technologies, such as DNA and protein chips, high‐throughput protein three‐dimensional structure determination and real‐time molecular and cellular imaging, has confirmed fears, rational or otherwise, that biologists are likely to be swamped by a digital tsunami of data. But amongst the many prophecies of doom, relatively little attention has been paid to the consequences of the growing amount of scientific literature. One reason for this neglect may be that this increase has been less dramatic than that of sequence and other databases. It is, nevertheless, still impressive, as evidenced by the latest release notes for the US National Library of Medicine's Medline bibliographic database (www.nlm.nih.gov/databases/databases/medline.html), which stores metadata for more than 11 million articles from some 4300 refereed journals. Another reason may be that electronic access, both to metadata and to full text, has made it considerably easier to search for and use scientific literature. Entrez‐PubMed, for instance, NCBI's simple web‐based search system, allows scientists to search Medline for bibliographic information, find related publications and, depending on the journal and date of original publication, retrieve the full text of the article, all without leaving their desk. To those of us who started our research …
This publication has 13 references indexed in Scilit:
- Creating the Gene Ontology Resource: Design and Implementation. Genome Research, 2001
- In silico veritas. EMBO Reports, 2001
- A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 2001
- Automated extraction of information on protein–protein interactions from the biological literature. Bioinformatics, 2001
- Gene Ontology: tool for the unification of biology. Nature Genetics, 2000
- Facilitating networks of information. 2000
- Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction. 1998
- Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts. Science, 1994
- Global Text Matching for Information Retrieval. Science, 1991
- Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990