Issues in searching molecular sequence databases
- 1 February 1994
- journal article
- review article
- Published by Springer Nature in Nature Genetics
- Vol. 6 (2), 119-129
- https://doi.org/10.1038/ng0294-119
Abstract
Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention has been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services.