Kangaroo – A pattern-matching program for biological sequences
Open Access
- 31 July 2002
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 3 (1), 20
- https://doi.org/10.1186/1471-2105-3-20
Abstract
Biologists often want to run a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases, however, do not provide straightforward query tools for simple searches such as locating transcription factor binding sites, protein motifs, or repetitive DNA sequences, even though in many cases such pattern-matching searches can reveal a wealth of information.

In this paper we present a regular expression pattern-matching tool that we used to identify short repetitive DNA sequences in human coding regions as potential mutation sites in mismatch repair-deficient cells. Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding-region sequences from ten different organisms. It places no restriction on the length or complexity of the query expression, so it supports a wide range of queries. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/, and the source code is freely distributed at http://sourceforge.net/projects/slritools/.

A simple, low-level pattern-matching application can be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a variant of human colorectal cancer characterized by a high frequency of mutations in coding regions that contain mononucleotide repeats.
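The abstract describes regular expression searches over DNA and protein sequences, such as finding mononucleotide repeats in coding regions. The following is a minimal sketch of that kind of query using Python's standard `re` module; it is not Kangaroo's implementation or interface. The function names (`find_pattern`, `mononucleotide_repeats`), the default minimum repeat length of 8, and the sample sequences are hypothetical choices for illustration. The protein query uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} rewritten as a regular expression.

```python
import re
from typing import Iterator, NamedTuple


class Match(NamedTuple):
    """A single hit: 0-based start, exclusive end, and the matched residues."""
    start: int
    end: int
    text: str


# Hypothetical helper names for illustration; not Kangaroo's API.
def find_pattern(sequence: str, pattern: str) -> Iterator[Match]:
    """Yield every non-overlapping match of a regular expression in a sequence.

    The sequence is upper-cased first, so lowercase input matches patterns
    written with uppercase residue letters.
    """
    for m in re.finditer(pattern, sequence.upper()):
        yield Match(m.start(), m.end(), m.group())


def mononucleotide_repeats(sequence: str, min_length: int = 8) -> list[Match]:
    """Return runs of one nucleotide that are at least min_length bases long.

    ([ACGT]) captures a single base and the backreference \\1{n,} requires
    that same base to follow at least n more times (here n = min_length - 1).
    The default of 8 is an illustrative threshold, not a value from Kangaroo.
    """
    pattern = rf"([ACGT])\1{{{min_length - 1},}}"
    return list(find_pattern(sequence, pattern))


if __name__ == "__main__":
    # Hypothetical coding sequence with an A8 run, a C5 run, and a T10 run.
    cds = "ATGGC" + "A" * 8 + "GTCCCCCG" + "T" * 10 + "AG"
    for hit in mononucleotide_repeats(cds):
        print(hit.start, hit.end, hit.text)
    # prints "5 13 AAAAAAAA" then "21 31 TTTTTTTTTT"

    # PROSITE N-glycosylation pattern N-{P}-[ST]-{P} as a regular expression.
    for hit in find_pattern("MKNGSAVNPTLNVTE", r"N[^P][ST][^P]"):
        print(hit.start, hit.end, hit.text)
    # prints "2 6 NGSA" then "11 15 NVTE"
```

Because `re.finditer` reports only non-overlapping matches, overlapping motif occurrences would need a zero-width lookahead such as `(?=(N[^P][ST][^P]))` read through its capture group.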