The application of Markov chain analysis to oligonucleotide frequency prediction and physical mapping of Drosophila melanogaster
- 25 July 1992
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 20 (14), 3651-3657
- https://doi.org/10.1093/nar/20.14.3651
Abstract
Here we compare several methods for predicting oligonucleotide frequencies in 691 kb of Drosophila melanogaster DNA. As in previous work on Escherichia coli and Saccharomyces cerevisiae, a relatively simple equation based on tetranucleotide frequencies can be used to predict the frequencies of higher order oligonucleotides. For example, the mean of observed/expected abundances of 4,096 hexamers was 1.07 with a sample standard deviation of 0.55. This simple predictor arises by considering each base on the sense strand of D. melanogaster to depend only on the three bases 5′ to it (a 3rd order Markov chain) and is more accurate than the random predictor. This equation is useful in predicting restriction enzyme fragment sizes, selecting restriction enzymes that cut preferentially in coding vs noncoding regions, and in selecting probes to fingerprint clones in contig mapping. Once again, this equation predicts the occurrence of higher order oligonucleotides well, supporting our hypothesis that this predictor holds in evolutionarily diverse organisms. When ranked from highest to lowest abundance, the observed frequencies of oligomers of a given length are closely tracked by the predicted abundances of a 3rd order Markov chain. Through use of the dependence of oligomer frequencies on base composition, we report a list of oligomers that will be useful for the completion of a cosmid physical map of D. melanogaster. Presently, the library is such that it will be possible to construct large contigs using only 30 oligonucleotide probes to fingerprint cosmids.
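The abstract does not reproduce the predictor equation itself. As an illustration only, the Python sketch below uses the standard 3rd order Markov estimate, which may differ in detail from the paper's formula: the expected count of an oligomer is the product of the counts of its overlapping tetranucleotides divided by the product of the counts of its internal trinucleotides. The function names `count_kmers` and `expected_count_markov3` are hypothetical, and counting is done on a single strand with overlapping windows, which is an assumption.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers on one strand of seq (assumed counting scheme)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count_markov3(oligo, tetra, tri):
    """Expected count of oligo under a 3rd order Markov chain.

    tetra and tri are Counters of tetranucleotide and trinucleotide counts
    taken from the same sequence. For a hexamer w1..w6 this evaluates
    N(w1w2w3w4) * N(w2w3w4w5) * N(w3w4w5w6) / (N(w2w3w4) * N(w3w4w5)).
    """
    if len(oligo) < 4:
        raise ValueError("oligo must be at least 4 bases long")
    numerator = 1.0
    for i in range(len(oligo) - 3):
        numerator *= tetra[oligo[i:i + 4]]  # every overlapping 4-mer
    denominator = 1.0
    for i in range(1, len(oligo) - 3):
        denominator *= tri[oligo[i:i + 3]]  # internal 3-mer overlaps only
    # A zero denominator means an internal trinucleotide never occurs,
    # so the oligomer cannot occur either.
    return numerator / denominator if denominator else 0.0


# Toy usage on a made-up sequence (not data from the paper).
toy_seq = "ACGT" * 50
tetra_counts = count_kmers(toy_seq, 4)
tri_counts = count_kmers(toy_seq, 3)
observed = count_kmers(toy_seq, 6)["ACGTAC"]
expected = expected_count_markov3("ACGTAC", tetra_counts, tri_counts)
ratio = observed / expected if expected else float("nan")
```

An observed/expected ratio close to 1 corresponds to the good agreement the abstract reports for hexamers. Dividing the sequence length by the expected count of a recognition site gives a rough mean restriction fragment size under the same assumptions.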