Significance of similarities in patterns: an application to beta interferon-related DNA on human chromosome 2.
- 1 June 1985
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 82 (12), 4090-4094
- https://doi.org/10.1073/pnas.82.12.4090
Abstract
The nucleotide sequence of a 14-kilobase (kb) region of the human beta interferon (IFN-beta)-related DNA locus on chromosome 2 (genomic DNA clone lambda B3) was determined and compared to that of the IFN-beta 1 gene by using the Sellers TT algorithm. This algorithm aligns segments of one sequence with similar segments in a second sequence. A strategy was developed for assessing the significance of similarities between DNA sequences based on a scheme that recognizes patterns or runs of identities within an alignment. The pattern score (II) thus obtained is an entropy-like measure. Numerically it is a reflection of the length of the second longest run of identity in an alignment plus a correction factor due to the other shorter identity runs in the alignment. When the IFN-beta 1 gene is compared to a random nucleotide sequence, the distribution of II scores in such comparisons fits a Gaussian function. This strategy has been used to identify seven segments along one strand of lambda B3 DNA that are related to segments in IFN-beta 1; these seven alignments have II scores greater than or equal to 3 standard deviations above the mean score obtained in comparisons between IFN-beta 1 and random nucleotide sequences. One of these alignments (section 7) has a II score 8.02 standard deviations above this mean score. The likelihood of finding an alignment statement as good as that in section 7 in a random sequence the length of the human genome is approximately 10^-7. Furthermore, the lambda B3 DNA sequence in section 7 selects the human IFN-beta 1 gene as the most significant alignment in computer searches of mammalian nucleotide sequence data bases.
This publication has 20 references indexed in Scilit:
- Coagulation factors V and VIII and ceruloplasmin constitute a family of structurally related proteins. Proceedings of the National Academy of Sciences, 1984
- Interferon-β-Related DNA Is Dispersed in the Human Genome. Science, 1984
- On the statistical assessment of similarities in DNA sequences. Nucleic Acids Research, 1984
- Isolation of novel human genomic DNA clones related to human interferon-beta 1 cDNA. Proceedings of the National Academy of Sciences, 1983
- Optimal sequence alignments. Proceedings of the National Academy of Sciences, 1983
- A new pair of M13 vectors for selecting either DNA strand of double-digest restriction fragments. Gene, 1982
- The Alu Family of Dispersed Repetitive Sequences. Science, 1982
- Similar Amino Acid Sequences: Chance or Common Ancestry? Science, 1981
- Cloned human and mouse kappa immunoglobulin constant and J region genes conserve homology in functional segments. Cell, 1980
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970
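The significance test described in the abstract (score an alignment by its runs of identities, then express that score in standard deviations above the mean obtained against random sequences) can be illustrated with a minimal Python sketch. The abstract does not give the formula for the pattern score, so `toy_score` below is a hypothetical stand-in (the longest identity run) and is not the paper's measure. Comparing ungapped sequences of equal length, rather than running the Sellers TT algorithm, is likewise a simplifying assumption, as are all function names.

```python
import math
import random
import statistics


def identity_runs(a, b):
    """Return the lengths of maximal runs of identical, non-gap positions
    in two aligned strings of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    runs, current = [], 0
    for x, y in zip(a, b):
        if x == y and x != "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def toy_score(a, b):
    """Hypothetical stand-in score: length of the longest identity run.
    This is NOT the pattern score defined in the paper."""
    return max(identity_runs(a, b), default=0)


def random_background(reference, n_trials, seed=0):
    """Score the reference against uniformly random nucleotide sequences
    (an assumed null model; no alignment step is performed)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        rand = "".join(rng.choice("ACGT") for _ in reference)
        scores.append(toy_score(reference, rand))
    return scores


def z_score(observed, background):
    """Number of sample standard deviations by which the observed score
    exceeds the mean of the background scores."""
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (observed - statistics.mean(background)) / sd


def gaussian_upper_tail(z):
    """Upper-tail probability of a standard normal at z, for a single
    comparison under the Gaussian fit mentioned in the abstract."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A typical use would be to call `random_background` once for the reference gene, then pass each candidate alignment's score to `z_score` and keep those at or above 3. The single-comparison tail from `gaussian_upper_tail` is not the genome-wide likelihood quoted in the abstract, which depends on steps the abstract does not describe.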