Abstract
Protein-coding regions of a genome fragment can be predicted mathematically either by studying variations in the statistical properties of the sequence or by searching for the signals characteristic of the junctions between coding and non-coding regions. We propose here a new statistical method based on correspondence analysis. This method does not use any reference codon set but takes into account the homogeneity of codon usage along the genome fragment under study. The method has been compared with previously published methods, especially the ‘codon usage method’ of Staden, and two examples are presented here. Applications to the analysis of prokaryotic operons and eukaryotic split genes are also discussed. Use of the method has also revealed two structures not previously described: i) in the human prt gene, a strong triplet structure exists in a non-coding region; ii) in the human tp-a gene, codon usage is not uniform between the different exons.
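The idea of measuring codon usage homogeneity without a reference codon set can be illustrated with a minimal sketch. It is not the procedure of this paper: it only builds the window-by-codon contingency table and the chi-square distance of each window profile from the average profile, which is the metric that correspondence analysis works with. The function names `codon_table` and `chi2_distances`, the window size, and the use of non-overlapping windows are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 64 codons, in a fixed column order (illustrative choice).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_table(seq, window=30, frame=0):
    """Count codons in consecutive, non-overlapping windows.

    Rows are windows of `window` codons read in the given frame;
    columns follow CODONS. Triplets containing other symbols are ignored.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    rows = []
    for start in range(0, len(codons) - window + 1, window):
        counts = Counter(codons[start:start + window])
        rows.append([counts.get(c, 0) for c in CODONS])
    return rows


def chi2_distances(table):
    """Chi-square distance of each row profile from the average profile.

    A value near 0 means the window uses codons like the fragment as a
    whole; larger values flag windows with atypical codon usage.
    """
    if not table:
        return []
    total = sum(sum(row) for row in table)
    if total == 0:
        return [0.0 for _ in table]
    n_cols = len(table[0])
    col_mass = [sum(row[j] for row in table) / total for j in range(n_cols)]
    dists = []
    for row in table:
        n = sum(row)
        if n == 0:
            dists.append(0.0)
            continue
        d = sum(
            (row[j] / n - col_mass[j]) ** 2 / col_mass[j]
            for j in range(n_cols)
            if col_mass[j] > 0
        )
        dists.append(d)
    return dists


if __name__ == "__main__":
    # Toy fragment whose two halves use different codons.
    fragment = "ATG" * 10 + "GCC" * 10
    print(chi2_distances(codon_table(fragment, window=10)))
```

A full correspondence analysis would go further and decompose this table into factorial axes; the sketch stops at the distance step, which needs only the sequence itself and no external codon frequency table.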