Language Trees and Zipping
Abstract
In this letter we present a very general method to recognize and classify information encoded as sequences of characters. Based on data-compression techniques, its key point is the computation of the relative entropy between pairs of sequences, interpreted as a distance between them. We apply the method to linguistically motivated problems, obtaining highly accurate results for language recognition, author recognition and language classification.
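The abstract does not spell out how the relative entropy is obtained from a compressor. The sketch below is a minimal illustration, not the authors' exact procedure. It assumes Python's `zlib` (DEFLATE, the algorithm behind gzip) as the compressor and a simple length-difference estimator: a short probe is appended to a reference text, and the extra compressed bytes it costs measure how poorly the reference "predicts" it. The names `compressed_len`, `delta`, `relative_entropy_estimate`, `classify` and the `probe_len` parameter are hypothetical.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Size in bytes of `data` after zlib (DEFLATE) compression at level 9."""
    return len(zlib.compress(data, 9))


def delta(reference: bytes, probe: bytes) -> int:
    """Extra compressed bytes needed to code `probe` appended to `reference`.

    A small value means the compressor reused patterns from `reference`.
    Assumption: DEFLATE only looks back 32 KiB, so references much longer
    than that would need truncating for this to be meaningful.
    """
    return compressed_len(reference + probe) - compressed_len(reference)


def relative_entropy_estimate(a: bytes, b: bytes, probe_len: int = 200) -> float:
    """Hypothetical per-byte estimate of the relative entropy of b w.r.t. a.

    Cost of coding a probe taken from the end of `b` with a compressor primed
    on `a`, minus the cost when primed on the rest of `b`. Not symmetric,
    so it is a distance only in a loose sense.
    """
    probe, rest_of_b = b[-probe_len:], b[:-probe_len]
    return (delta(a, probe) - delta(rest_of_b, probe)) / len(probe)


def classify(unknown: bytes, corpora: dict) -> str:
    """Return the label whose reference corpus codes `unknown` most cheaply."""
    return min(corpora, key=lambda label: delta(corpora[label], unknown))


if __name__ == "__main__":
    # Toy reference corpora, made up for illustration only.
    english = b"the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 50
    italian = b"il gatto nero dorme sotto il tavolo mentre la volpe corre nel bosco " * 50
    unknown = b"the lazy dog and the quick brown fox sat on the mat"
    print(classify(unknown, {"en": english, "it": italian}))
```

Language recognition then reduces to picking the reference corpus with the smallest estimated distance; author recognition works the same way with one corpus per author. Toy corpora like these only show the mechanics, and real experiments would use much larger texts.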