Information in Natural Languages

Abstract
Existing units of "information" do not correspond closely to what we normally mean by the word. We propose a new measure, the "kernel", which appears to possess five properties that are desirable in any proposed unit of information. We have developed an automated method for extracting kernels of information from the text of surgical operative reports, and the information can be retrieved by asking questions in ordinary English. The first step is a dictionary look-up of the form class (similar to "part of speech") and the seme number (identifying synonymous words) of each word in the operative report. A syntactic analysis and a transformational analysis are then performed to extract the kernels of information. To retrieve information, questions are analyzed in the same way, and the kernels thus obtained are matched against the kernels that have previously been catalogued. This retrieves the specific information that answers the question.
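
The look-up, extraction, and matching steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary, the seme numbers, the form-class labels ("N", "V", "WH"), the three-slot kernel shape, and every function name are assumptions made for illustration only. In particular, the syntactic and transformational analyses are replaced here by a crude stand-in that accepts only simple subject-verb-object word order.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    form_class: str  # hypothetical labels: "N" noun, "V" verb, "WH" question word
    seme: int        # words sharing a seme number are treated as synonyms


# Hypothetical toy dictionary; a real one would cover the report vocabulary.
DICTIONARY = {
    "surgeon": Entry("N", 1),
    "doctor": Entry("N", 1),
    "removed": Entry("V", 2),
    "excised": Entry("V", 2),
    "appendix": Entry("N", 3),
    "gallbladder": Entry("N", 4),
    "who": Entry("WH", 0),
    "what": Entry("WH", 0),
}

Kernel = tuple  # (subject, verb, object); None marks the slot being asked about


def look_up(text: str) -> list:
    """Dictionary look-up: return the entry of every known word, in order."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [DICTIONARY[w] for w in words if w in DICTIONARY]


def extract_kernel(text: str) -> Optional[Kernel]:
    """Crude stand-in for the syntactic and transformational analyses.

    Accepts only three known words in subject-verb-object order and
    returns their seme numbers, with None in place of a question word.
    """
    entries = look_up(text)
    if len(entries) != 3 or entries[1].form_class != "V":
        return None
    return tuple(None if e.form_class == "WH" else e.seme for e in entries)


def build_catalogue(sentences: list) -> list:
    """Catalogue each sentence together with the kernel extracted from it."""
    catalogue = []
    for sentence in sentences:
        kernel = extract_kernel(sentence)
        if kernel is not None:
            catalogue.append((sentence, kernel))
    return catalogue


def matches(query: Kernel, stored: Kernel) -> bool:
    """A query slot matches if it is the asked-about slot or the semes agree."""
    return all(q is None or q == s for q, s in zip(query, stored))


def retrieve(question: str, catalogue: list) -> list:
    """Analyze the question like a report sentence, then match kernels."""
    query = extract_kernel(question)
    if query is None:
        return []
    return [sentence for sentence, kernel in catalogue if matches(query, kernel)]


catalogue = build_catalogue([
    "The surgeon removed the appendix.",
    "The doctor excised the gallbladder.",
])
print(retrieve("Who excised the appendix?", catalogue))
```

Because matching is done on seme numbers rather than on the words themselves, the question containing "excised" retrieves the sentence containing "removed" in this sketch.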