CIRC II Data Base Classification.
- 1 June 1977
- report
- Published by Defense Technical Information Center (DTIC)
Abstract
This report describes the development of a classification system for the CIRC II Data Base. Ninety-eight CIRC II classes were designed to partition the documents of this data base. The software that assigns these classes to incoming documents uses a sequential classification algorithm. In this approach, only as much of each document is read as is needed to assign one or more classes accurately, together with a confidence probability for each assigned class. In this way, a compromise is obtained between efficiency and accuracy, and a number of parameters are available in the software to adjust this trade-off. Additional software has been developed to analyze sample documents in order to define the CIRC II classes, producing keywords and their frequency distributions over the classes. This software makes the classification system flexible: a class can be added or deleted, a class can be modified by submitting additional documents, and the keyword selection criterion can be altered. A number of experiments were conducted using this classification system on CIRC II documents. They showed that satisfactory classification could be achieved and that a stable set of keywords and frequency distributions could be obtained. (Author)
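The abstract does not give the algorithm itself, so the following is only a minimal sketch of what a sequential classifier of this general kind might look like. It assumes a naive-Bayes-style posterior update over keyword frequencies with a uniform prior. The function name `classify_sequentially`, the parameters `threshold`, `report_min`, and `floor`, and the toy classes and frequencies are all hypothetical and are not taken from the report.

```python
import math


def classify_sequentially(words, keyword_freqs, threshold=0.9,
                          report_min=0.2, floor=1e-6):
    """Read words one at a time and stop once one class is confident enough.

    keyword_freqs maps class name -> {keyword: relative frequency in class}.
    All parameters are illustrative assumptions, not the report's own.
    Returns (assigned, words_read), where assigned is a list of
    (class, confidence) pairs sorted by decreasing confidence.
    """
    classes = list(keyword_freqs)
    log_post = {c: 0.0 for c in classes}  # uniform prior (assumption)
    posterior = {c: 1.0 / len(classes) for c in classes}
    words_read = 0

    for word in words:
        words_read += 1
        # Words that are not keywords of any class carry no evidence.
        if not any(word in keyword_freqs[c] for c in classes):
            continue
        # Accumulate log-likelihoods; unseen keywords get a small floor.
        for c in classes:
            log_post[c] += math.log(keyword_freqs[c].get(word, floor))
        # Normalize in a numerically stable way to get posteriors.
        peak = max(log_post.values())
        total = sum(math.exp(v - peak) for v in log_post.values())
        posterior = {c: math.exp(log_post[c] - peak) / total for c in classes}
        # Stop reading as soon as the best class is confident enough.
        if max(posterior.values()) >= threshold:
            break

    # Report every class above report_min, or the single best class if none.
    assigned = sorted(
        ((c, p) for c, p in posterior.items() if p >= report_min),
        key=lambda cp: cp[1], reverse=True)
    if not assigned:
        best = max(posterior, key=posterior.get)
        assigned = [(best, posterior[best])]
    return assigned, words_read


# Toy data, invented for illustration only.
freqs = {
    "radar": {"antenna": 0.6, "signal": 0.3, "alloy": 0.1},
    "metallurgy": {"antenna": 0.05, "signal": 0.05, "alloy": 0.9},
}
doc = ["the", "antenna", "signal", "antenna", "alloy", "alloy"]
print(classify_sequentially(doc, freqs, threshold=0.9))
print(classify_sequentially(doc, freqs, threshold=0.99))
```

In this sketch, raising `threshold` makes the classifier read further into the document before committing, which is one way the efficiency and accuracy trade-off mentioned in the abstract could be exposed as a parameter.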