MARTT: Using induced knowledge base to automatically mark up plant taxonomic descriptions with XML
Open Access
- 1 January 2005
- journal article
- metadata and ontologies
- Published by Wiley in Proceedings of the American Society for Information Science and Technology
- Vol. 42 (1)
- https://doi.org/10.1002/meet.1450420170
Abstract
Despite the sub‐language nature of taxonomic descriptions of plants, researchers have warned about the large variations among different collections of descriptions in terms of information content and presentation. These variations pose a serious challenge to the development of automatic tools for the semantic markup of large volumes of free‐text descriptions. This paper presents a new approach to the automatic markup of multiple collections of taxonomic descriptions with XML. The effectiveness of the approach was demonstrated with markup experiments using three contemporary floras. The markup system, MARTT, is based on supervised machine learning algorithms and is enhanced by machine‐learned association rules that represent certain types of domain knowledge and conventions. Experiments showed that our simple and efficient markup algorithm outperformed popular general‐purpose algorithms (including SVMs) across different floras. More importantly, the results demonstrated that the domain knowledge learned from one flora was useful for improving the markup performance on a second flora, especially on elements with sparse training examples. The system design and the evaluation of the markup algorithms are reported in this paper. The study on the effectiveness of the induced knowledge base will be reported in a later paper. Common practices of flora authors, and the potential of the MARTT system for improving the efficiency and effectiveness of the creation, organization, and utilization of plant descriptions, are also discussed.