FLUX-CIM
- 18 June 2007
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 215-224
- https://doi.org/10.1145/1255175.1255219
Abstract
In this paper we propose a knowledge-base approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) for recognizing the components of a citation, our approach constructs the KB automatically from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding the specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of the proposed approach, we ran experiments in which we applied it to extract information from citations in papers of two different domains. The results indicate precision and recall levels above 94% and perfect extraction for the large majority of the citations tested.
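The abstract describes the approach only at a high level. As a rough illustration of the general idea, the toy sketch below builds a term-frequency KB from sample metadata records and uses it to label the pieces of a citation string. It is not the FLUX-CIM algorithm: the function names, the scoring rule, the splitting on generic punctuation, and the sample records are all assumptions made up for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample metadata records, invented purely for illustration.
SAMPLE_RECORDS = [
    {"author": "Silva", "title": "Wrapper induction",
     "venue": "Artificial Intelligence"},
    {"author": "Souza", "title": "Web data extraction tools",
     "venue": "SIGMOD Record"},
]


def build_kb(records):
    """Count how often each lowercase term occurs in each metadata field."""
    kb = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            kb[field].update(re.findall(r"\w+", value.lower()))
    return kb


def label_block(block, kb):
    """Return the field whose KB terms best match the block, or None."""
    terms = re.findall(r"\w+", block.lower())
    if not terms:
        return None
    scores = {}
    for field, counts in kb.items():
        total = sum(counts.values())
        # Assumed scoring rule: summed relative frequency of the block's
        # terms among all terms seen for this field.
        scores[field] = sum(counts[t] for t in terms) / total if total else 0.0
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def label_citation(citation, kb):
    """Split on generic punctuation (no style-specific pattern) and label
    each resulting block with the best-matching field."""
    blocks = [b.strip() for b in re.split(r"[.,;:()\"]+", citation) if b.strip()]
    return [(b, label_block(b, kb)) for b in blocks]
```

Because the labels come from the KB rather than from delimiter positions, calling `label_citation` on the same citation written in two different styles assigns the same fields in this toy setting. Blocks with no KB match, such as a year, are left unlabeled here; a real system would need further steps to handle them.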