AutoDict: Automated Dictionary Discovery
- 1 April 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 30 (ISSN 1063-6382), pp. 1277-1280
- https://doi.org/10.1109/icde.2012.126
Abstract
An attribute dictionary is a set of attributes together with a set of common values for each attribute. Such dictionaries are valuable for understanding unstructured or loosely structured textual descriptions of entity collections, such as product catalogs. Dictionaries provide the supervised data for learning product or entity descriptions. In this demonstration, we will present AutoDict, a system that analyzes input data records and discovers high-quality dictionaries using information-theoretic techniques. To the best of our knowledge, AutoDict is the first end-to-end system for building attribute dictionaries. Our demonstration will showcase the different information analysis and extraction features within AutoDict and highlight the process of generating high-quality attribute dictionaries.
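To make the definition concrete, the following is a minimal sketch of what an attribute dictionary might look like as a data structure and how it could be used to label tokens in a product description. The attribute names, values, and the `annotate` helper are hypothetical illustrations, not AutoDict's actual representation or algorithm; in particular, the sketch does not attempt the information-theoretic discovery step, which the abstract does not detail.

```python
from typing import Dict, List, Set, Tuple

# An attribute dictionary: each attribute maps to a set of its common values.
AttributeDictionary = Dict[str, Set[str]]

# Hypothetical example data (not taken from the paper).
camera_dictionary: AttributeDictionary = {
    "brand": {"canon", "nikon", "sony"},
    "color": {"black", "silver", "red"},
    "resolution": {"12mp", "16mp", "24mp"},
}


def annotate(description: str, dictionary: AttributeDictionary) -> List[Tuple[str, str]]:
    """Label each whitespace-separated token with the first attribute
    whose value set contains it; unmatched tokens are skipped."""
    labels: List[Tuple[str, str]] = []
    for token in description.lower().split():
        for attribute, values in dictionary.items():
            if token in values:
                labels.append((token, attribute))
                break
    return labels


print(annotate("Canon PowerShot 16MP digital camera black", camera_dictionary))
# → [('canon', 'brand'), ('16mp', 'resolution'), ('black', 'color')]
```

The exact-match lookup is a deliberate simplification: it shows why a dictionary of common values is useful for interpreting loosely structured text, while leaving aside multi-word values and ambiguity between attributes.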