Web-a-where
- 25 July 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 273-280
- https://doi.org/10.1145/1008992.1009040
Abstract
We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus: a locality that the page discusses as a whole. The tagging process is simple and fast, aimed to be applied to large collections of Web pages and to facilitate a variety of location-based applications and data analyses.
Geotagging involves arbitrating two types of ambiguities: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (Turkey). Geo/geo ambiguity arises when distinct places have the same name, as in London, England vs. London, Ontario.
An implementation of the tagger within the framework of the WebFountain data mining system is described, and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the foci reported are correct up to the country level.
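The two ambiguity types named in the abstract, and the idea of a page-level focus, can be illustrated with a minimal sketch. This is not the Web-a-Where tagger: the gazetteer, the list of non-geographic meanings, the function names, and the majority-vote focus rule are all hypothetical and chosen only to make the definitions concrete.

```python
from collections import Counter

# Hypothetical toy gazetteer: place name -> candidate (place, country) readings.
GAZETTEER = {
    "London": [("London, England", "United Kingdom"), ("London, Ontario", "Canada")],
    "Berlin": [("Berlin", "Germany")],
    "Turkey": [("Turkey", "Turkey")],
}

# Hypothetical list of names that also carry a non-geographic meaning.
NON_GEO_MEANINGS = {"Berlin": "person name", "Turkey": "common word"}


def ambiguity_types(name):
    """Return the set of ambiguity types that apply to a candidate place name."""
    kinds = set()
    if name in GAZETTEER and name in NON_GEO_MEANINGS:
        kinds.add("geo/non-geo")  # the name may not refer to a place at all
    if len(GAZETTEER.get(name, [])) > 1:
        kinds.add("geo/geo")  # several distinct places share the name
    return kinds


def country_focus(resolved_places):
    """Assumed simplification: pick the most frequent country among resolved geotags."""
    counts = Counter(country for _place, country in resolved_places)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for name in ("London", "Berlin", "Turkey"):
        print(name, sorted(ambiguity_types(name)))
    page_tags = [
        ("London, England", "United Kingdom"),
        ("Oxford", "United Kingdom"),
        ("London, Ontario", "Canada"),
    ]
    print(country_focus(page_tags))
```

In this sketch a name can fall under both ambiguity types at once, which is why `ambiguity_types` returns a set rather than a single label.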