The Importance of Prior Probabilities for Entry Page Search
- 11 August 2002
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
An important class of searches on the World Wide Web aims to find the entry page (homepage) of an organisation. Entry page search is quite different from ad hoc search, and a plain ad hoc system performs disappointingly on it. We explored three non-content features of web pages: page length, number of incoming links, and URL form. URL form in particular proved to be a good predictor. Using URL form priors, we found over 70% of all entry pages at rank 1 and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as a prior probability.
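The closing claim of the abstract can be read as a Bayesian ranking rule: a page D is scored for a query Q by log P(D) plus the sum of log P(q|D) over the query terms, so a non-content feature only needs to supply P(D). The sketch below illustrates that idea under several assumptions not stated in the abstract: a four-way URL classification (root, subroot, path, file), made-up prior values, and linear-interpolation smoothing with a weight `lam`. The function names are hypothetical and the code is not the authors' implementation.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical prior values per URL class; illustrative only, not from the paper.
URL_PRIOR = {"root": 0.6, "subroot": 0.25, "path": 0.1, "file": 0.05}


def url_class(url):
    """Assumed four-way classification of a URL by the shape of its path."""
    path = urlparse(url).path
    if path in ("", "/") or path.lower() in ("/index.html", "/index.htm"):
        return "root"
    if path.endswith("/"):
        depth = path.strip("/").count("/") + 1
        return "subroot" if depth == 1 else "path"
    return "file"


def score(query, doc_tokens, url, collection_tokens, lam=0.5):
    """Return log P(D) + sum of log P(q|D), with the URL class as prior P(D)."""
    tf = Counter(doc_tokens)
    cf = Counter(collection_tokens)
    log_p = math.log(URL_PRIOR[url_class(url)])
    for term in query:
        if cf[term] == 0:
            # Terms unseen in the whole collection are skipped in this sketch.
            continue
        p_doc = tf[term] / len(doc_tokens)
        p_col = cf[term] / len(collection_tokens)
        # Linear interpolation of document and collection models (assumed).
        log_p += math.log(lam * p_doc + (1 - lam) * p_col)
    return log_p


# Two pages with identical text: only the URL prior separates them.
pages = {
    "http://www.example.org/": ["example", "organisation", "home"],
    "http://www.example.org/docs/2002/report.html": ["example", "organisation", "home"],
}
collection = [tok for toks in pages.values() for tok in toks]
query = ["example", "organisation"]
ranking = sorted(
    pages,
    key=lambda u: score(query, pages[u], u, collection),
    reverse=True,
)
```

Because both example pages have the same content, their query likelihoods are equal and the ranking is decided entirely by the prior, which is the effect the abstract attributes to URL form. Priors for page length or number of incoming links could be swapped in by replacing the `URL_PRIOR` lookup.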