Micro-Mining and Segmented Log File Analysis: A Method for Enriching the Data Yield from Internet Log Files

Abstract
The authors propose improved ways of analysing web server log files. Traditionally web site statistics focus on giving a big (and shallow) picture analysis based on all transaction log entries. The pictures are, however, distorted because of the problems associated with resolving Internet protocol (IP) numbers to a single user and cross-border IP registration. The authors argue that analysing extracted sub-groups and categories presents a more accurate picture of the data and that the analysis of the online behaviour of selected individuals (rather than of very large groups) can add much to our understanding of how people use web sites and, indeed, any digital information source. The analysis is labelled `micro' to distinguish it from traditional macro, big picture transactional log analysis. The methods are illustrated with recourse to the logs of the Surgery Door (www.surgerydoor.co.uk) consumer health web site. It was found that use attributed to academic users gave a better approximation of the sites' geographical distribution of users than an analysis based on all users. This occurs as academic institutions, unlike other user types, register in their host country. Selecting log entries where each user is allocated a unique IP number can be particularly beneficial, especially to analyses of returnees. Finally the paper tracks the online behaviour of a small number of IP numbers, in an example of the application of microanalysis,