Jean-Pierre Norguet's Review
Jean-Pierre Norguet's review of Barbara Poblete's and Ricardo Baeza-Yates' WWW-2006 paper "A content and structure Website mining model"
Paper reviewed: Poblete B., Baeza-Yates R., A content and structure Website mining model,
Proceedings of the 15th International Conference on the World Wide Web
(Edinburgh, Scotland, May 23-26, 2006), 957-958, 2006. Review date: 4 Oct 2006.
Review published with ACM Computing Reviews [http://www.reviews.com].
Review
With the emergence of the World Wide Web, Web sites have become a key communication channel for organizations. In this context, analyzing and improving Web communication is essential to better satisfy the objectives of the target audience. In this paper, an original Web-usage-mining approach is proposed for improving Web site content and structure. In this approach, the pages of the analyzed Web site are grouped into clusters according to their similarity. Then, for each cluster, the hits of its pages are summed into a single hit count for the cluster. What makes the approach interesting is its assumption that these clusters represent topics of the Web site and, consequently, that the cluster hits represent the popularity of those topics. A prototype implementing this approach has been tested on real Web sites, and the authors report that the results were helpful, especially for large sites.
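The following minimal Python sketch makes the aggregation step concrete. The review does not say which clustering algorithm the paper uses. The greedy cosine-similarity grouping, the 0.3 threshold, and the names `cluster_pages` and `cluster_hits` are therefore hypothetical stand-ins. Only the idea of summing page hits into cluster hits comes from the approach as summarized above.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pages(pages: dict[str, str], threshold: float = 0.3) -> dict[str, int]:
    """Hypothetical stand-in for the paper's clustering: a single greedy pass in
    which a page joins the first cluster whose seed page is at least `threshold`
    similar to it, or else starts a new cluster."""
    vectors = {url: Counter(text.lower().split()) for url, text in pages.items()}
    seeds: list[Counter] = []
    assignment: dict[str, int] = {}
    for url, vec in vectors.items():
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                assignment[url] = cid
                break
        else:  # no sufficiently similar cluster: this page seeds a new one
            seeds.append(vec)
            assignment[url] = len(seeds) - 1
    return assignment


def cluster_hits(assignment: dict[str, int], page_hits: dict[str, int]) -> Counter:
    """Sum per-page hits into per-cluster hits, read as topic popularity."""
    totals: Counter = Counter()
    for url, hits in page_hits.items():
        if url in assignment:
            totals[assignment[url]] += hits
    return totals


# Hypothetical example site: two product pages and two job pages.
pages = {
    "/products/laptops": "laptop computer prices laptop models",
    "/products/desktops": "desktop computer prices desktop models",
    "/jobs/engineer": "job opening software engineer apply",
    "/jobs/sales": "job opening sales manager apply",
}
hits = {"/products/laptops": 120, "/products/desktops": 80, "/jobs/engineer": 30, "/jobs/sales": 10}
assignment = cluster_pages(pages)
print(cluster_hits(assignment, hits).most_common())  # → [(0, 200), (1, 40)]
```

In this toy example, the product pages fall into one cluster and the job pages into another. The ranking therefore reports "products" as far more popular than "jobs", even though no individual job page was inspected by hand.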
The paper is clearly written and scientifically accurate. The main contribution is the use of a clustering algorithm to automatically group the Web site pages according to their similarity. In addition, the use of the cluster hits to improve the Web site structure is relevant and extensively described. Finally, separating internal and external referrals provides additional information that can be used to improve the Web site. This makes the approach a significant research contribution, especially for a two-page proceedings poster.
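The separation of internal and external referrals can be sketched as follows. The sketch assumes that referrers come from an access log as (page, referrer) pairs. It also assumes that a referrer counts as internal when its hostname matches the site's hostname, ignoring a leading "www.". The function names and this matching rule are illustrative assumptions, not the paper's method.

```python
from collections import Counter
from urllib.parse import urlparse


def _strip_www(host: str) -> str:
    """Drop a leading 'www.' so that example.com and www.example.com match."""
    return host[4:] if host.startswith("www.") else host


def classify_referrer(referrer: str, site_host: str) -> str:
    """Label a referrer as 'direct', 'internal', or 'external'.
    Assumed rule: an empty or '-' referrer means direct access; otherwise the
    referrer's hostname is compared with the site's hostname."""
    if not referrer or referrer == "-":
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    return "internal" if _strip_www(host) == _strip_www(site_host.lower()) else "external"


def referral_counts(log: list[tuple[str, str]], site_host: str) -> dict[str, Counter]:
    """Count referral types per requested page; `log` holds (page, referrer) pairs."""
    counts: dict[str, Counter] = {}
    for page, referrer in log:
        counts.setdefault(page, Counter())[classify_referrer(referrer, site_host)] += 1
    return counts


# Hypothetical log entries for a site hosted at example.com.
log = [
    ("/products/laptops", "https://www.example.com/index.html"),
    ("/products/laptops", "https://search.example.org/?q=laptops"),
    ("/jobs/sales", "-"),
]
print(referral_counts(log, "example.com"))
```

A page that receives mostly external referrals is reached through search engines or other sites rather than through the site's own navigation. This is the kind of signal the paper uses to suggest structural changes.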
Nevertheless, the paper overlooks a number of issues, although existing techniques could address each of them. For instance, in which cluster should a page that covers several topics be categorized? Page polysemy could be handled using probabilistic clustering. In which cluster should a dynamically changing page, such as a news page, be categorized? Page temporality could be handled by content journaling. How should dynamically generated pages be clustered? Page volatility could be handled by output page mining.
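The sketch below shows how the hits of a polysemous page could be assigned softly. It assumes that topic membership probabilities are already available, for instance from a probabilistic clustering model (not shown). It then splits each page's hits across topics in proportion to those probabilities. The topic names, pages, and probabilities are hypothetical.

```python
from collections import defaultdict


def soft_cluster_hits(memberships: dict[str, dict[str, float]],
                      page_hits: dict[str, int]) -> dict[str, float]:
    """Expected hits per topic: each page's hits are split across topics in
    proportion to its membership probabilities (assumed to sum to 1 per page)."""
    totals: dict[str, float] = defaultdict(float)
    for page, hits in page_hits.items():
        for topic, p in memberships.get(page, {}).items():
            totals[topic] += p * hits
    return dict(totals)


# Hypothetical memberships; the news page covers two topics.
memberships = {
    "/products/laptops": {"products": 1.0},
    "/news/laptop-launch-and-hiring": {"products": 0.6, "jobs": 0.4},
    "/jobs/sales": {"jobs": 1.0},
}
hits = {"/products/laptops": 120, "/news/laptop-launch-and-hiring": 50, "/jobs/sales": 10}
print(soft_cluster_hits(memberships, hits))
```

Each page's probabilities sum to 1, so the total number of hits is preserved; only its distribution across topics changes. A hard clustering, by contrast, would credit all 50 hits of the news page to a single topic.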
The results shown are sparse and do not seem very intuitive. In addition, no formal or objective evaluation is provided. It is therefore difficult to determine what the results are and what their added value is. The authors assume that the clusters produced correspond to particular Web site topics. However, it is unclear whether these topics are meaningful to a human user. Sample topics should have been shown, or a formal evaluation should have been conducted on this issue. Ontologies or taxonomies representing the knowledge domain of the Web site could have been used to make the results more intuitive.
The target audience of this paper includes Web site administrators and researchers. Web site administrators could be interested in implementing the proposed approach to improve the structure and content of their Web site based on how it is actually used. However, it should be stressed that only static Web pages are supported. Web sites that contain composite, evolving, or scripted Web pages require the approach to be extended with other techniques. Researchers in Web usage mining, Web structure mining, and Web content mining will also be interested in the research contributions of this paper. Finally, extending these contributions with the techniques cited above would be a worthwhile direction for future work.