Monday, April 28, 2008

Search trails and relevance

Misha Bilenko and Ryen White from Microsoft Research had a paper at WWW 2008, "Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites From User Activity" (PDF), that is a fascinating look at improving search results by going beyond the first click on a search result and considering all the pages people visit afterward.

An excerpt from the paper:
While query and clickthrough logs from search engines have been shown to be a valuable source of implicit supervision for training retrieval methods, the vast majority of users' browsing behavior takes place beyond search engine interactions.

This paper proposes exploiting a combination of searching and browsing activity of many users to identify relevant resources for future queries. To the best of our knowledge, previous approaches have not considered mining the history of user activity beyond search results, and our experimental results show that comprehensive logs of post-search behavior are an informative source of implicit feedback for inferring resource relevance.

Web browser toolbars have become increasingly popular ... Examples of popular toolbars include those affiliated with search engines (e.g., Google Toolbar, Yahoo! Toolbar, and Windows Live Toolbar) ... Most popular toolbars log the history of users' browsing behavior on a central server for users who consented to such logging. Each log entry includes an anonymous session identifier, a timestamp, and the URL of the visited Web page. From these and similar interaction logs, user trails can be reconstructed.

Training retrieval algorithms on interaction behavior from navigation trails following search engine result click-through leads to improved retrieval accuracy over training on only result click-through or search destinations ... Our research has profound implications for the design of Web search ranking algorithms and the improvement of the search experience for all search engine users.
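
To make the idea of reconstructing trails from toolbar logs concrete, here is a minimal sketch, not the authors' method. It takes log entries of the form the excerpt describes (anonymous session identifier, timestamp, URL) and groups the pages visited after each query into a trail. The search host `search.example.com`, the `q` parameter, and the 30-minute inactivity cutoff are all assumptions made up for illustration, as are the function names.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Assumptions for illustration only (not from the paper): a search results
# page is any URL on SEARCH_HOST carrying a "q" parameter, and a trail ends
# at the next query or after TIMEOUT_SECONDS of inactivity.
SEARCH_HOST = "search.example.com"
TIMEOUT_SECONDS = 30 * 60


def query_of(url):
    """Return the query text if url looks like a search results page, else None."""
    parts = urlparse(url)
    if parts.netloc != SEARCH_HOST:
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None


def reconstruct_trails(entries):
    """Group (session_id, timestamp, url) entries into (query, [urls]) trails."""
    by_session = defaultdict(list)
    for session_id, timestamp, url in entries:
        by_session[session_id].append((timestamp, url))

    trails = []
    for visits in by_session.values():
        visits.sort()        # order each session's visits by timestamp
        current = None       # the open trail as (query, urls), if any
        last_time = None
        for timestamp, url in visits:
            # Close the open trail if the user was idle for too long.
            if current is not None and timestamp - last_time > TIMEOUT_SECONDS:
                trails.append(current)
                current = None
            query = query_of(url)
            if query is not None:
                # A new query closes the previous trail and opens a new one.
                if current is not None:
                    trails.append(current)
                current = (query, [])
            elif current is not None:
                # A non-search page visited while a trail is open joins it.
                current[1].append(url)
            last_time = timestamp
        if current is not None:
            trails.append(current)
    return trails
```

Calling `reconstruct_trails` on a list of `(session_id, timestamp_in_seconds, url)` tuples returns one `(query, urls)` pair per trail. Counting how often a site appears across the trails for a given query would be one crude way to turn trails into a relevance signal; the paper's actual training approach is more involved.
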

Please see also Googler Daniel Russell's JCDL 2007 talk, "What are they thinking? Searching for the mind of the searcher" (PDF), which shows, starting on slide 33, that Google is using its toolbar data to analyze user behavior.

Thanks, Ionut Alex Chitu, for the pointer to Daniel's JCDL talk.

1 comment:

Shirish said...

This post reminds me of an observation that I made in my own search pattern.

A significant percentage of my queries have a "context" in the current web page or document that I am reading online. When I am reading a news article, I search for something related to that news item. When I am watching a music video on YouTube, I search for the artist or that particular genre, etc. I wonder if any search engine leverages such "context".

My strong feeling is that, for any person, such "context"-driven queries make up a significant portion of searches, maybe more than 40-50%!