Tutorial - T6: IR Prototypes and Web Search Hacks with Open Source Tools

Rosie Jones (Yahoo!)
Vik Singh (Yahoo!)


Web search is a public-facing industry application of IR research. One of the best ways to gather data about web search behavior is to build your own search system. Prototype IR and web search systems can be used to gather user interaction data and test the applicability of research ideas. Open source tools and services can greatly speed up the implementation of these systems, allowing for quick evaluation. We will give detailed overviews of several open source tools, provide examples of search and IR algorithms and systems implemented with them, and discuss how evaluation can be carried out using these tools.


Rosie Jones

Rosie Jones is a Senior Research Scientist in Information Retrieval at Yahoo! Labs. She is an active participant in the IR community, serving as Senior PC member for SIGIR since 2007. Her research interests include information retrieval, web search and natural language processing. Dr. Jones co-taught the tutorial on web search at SIGIR 2008. In 2009 she co-organized the workshop on web search click data (WSCD) at WSDM. In 2005 she co-organized the SIGIR workshop on lexical cohesion and information retrieval, and in 2003 she co-organized the ICML workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining. Dr. Jones obtained her PhD from the Language Technologies Institute at Carnegie Mellon University and is a Senior Member of the ACM.

Vik Singh

Vik Singh, an applied researcher and architect at Yahoo!, created and designed the BOSS Open Search platform. Vik has developed and open sourced several supporting examples such as the BOSS Mashup Framework, BOSSy QnA search, and TweetNews (which Wired called "the best mashup we've ever seen"). He previously worked at Google, shipping Custom Search/Co-op; at Microsoft Research, data mining the SkyServer web and SQL logs under Dr. Jim Gray; and at Intel Research on distributed overlay systems. He has filed 9 patents in the areas of search, web services, and content optimization. Vik graduated with a bachelor's in computer science from UC Berkeley.