Following up on the highly successful SIGIR 2007 Industry Event and a similar event at CIKM 2008, this year’s SIGIR conference includes an Industry Track, to be held on Wednesday, July 22, 2009 during the regular conference program (in parallel with the technical tracks). This new track aims to bridge the gap between research and practice across a broad spectrum of topics in information retrieval. The day’s sessions will bring together some of the brightest stars from industry who are addressing practical information retrieval problems. The Industry Track aims to address two constituencies: it gives researchers an opportunity to learn about the problems most relevant to industry practitioners, and offers practitioners an opportunity to deepen their understanding of the field in which they are working.
The agenda consists of the following speakers and panelists invited by the Industry Track program committee and the SIGIR 2009 Organizing Committee.
These presentations will occur in parallel with the technical program of the day and attendees may move between the technical tracks and the industry track at will. That is, anyone registered for the full conference will be able to attend Industry Track presentations. Also, anyone opting for the special Wednesday-only registration will be invited to attend any technical presentations on that day. (Registration details are not yet available.)
Matt Cutts, Google (8:30-9:15)
- "Web Spam and Adversarial IR: The Road Ahead"
- In this talk, I'll discuss both the mindset and techniques of spammers. We'll cover several real-world examples of websites under attack by spammers and what the future holds for black-hats and white-hats alike.
- Matt Cutts works for the Search Quality group in Google, specializing in search engine optimization issues. He is well known in the SEO community for enforcing the Google Webmaster Guidelines and cracking down on link spam. He also advises the public on how to get better website visibility in Google. Before joining the Search Quality group, Cutts worked in the ads engineering group and on Google's SafeSearch. He is one of the co-inventors listed on a Google patent related to search engines and web spam, which was the first to publicly propose using historical data to identify link spam.
danah boyd, Microsoft Research (9:15-10:00)
- "The Searchable Nature of Acts in Networked Publics"
- Mediated public spaces that connect people through networked technologies - "networked publics" - have properties that are quite different from their physical equivalent. The content produced in networked publics is typically persistent, replicable, searchable, and accessible to a much different scale of audience. Each of these properties alters the dynamics of everyday interactions that take place in these environments. In this talk, I will explore what it means that sociality is fundamentally searchable and how people navigate these spaces with this property in mind.
- danah boyd is a Researcher at Microsoft Research New England and a Fellow at Harvard's Berkman Center for Internet and Society. She recently completed her PhD at the School of Information (iSchool) at the University of California-Berkeley. Her research examines social media, youth practices, tensions between public and private, social network sites, and other intersections between technology and society.
Vanja Josifovski, Yahoo! Research (10:30)
- "Ad Retrieval – A New Frontier of Information Retrieval"
- The field of Information Retrieval has produced some of the most impactful Computer Science research: from the early beginning of exploring how to search articles and books in libraries, to the current focus on searching the World Wide Web. In this talk, I will make the case for Ad Retrieval as a new distinct sub-discipline of Information Retrieval focusing on retrieving online advertisements. Online advertising affects virtually every web user and has grown into a $20 billion industry. As with the Web corpus, the structure of online ads is substantially different from that of any previously studied corpus. The queries used in selecting online ads can also differ substantially from the commonly explored short textual queries, for example when selecting advertisements for a given web page or for a user's context. These differences require reexamination of many conclusions of traditional IR, such as those on document analysis, query expansion, scoring and length normalization, and performance evaluation. The talk will outline some of the main challenges of Ad Retrieval and discuss how to engage the SIGIR community in the exploration of this new frontier of Information Retrieval.
- Vanja Josifovski is Principal Research Scientist and the Lead of the Textual Advertising Group at Yahoo! Research. He joined Yahoo! Research in late 2005 and has since spent most of his time designing and building Yahoo!'s next generation advertising platforms. Prior to joining Yahoo!, Vanja was a Research Staff Member at the IBM Almaden Research Center working in the areas of database and enterprise search systems. Vanja has published over fifty peer reviewed publications, and has been on the program committees of WWW, SIGIR, ICDE, VLDB, SIGKDD and other major conferences in the database, information retrieval and search areas.
Evan Sandhaus, New York Times (11:00)
- "Corpus Linguistics and Semantic Technology at the New York Times"
- Abstract TBA
- Evan Sandhaus works as the Semantic Technologist in the New York Times Research and Development Labs. In this role, Evan has released 1.8 million documents to the computer science research community, helped to put the New York Times on Google Earth and collaborated with New York University to explore new directions in News Search. Evan holds degrees in Computer Science from both Williams College and Villanova University and currently resides in Brooklyn, New York.
Tip House, OCLC (11:30)
CANCELED BECAUSE OF ILLNESS -- REPLACED BY TALK ON DEVELOPMENTS IN BING BY NICK CRASWELL
- "Alexandria 2.0: Search Innovations Keep Libraries Relevant in an Online World"
- The costs of searching very large collections of meta-data, abstracts, and full-text content have been reduced dramatically for OCLC and the library community through a series of searching innovations that began in 2006 and are continuing through today. This huge reduction has allowed the community to make more than a billion items in over 70,000 libraries around the world freely discoverable on the web at http://www.worldcat.org, and establish a collective web presence that extends the relevance of the library beyond the traditional bricks-and-mortar model.
- Some of the innovations making this possible include the implementation of grid and autonomic computing, the complete elimination of disc access, the use of deterministic query optimization, and content-based relevance ranking algorithms. Current work on automated indexing and categorization, semantic and topological searching, and inclusion of relational and XML/RDF capabilities along with full text capabilities into a single integrated engine will continue to enable a relevant presence in the online world for the library community well into the future.
- Tip House is OCLC's chief architect and the creator of the Find search engine, which underlies most of OCLC's products and services, including Worldcat.org, WorldCat Resource Sharing, OCLC FirstSearch, OCLC Connexion, and the WorldCat Registry. He is a thirty-year veteran of software development, the author of numerous papers and presentations on software testing, software measurement, and electronic document control, and co-author (with Lisa Crispin) of the book Testing Extreme Programming.
Panel: Search Industry Analysts (2:00)
What concerns drive the business of enterprise search, and how should technologies approach them?
- Whit Andrews, VP, Distinguished Analyst, Gartner
- Andrews has covered information access technologies, including enterprise search, since 1994. He originated the notion of the Hostile Information Ecosystem, in which enterprises and users face risks from other users, other enterprises, and malicious or negligent use of search technologies themselves. He is also a significant contributor to e-discovery research, particularly on the collection and review phases of e-discovery.
- Susan Feldman, Vice President, Search and Discovery Technologies, IDC
- Sue Feldman directs the Content Technologies Group and specializes in research on search and discovery technologies. Her research analyzes the trends and dynamics of the search software market and documents the spread of these technologies within software applications. Her "Hidden Costs of Information Work" quantifies the costs of information work to the organization, and discusses the need for streamlining and automating information tasks that are not productive. She initiated IDC's research on the dynamics of the digital marketplace, building a model that predicts ad revenue from clicks in search engines and forecasts scenarios for how that revenue is apportioned among major search engines. She is a frequent speaker at industry events, and has won several national and international awards for her writing.
- Theresa Regli, Analyst, CMS Watch
- Theresa Regli is Principal at CMS Watch, covering enterprise search, semantic technologies, digital asset management and related technologies and practices. Prior to her work as an analyst and educator, Regli spent over ten years leading content management implementations for clients in North America and Europe. She developed taxonomies, content management strategies, CMS-driven web sites and information architecture solutions for several Fortune 100 organizations. She holds degrees and certifications in Romance Languages and Linguistics from universities in France, the U.S. and the U.K.
- Responder: Marti Hearst, University of California, Berkeley
- Marti Hearst is a professor in the School of Information at UC Berkeley. Her primary research interests are user interfaces for search engines, information visualization, natural language processing, and empirical analysis of social media. She has just completed the first book on Search User Interfaces. Prof. Hearst received BA, MS, and PhD degrees in Computer Science from the University of California at Berkeley, and she was a Member of the Research Staff at Xerox PARC from 1994 to 1997. Prof. Hearst has served on the Advisory Council of NSF's CISE Directorate and is co-chair of the Web Board for CACM. She is a member of the Usage Panel for the American Heritage Dictionary and is on the Edge.org panel of experts. Prof. Hearst is on the editorial boards of ACM Transactions on the Web and ACM Transactions on Computer-Human Interaction and was formerly on the boards of Computational Linguistics, ACM Transactions on Information Systems, and IEEE Intelligent Systems.
- Moderator: Daniel Tunkelang, Endeca
- Daniel Tunkelang is the Chief Scientist and a co-founder of Endeca, a leading provider of search applications. Daniel is a passionate advocate of dialog-oriented, exploratory approaches to search, and he blogs about these subjects at The Noisy Channel. Daniel has spearheaded the annual workshops on Human Computer Information Retrieval (HCIR), and recently published a book on faceted search. He studied math and computer science at MIT, and has a PhD in computer science from CMU. Before joining Endeca’s founding team, he worked at the IBM T. J. Watson Research Center and AT&T Bell Labs.
Panel: Enterprise Search Vendors (4:00)
What are the technical challenges driving enterprise search, and what are researchers and practitioners doing, or what should they be doing, to address them?
- Jeff Fried, FAST, a Microsoft Subsidiary
- Jeff Fried is a senior product manager at Microsoft, specializing in strategic applications of search technology. He is a frequent speaker and writer in the industry, holds 15 patents, and has led the creation of pioneering offerings in next generation search engines, networks, and contact centers. Jeff was CTO of software test and management firm Empirix, led development at speech application firm Unveil Technologies (acquired by Microsoft), was co-founder and CTO of distributed call center firm Teloquent Communications (acquired by Syntellect), and was SVP of products at natural language software firm LingoMotors. Jeff has authored more than 50 technical papers, is the recipient of numerous industry awards, and holds BS, MS, and Engineer’s degrees in computer science from MIT.
- Raul Valdes-Perez, Executive Chairman, Vivisimo
- Raul led Vivisimo from its co-founding in June 2000 until June 2009, when he became executive chairman. Before Vivisimo, Raul was on the Carnegie Mellon University computer science department faculty, where his research on new methods and applications of knowledge discovery led to publishing nearly 50 journal articles in natural, social and computer science. He was a principal investigator on six grants from the National Science Foundation, served on its advisory committee for Social, Behavioral, and Economic Sciences, and was an action editor of the journal Machine Learning. Raul received a Ph.D. in computer science at Carnegie Mellon in 1991, where his advisor was Herbert A. Simon, and B.S. and M.S. degrees in information engineering from the University of Illinois at Chicago.
- Adam Ferrari, Chief Technology Officer, Endeca
- Adam was the original developer and chief architect of Endeca's core MDEX Engine™ technology, and as CTO he continues to be an active architect of and contributor to Endeca's Information Access Platform. His research at Endeca has resulted in nine patent-pending inventions covering advanced information retrieval, index structures, and data analysis. Prior to Endeca, Adam was a principal architect and developer of the Legion grid computing platform, which was subsequently acquired by Sybase. His academic research background spans areas in distributed computing, including heterogeneous systems, multithreading, and programming environments for HPC applications. Adam holds a B.A. in Physics from Cornell University, an M.S. in Mathematics and Computer Science from Emory University, and a Ph.D. in Computer Science from the University of Virginia.
- Responder: W. Bruce Croft, University of Massachusetts
- W. Bruce Croft is a Distinguished Professor in the Department of Computer Science at the University of Massachusetts, Amherst, which he joined in 1979. In 1992, he founded the Center for Intelligent Information Retrieval (CIIR), which combines basic research with technology transfer to a variety of government and industry partners. His research interests are in many areas of information retrieval, including retrieval models, representation, Web search, query processing, cross-lingual retrieval, and search architectures. Dr. Croft was a member of the National Research Council Computer Science and Telecommunications Board, 2000-2003, and Editor-in-Chief of ACM Transactions on Information Systems, 1995-2002. Dr. Croft was elected a Fellow of ACM in 1997, received the Research Award from the American Society for Information Science and Technology in 2000, and received the Gerard Salton Award from the ACM Special Interest Group on Information Retrieval (SIGIR) in 2003.
- Moderator: Elizabeth Liddy, Syracuse University
- Elizabeth D. Liddy is dean and Trustee Professor in the School of Information Studies at Syracuse University. In 1999, she founded and led the school’s Center for Natural Language Processing, which advances the development of human-like language understanding software capabilities for government, commercial, and consumer applications. Liddy was Founding President and CEO of TextWise, LLC, from 1994-1999, an early developer of NLP-based search technologies, used by the US and European Patent Offices. She has led 65 research projects, has authored more than 110 research papers, and has given hundreds of conference presentations on her work. She is a recipient of the Tibbetts Award from the SBIR Program of the U.S. Small Business Administration (1998), the Enterprise Award for Technology from the Upstate New York Technology Business Forum (1998), the Outstanding Alumni Award from SU (2000), the Post-Standard and Syracuse-Federation of Women's Clubs Achievement Award (2005), and the 12th Annual Search Engine Conference Best Paper Award (2007). In addition, she was elected chair of the Association for Computing Machinery Special Interest Group on Information Retrieval for 2007–2009. Liddy is a member of Beta Phi Mu, the library and information studies honor society, and Sigma Xi, the international honor society of scientific and engineering research.
Previously scheduled panelists
These individuals were previously scheduled to appear on a panel or give a talk but were unable to make it. We greatly appreciate their willingness to participate and are sorry that circumstances prevented their appearing. We are particularly grateful to those who were able to replace them on short notice.
- Øystein Torbjørnsen, FAST, a Microsoft Subsidiary (replaced by Jeff Fried)
- Dr Torbjørnsen has 18 years of experience in the software industry, doing research and development for both startups and multinational corporations. He was a cofounder of Clustra Systems, where he was the lead architect of a highly available and parallel DBMS targeted at the telecom market. The company was acquired by Sun Microsystems, where he took on the role of Distinguished Engineer in the database group. He has been working for FAST since 2004, bringing database capabilities into search engines. He has a PhD and an MSc in Software Engineering from the Norwegian University of Science and Technology and still has a close relationship with the university, where he currently advises several PhD and master's students.
- Peter Menell, Chief Technology Officer, Autonomy
- Dr. Peter Menell, D.Phil. Oxon, joined Autonomy's Engineering and Technology Solutions unit in 1998 and has served as Chief Technology Officer since 2004. During that time he has overseen a number of significant advances in Autonomy technology, including key new patents filed, and been responsible for the successful deployment of Autonomy software at multi-divisional organizations across the globe. Prior to joining Autonomy, Dr Menell conducted computational and neuro-physiology research in visual and auditory impairment. Dr Menell holds a B.A. (Hons) and M.Sc. from York University and a D.Phil. from Magdalen College Oxford.
- Thomas (Tom) Tague, Thomson Reuters
- Tom Tague leads the Thomson Reuters Calais initiative, spearheading strategy and product development. He also oversees the Calais developer community at OpenCalais.com, evangelizing the Calais Web service and its free and open API while working closely with Calais partners and commercial and non-commercial developers alike. Previous positions include EVP, Client Solutions for Darwin Partners and co-founder and COO for Tessera Enterprise Systems, as well as senior roles at Epsilon and Electronic Data Systems (EDS).