Implicit User-Generated Content in the Service of Public Health
Every day millions of people use online products and services to satisfy their information needs. While doing so, they produce large volumes of user-generated content (UGC). In this talk, we will distinguish between "explicit" UGC, which is intended to be made public (such as product ratings or reviews), and "implicit" UGC, which can be responsibly anonymized and aggregated in a privacy-preserving way to improve public health. We will analyze implicit UGC as a positive consumption externality, and will discuss its beneficial uses across a range of public health applications.
The bulk of this talk will focus on methods for aggregating and analyzing the data to provide timely signals that help guide public health interventions and assess their efficacy. We will discuss applications such as estimating disease incidence, predicting outbreaks, mitigating pandemic spread, and improving public health messaging.
Dr. Evgeniy Gabrilovich is a research director at Facebook Reality Labs where he conducts research on neuromotor interfaces. Prior to that he was a principal research scientist / research director at Google Health where he founded and led the Public & Environmental Health Research team. He also led research teams working on health search, internationalization, and machine-learned epidemiology in Google Research. Before that he led the Knowledge Vault project, which focused on automatically mining the Web to discover new facts for the Knowledge Graph, and on estimating the trustworthiness of Web sources. Evgeniy is an ACM Fellow (2021) and IEEE Fellow (2021). He is a recipient of the 2010 Karen Spärck Jones Award for his contributions to natural language processing and information retrieval. He is also a recipient of the 2014 IJCAI-JAIR Best Paper Prize. Evgeniy has served as a program chair for WSDM 2021, WWW 2017, and WSDM 2015. He earned his PhD degree in computer science from the Technion - Israel Institute of Technology.
Ensemble Learning Methods for Dirty Data
A neural network ensemble is a collaborative learning paradigm that uses multiple neural networks to solve a complex learning problem. Constructing predictive models with high generalization performance is an important yet highly challenging goal for robust intelligence systems in the presence of dirty data. Given a target learning task, popular approaches have been dedicated to finding the top-performing model. However, it is generally difficult to estimate the best model when the available data is finite, possibly dirty, and insufficient for the problem. In this keynote, I will give an overview of a diversity-centric ensemble learning framework developed at Georgia Tech, including methodologies and algorithms for measuring, enforcing, and combining multiple neural networks to improve the generalization performance of the overall system and to maximize ensemble utility and resilience to dirty data.
Ling Liu is a Professor in the School of Computer Science at the Georgia Institute of Technology. She directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining various aspects of big data powered artificial intelligence (AI) systems, and machine learning (ML) algorithms and analytics, including performance, availability, privacy, security and trust. Prof. Liu is an elected IEEE Fellow, a recipient of the IEEE Computer Society Technical Achievement Award (2012), and a recipient of best paper awards from numerous top venues, including IEEE ICDCS, WWW, ACM/IEEE CCGrid, IEEE Cloud, and IEEE ICWS. Prof. Liu has served on the editorial boards of over a dozen international journals, served as the editor in chief of IEEE Transactions on Services Computing (2013-2016), and is currently the editor in chief of ACM Transactions on Internet Computing (since 2019). Prof. Liu is a frequent keynote speaker at top-tier venues in Big Data, AI and ML systems and applications, Cloud Computing, Services Computing, Privacy, Security and Trust. Her current research is primarily supported by the U.S. National Science Foundation under CISE programs, IBM and Cisco.
Customer Obsessed Science
Vanessa Murdock leads a research group in Alexa Shopping at Amazon, whose focus is recommender systems, search and HCI. Her team provides the machine learning that backs Amazon’s Choice and the Alexa Shopping list. Previously, she was a Principal Scientist at Microsoft, working on location inference and notifications at Bing and Cortana. Prior to Microsoft, Murdock led the Geographic Context and Experience Group at Yahoo! Research in Barcelona, doing research on topics related to geographic information retrieval and user-generated content. She has been awarded 19 patents and has more than 20 patent applications pending, resulting in a Master Inventor Award from Yahoo! (2012). She received the OAA Award for Outstanding Achievement by a Young Alum from the University of Massachusetts in 2014. Murdock received a Ph.D. in Computer Science from the University of Massachusetts Amherst in 2006, advised by Bruce Croft.
Exploring and Analyzing Change: The Janus Project
Data change, all the time. The Janus project seeks to address the Variability dimension of Big Data by modeling, exploring, and analyzing such change, providing valuable insights into the evolving real world and the ways in which data about it are collected and used.
We start by identifying technical challenges that need to be addressed to realize the Janus vision. Towards this end, we have extracted and worked with the histories of various structured datasets, including DBLP, IMDB, open government data, and Wikipedia, for which a detailed history of every edit is available. Our DBChEx (Database Change Explorer) prototype enables interactive exploration of data and schema changes, and we show how DBChEx can help users gain valuable insights by exploring two real-world datasets, IMDB and Wikipedia infoboxes.
Based on an analysis of the history of 3.5M tables on the English Wikipedia for a total of 53.8M table versions, we then illustrate the rich history of structured Wikipedia data: we show that tables are created in certain locations, they change their shape, they move, they grow, they shrink, their data change, they vanish, and they re-appear; indeed, each table has a life of its own. Finally, to help automatically interpret the useful knowledge harbored in the history of Wikipedia tables, we present recent results on two technical problems: (i) identifying Natural Keys, a particularly important piece of metadata, which serves as a primary key in tables over time and consists of attributes inherent to an entity, and (ii) matching tables, infoboxes and lists within a Wikipedia page across page revisions. We solve these problems at scale and make the resulting curated datasets available to the community to facilitate future research.
This is joint work with Tobias Bleifuß, Leon Bornemann, Dmitri Kalashnikov, and Felix Naumann.
Divesh Srivastava is the Head of Database Research at AT&T. He is a Fellow of the ACM, the President of the VLDB Endowment, co-chair of the ACM Publications Board, and on the Board of Directors of the Computing Research Association. He has served as PC co-chair of many international conferences including SIGMOD 2021, VLDB 2020 (Industrial), SIGMOD 2020 (Industrial), and ICDE 2019. He has presented keynote talks at several international conferences, and his research interests and publications span a variety of topics in data management. He received his Ph.D. from the University of Wisconsin, Madison, USA, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India.
How Hybrid Work Will Make Work More Intelligent
We are in the middle of the most significant change to work practices in generations. For hundreds of years, physical space was the most important technology people used to get things done. The coming Hybrid Work Era, however, will be shaped by digital technology. The recent rapid shift to remote work accelerated the digital transformation already underway at many organizations, and new types of work-related data are now being generated at an unprecedented rate. For example, the average Microsoft Teams user spends 252% more time in the application now than they did in February 2020.
During the early stages of the pandemic, we saw the direct impact of digital technology on work in its ability to help people sustain collaboration across time and space. But looking forward, the new digital knowledge captured in the Hybrid Work Era will allow us to reimagine work at an even more fundamental level. AI systems, for example, can now learn from the conversations people have to support knowledge re-use, and even learn how successful conversations happen to help drive more productive meetings.
Historically, AI systems have been hindered in a work context by a lack of data; the development of foundation models is changing that, creating an opportunity to combine general world knowledge with the knowledge and behaviors currently locked up and siloed as we work. The CIKM community can shape the new future of work, but first must address the challenges surrounding workplace knowledge management that arise as we have more data, more sophisticated AI, and more human engagement. In this talk I will give an overview of what research tells us about emerging work practices, and explore how the CIKM community can build on these findings to help create a new – and better – future of work.
Jaime Teevan is Chief Scientist and Technical Fellow at Microsoft, where she is responsible for driving research-backed innovation in the company’s core products. She leads Microsoft’s future of work initiative, which brings researchers from Microsoft, LinkedIn, and GitHub together to study how the pandemic has changed the way people work. Previously she was Technical Advisor to CEO Satya Nadella and led the Productivity team at Microsoft Research. Jaime was recently inducted into the SIGIR and SIGCHI Academies, and has received numerous awards for her research, including the Technology Review TR35, Borg Early Career, Karen Spärck Jones, and SIGIR Test of Time awards. She holds a Ph.D. in AI from MIT and a B.S. from Yale, and is an Affiliate Professor at the University of Washington.