



Daphne Koller is the President and Co-founder of Coursera, a social entrepreneurship company that works with the best universities to connect anyone around the world with the best education, for free. Coursera is the leading platform of its kind, and has partnered with over 100 of the world’s top universities, offering hundreds of courses in a broad range of disciplines to millions of students spanning every country in the world. Koller was recognized as one of Time Magazine’s 100 Most Influential People for 2012, one of Newsweek’s 10 most important people in 2010, one of the Huffington Post’s 100 Game Changers for 2010, and more. Prior to founding Coursera, Koller was the Rajeev Motwani Professor of Computer Science at Stanford University, where she served on the faculty for 18 years. In her research life, she worked in the area of machine learning and probabilistic modeling, with applications to systems biology and personalized medicine.
She is the recipient of many awards, including the Presidential Early Career Award for Scientists and Engineers, the MacArthur Foundation Fellowship, the ACM-Infosys Foundation Award, and membership in the US National Academy of Engineering and the American Academy of Arts and Sciences. She is also an award-winning teacher who pioneered in her Stanford class many of the ideas that underlie the Coursera user experience. She received her BSc and MSc from the Hebrew University of Jerusalem, and her PhD from Stanford in 1994.
A large literature on causal inference in statistics, econometrics, biostatistics, and epidemiology (see, e.g., Imbens and Rubin [2015] for a recent survey) has focused on methods for statistical estimation and inference in a setting where the researcher wishes to answer a question about the (counterfactual) impact of a change in a policy, or “treatment” in the terminology of the literature. The policy change has not necessarily been observed before, or may have been observed only for a subset of the population; examples include a change in minimum wage law or a change in a firm’s price. The goal is then to estimate the impact of a small set of “treatments” using data from randomized experiments or, more commonly, “observational” studies (that is, nonexperimental data). The literature identifies a variety of assumptions that, when satisfied, allow the researcher to draw the same types of conclusions that would be available from a randomized experiment. To estimate causal effects given nonrandom assignment of individuals to alternative policies in observational studies, popular techniques include propensity score weighting, matching, and regression analysis; all of these methods adjust for differences in observed attributes of individuals. Another strand of literature in econometrics, referred to as “structural modeling,” fully specifies the preferences of actors as well as a behavioral model, and estimates the model’s parameters from data (for applications to auction-based electronic commerce, see Athey and Haile [2007] and Athey and Nekipelov [2012]). In both cases, parameter estimates are interpreted as “causal,” and they are used to make predictions about the effect of policy changes.
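To fix ideas, the sketch below (not taken from any of the cited papers; the simulated data, variable names, and logistic propensity model are illustrative assumptions) shows how inverse propensity score weighting adjusts for observed attributes when estimating an average treatment effect:

```python
# A minimal inverse-propensity-weighting (IPW) sketch: estimate an average
# treatment effect from observational data by reweighting outcomes with
# estimated propensity scores. All data here are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                    # observed attributes
p_true = 1 / (1 + np.exp(-X[:, 0]))            # treatment probability depends on X
w = rng.binomial(1, p_true)                    # nonrandom treatment assignment
y = X[:, 0] + 2.0 * w + rng.normal(size=n)     # outcome; true effect is 2

# Step 1: estimate each unit's propensity score e(X) = P(treated | X).
e_hat = LogisticRegression().fit(X, w).predict_proba(X)[:, 1]

# Step 2: reweight treated and control outcomes by inverse propensity,
# which balances the observed attributes across the two groups.
ate_ipw = np.mean(w * y / e_hat) - np.mean((1 - w) * y / (1 - e_hat))
print(f"IPW estimate of the average treatment effect: {ate_ipw:.2f}")
```

Matching and regression adjustment pursue the same goal of balancing observed attributes; they differ only in how the adjustment is carried out.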
In contrast, the supervised machine learning literature has traditionally focused on prediction, providing data-driven approaches to building rich models and relying on cross-validation as a powerful tool for model selection. These methods have been highly successful in practice. This talk reviews several recent papers that attempt to bring the tools of supervised machine learning to bear on the problem of policy evaluation; the papers are connected by three themes.
The first theme is that it is important for both estimation and inference to distinguish between parts of the model that relate to the causal question of interest, and “attributes,” that is, features or variables describing characteristics of individual units that are held fixed when policies change. Specifically, we propose to divide the features of a model into causal features, whose values may be manipulated in a counterfactual policy environment, and attributes. A second theme is that relative to conventional tools from the policy evaluation literature, tools from supervised machine learning can be particularly effective at modeling the association of outcomes with attributes, as well as at modeling how causal effects vary with attributes. A final theme is that modifications of existing methods may be required to deal with the “fundamental problem of causal inference,” namely, that no unit is observed in multiple counterfactual worlds at the same time: we do not see a patient at the same time with and without medication, and we do not see a consumer at the same moment exposed to two different prices. This creates a substantial challenge for cross-validation, as the ground truth for the causal effect is not observed for any individual.
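In the potential-outcome notation of Imbens and Rubin [2015] (written out here for concreteness), with W_i the treatment indicator for unit i, the difficulty can be stated as:

```latex
% Unit-level causal effect and the observed outcome for unit i.
\tau_i = Y_i(1) - Y_i(0),
\qquad
Y_i^{\mathrm{obs}} = W_i \, Y_i(1) + (1 - W_i) \, Y_i(0).
```

Only one of the two potential outcomes ever enters the observed data, so the unit-level effect itself is never available as a prediction target.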
The talk reviews several lines of research that incorporate these themes. The first, exemplified by Athey and Imbens [2015a], focuses on estimating heterogeneity in treatment effects, identifying (based on unit attributes) subpopulations of units that have larger or smaller than average treatment effects. The method enables valid inference: confidence intervals for the size of the treatment effect in each subpopulation are derived. Thus, large-scale randomized experiments for drugs or A/B tests in online settings can be evaluated systematically, with the method discovering the magnitude of treatment effect heterogeneity. The challenge in this setting is to find a method that is optimized for the problem of predicting causal effects, rather than for predicting outcomes. The approach can also be applied to observational studies under some additional conditions. Our approach addresses the problem of cross-validation by constructing an unbiased (but noisy) estimate of each unit’s treatment effect. More generally, we pose the question of how best to modify supervised machine learning methods to use estimated parameters rather than observed data in cross-validation. In ongoing research, we explore this question in a variety of settings.
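One way to construct such a proxy (a sketch under the assumption of a completely randomized experiment with known treatment probability p; the data and names below are hypothetical) is a transformed outcome whose conditional expectation equals the treatment effect given attributes:

```python
import numpy as np

def transformed_outcome(y, w, p):
    """Unbiased (but noisy) proxy for each unit's treatment effect.

    In a randomized experiment with known treatment probability p, the
    conditional expectation of y_star given attributes equals the
    conditional average treatment effect, so y_star can stand in for the
    unobserved ground truth when cross-validating a model of effect
    heterogeneity.
    """
    return y * (w - p) / (p * (1 - p))

# Hypothetical experimental data: w is the treatment indicator, y the
# outcome, and the true treatment effect 1 + x varies with the attribute x.
rng = np.random.default_rng(1)
n, p = 2000, 0.5
x = rng.normal(size=n)
w = rng.binomial(1, p, size=n)
y = x + (1.0 + x) * w + rng.normal(size=n)

y_star = transformed_outcome(y, w, p)
# y_star can now serve as the cross-validation target for any model that
# predicts treatment effects from the attribute x.
```

Because the proxy is unbiased for the effect but individually very noisy, it is suited to comparing candidate models in cross-validation rather than to reading off any single unit’s effect.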
A second line of research analyzes the robustness of causal estimates. In applied social science studies of the impact of policy changes, it is common for researchers to present a handful of alternative models to assess the robustness of the causal estimates. Although the importance of model robustness has been highlighted by many researchers (e.g., Leamer [1983]), to date no metric for the robustness of a model has gained widespread adoption in the policy evaluation literature. Athey and Imbens [2015b] propose a measure of robustness of parameter estimates. A starting point is to define the causal estimand of interest as well as the attributes of individuals in the dataset (features that may affect the robustness of the causal estimate). The method for constructing the robustness measure is inspired by the machine learning technique of regression trees. The sample is split according to each attribute in turn, and the original model is re-estimated on the two subsamples. The split point for each attribute is chosen as the one that leads to the greatest improvement in model fit. An alternative estimate of the causal effect is constructed by taking a weighted average of the estimates in the two subsamples. The robustness measure is then defined as the standard deviation of these alternative estimates (one for each attribute). This measure has some attractive properties: there is no need to define an estimation approach other than the one used in the baseline model, and the measure is robust to monotone transformations of the individual attributes. The measure lacks other desirable properties, however: it can be reduced by adding irrelevant attributes to the model, for example. An ongoing research agenda addresses this and other issues.
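A minimal sketch of this construction follows, assuming a pandas DataFrame, user-supplied `fit_model` and `model_loss` callables standing in for the baseline model, and size-weighted averaging of the two subsample estimates (one natural reading of the weighting step):

```python
import numpy as np
import pandas as pd

def robustness_measure(data, attributes, fit_model, model_loss, min_leaf=30):
    """Sketch of the tree-inspired robustness measure described above.

    `data` is a pandas DataFrame; `fit_model(subsample)` returns the causal
    estimate from re-estimating the baseline model on a subsample, and
    `model_loss(subsample)` measures model fit (lower is better). Both are
    hypothetical stand-ins for the user's own baseline model; the `min_leaf`
    guard against tiny subsamples is a stability choice added here.
    """
    alternative_estimates = []
    for attr in attributes:
        # Choose the split point on this attribute that most improves fit.
        best_split, best_loss = None, np.inf
        for c in np.unique(data[attr])[1:]:
            left, right = data[data[attr] < c], data[data[attr] >= c]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            loss = model_loss(left) + model_loss(right)
            if loss < best_loss:
                best_loss, best_split = loss, c
        if best_split is None:
            continue
        # Re-estimate the baseline model on each subsample and combine the
        # two estimates, weighting by subsample size.
        left = data[data[attr] < best_split]
        right = data[data[attr] >= best_split]
        est = (len(left) * fit_model(left) + len(right) * fit_model(right)) / len(data)
        alternative_estimates.append(est)
    # The robustness measure is the spread of the alternative estimates.
    return np.std(alternative_estimates)

# Hypothetical usage: the baseline "model" is a difference in means between
# treated (w == 1) and control (w == 0) units; fit is measured by the
# within-subsample outcome variance.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "y": rng.normal(size=500),
    "w": rng.binomial(1, 0.5, size=500),
    "age": rng.integers(20, 60, size=500),
    "income": rng.normal(50.0, 10.0, size=500),
})

def diff_in_means(d):
    return d.loc[d["w"] == 1, "y"].mean() - d.loc[d["w"] == 0, "y"].mean()

def fit_loss(d):
    return d["y"].var() * len(d)

print(robustness_measure(df, ["age", "income"], diff_in_means, fit_loss))
```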
Finally, Abadie et al. [2014] consider the problem of inference in environments where the researcher may observe a large part of a population, or even an entire population. It is typical in social science to treat causal features and attributes symmetrically when conducting inference about parameter estimates, and to justify inference by appealing to the idea that the data are a random sample from a larger population. We argue that this convention is not appropriate, and that the source of uncertainty for causal estimands is not purely sampling variation; rather, uncertainty arises because we do not observe all of the potential outcomes for any unit. The distinction is especially clear if we observe the entire population of interest: we may observe average income for all fifty states or all countries in the world, or we may observe all advertisers, sellers, or consumers on an electronic commerce platform. When the population is observed, there is no uncertainty about the answer to a question such as: what is the average difference in income, or in average online purchases, between coastal and interior states? On the other hand, if we attempt to estimate the effect of changing minimum wage policy or prices, we have residual uncertainty about the effect of making such a change even if we observe a randomized experiment comparing the two policies, because we do not observe any given unit under multiple policies at the same time. We propose an alternative approach to conducting inference in regression models that takes these factors into account, showing that in general conventional standard errors are conservative. More broadly, this paper highlights the theme that the theory of inference is different for causal estimates than it is for parameter estimates associated with fixed attributes of individuals.
Susan Athey is The Economics of Technology Professor at Stanford Graduate School of Business. She received her bachelor's degree from Duke University and her Ph.D. from Stanford, and she holds an honorary doctorate from Duke University. She previously taught in the economics departments at MIT, Stanford, and Harvard. In 2007, Professor Athey received the John Bates Clark Medal, awarded by the American Economic Association to “that American economist under the age of forty who is adjudged to have made the most significant contribution to economic thought and knowledge.” She was elected to the National Academy of Sciences in 2012 and to the American Academy of Arts and Sciences in 2008. Professor Athey’s research focuses on the economics of the internet, online advertising, the news media, marketplace design, virtual currencies, and the intersection of computer science, machine learning, and economics. She advises governments and businesses on marketplace design and platform economics, notably serving since 2007 as a long-term consultant to Microsoft Corporation in a variety of roles, including consulting chief economist.
- Guido Imbens and Donald Rubin. 2015. Causal Inference for Statistics, Social and Biomedical Sciences: An Introduction. Cambridge University Press: Cambridge, United Kingdom.
- Susan Athey and Philip Haile. 2007. Nonparametric approaches to auctions. In James J. Heckman and Edward E. Leamer, eds. Handbook of Econometrics Volume 6, Elsevier, 3847-3965.
- Susan Athey and Denis Nekipelov. 2012. A Structural Model of Sponsored Search Advertising Auctions. Working paper, Stanford University. Retrieved May 30, 2015 from http://facultygsb.stanford.edu/athey/documents/Structural_Sponsored_Search.pdf.
- Susan Athey and Guido Imbens. 2015a. Machine learning methods for estimating heterogeneous causal effects. ArXiv e-print number 1504.01132. Retrieved May 30, 2015 from http://arxiv.org/abs/1504.01132.
- Edward Leamer. 1983. Let’s take the con out of econometrics. American Economic Review 73, 1 (Mar. 1983), 31-43.
- Susan Athey and Guido Imbens. 2015b. A measure of robustness to misspecification. American Economic Review 105, 5 (May 2015), 476-480. DOI=10.1257/aer.p20151020.
- Alberto Abadie, Susan Athey, Guido Imbens, and Jeffrey Wooldridge. 2014. Finite population standard errors. NBER Working Paper Number 20325. Retrieved May 30, 2015 from http://www.nber.org/papers/w20325.