ADS Invited Speakers
Towards an Easy-to-Use AI Platform: Data, Algorithms and Systems
The superior accuracy of artificial intelligence (AI) and machine learning models over traditional methods has created a general perception that AI should be adopted across business domains. Unfortunately, mass adoption of AI in practice is challenging because most domain experts have no background in AI. Developing AI applications involves multiple stages, namely data preparation, application modeling, and product deployment. Usability, efficiency and security are becoming the barriers to democratizing AI. In this talk, I shall review recent developments from the research and open-source communities towards an easy-to-use AI platform. Topics including data curation, AI software, and automated machine learning will be covered. I will also introduce relevant features of Apache SINGA that assist subject matter experts in developing their data science applications.
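To make the automated machine learning component concrete, the loop below is a deliberately minimal sketch (not Apache SINGA's actual API): each candidate model configuration is fit on training data, scored on held-out data, and the best one is kept. All names (`fit_threshold`, `automl_select`) and the toy dataset are hypothetical.

```python
def fit_threshold(train):
    # Pick the threshold that best separates the two classes on training data.
    xs = sorted(x for x, _ in train)
    best_t, best_acc = xs[0], 0.0
    for t in xs:
        acc = sum((x >= t) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def automl_select(candidates, train, valid):
    # Minimal AutoML loop: fit each candidate, score it on held-out data,
    # and return the best-performing one. Real systems add smarter search
    # (e.g. Bayesian optimization, early stopping) on top of this loop.
    def accuracy(t, data):
        return sum((x >= t) == y for x, y in data) / len(data)
    results = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
    return max(results, key=results.get), results

# Hypothetical 1-D binary classification data: (feature, label).
train = [(0.1, False), (0.4, False), (0.6, True), (0.9, True)]
valid = [(0.2, False), (0.7, True)]
candidates = {
    "fixed@0.0": lambda d: 0.0,   # trivial baseline: predict everything positive
    "learned":   fit_threshold,   # threshold tuned on the training set
}
best, results = automl_select(candidates, train, valid)
```

The point of even this toy loop is that the domain expert supplies data and candidates, while model selection itself is automated.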
Bio: Dr. Wei Wang's research interests include machine learning system optimizations, machine learning for systems, and multimedia applications. He is now a research engineer at TikTok, Singapore. He was an assistant professor at the NUS School of Computing from 2017 to 2020. He received his B.S. from Renmin University of China in 2011 and his Ph.D. from NUS in 2017. He is an initiator of Apache SINGA and a vice president of the Apache Software Foundation. He has served as an area chair for ICDCS 2020, and as a PC member for ICDE, VLDB, SIGMOD, DASFAA, and ACM Multimedia.
Revisiting Knowledge Graph Completion From a Practical Perspective
Little is known about the trustworthiness of predictions made by knowledge graph embedding (KGE) models. In this talk, I will first present my group’s work on investigating the calibration of KGE models, or the extent to which they output confidence scores that reflect the expected correctness of predicted knowledge graph triples. Going beyond the standard closed-world assumption, we evaluate the effectiveness of calibration techniques under the more realistic but challenging open-world assumption, in which unobserved predictions are not considered true or false until ground-truth labels are obtained. Via a case study of human-AI collaboration, we show that calibrated predictions can improve human performance in a knowledge graph completion task. Next, I will discuss a complementary task to triple prediction that aims to detect where in a knowledge graph information may be missing (not what is missing) and our efficient solution via knowledge graph summarization. To address the need for a solid benchmark in knowledge graph completion and provide a boost to research on this task, I will present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing Freebase-based benchmarks in scope and level of difficulty. These datasets are part of the PyTorch-based library LibKGE. The talk will conclude with opportunities for future research.
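To make the notion of calibration concrete, here is a minimal, hypothetical sketch of temperature scaling, one common calibration technique (not necessarily the method studied in the work above): a single temperature parameter is fit on labeled validation triples so that the sigmoid of the scaled KGE score behaves like a probability of correctness.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nll(scores, labels, temperature):
    # Negative log-likelihood of labels under temperature-scaled sigmoid probabilities.
    eps = 1e-12
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s / temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(scores)

def fit_temperature(scores, labels, grid=None):
    # Grid-search the temperature that minimizes validation NLL.
    grid = grid or [0.25 * k for k in range(1, 41)]  # 0.25 .. 10.0
    return min(grid, key=lambda t: nll(scores, labels, t))

# Hypothetical validation triples: raw KGE scores and ground-truth labels.
val_scores = [4.0, 2.5, -1.0, 5.0, -3.0, 0.5]
val_labels = [1, 1, 0, 1, 0, 0]
T = fit_temperature(val_scores, val_labels)
calibrated = [sigmoid(s / T) for s in val_scores]
```

Under the open-world assumption discussed in the talk, the labels themselves may be unavailable for unobserved triples, which is exactly what makes calibration there harder than in this closed-world toy example.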
Bio: Danai Koutra is a Morris Wellman Assistant Professor in Computer Science and Engineering at the University of Michigan, where she leads the Graph Exploration and Mining at Scale (GEMS) Lab, and an Associate Director of the Michigan Institute for Data Science (MIDAS). Her research focuses on practical and scalable methods for large-scale real networks, and has applications in neuroscience, organizational analytics, and social sciences. Her research interests include large-scale graph mining, graph summarization, knowledge graph mining, similarity and matching, and anomaly detection. She has won an NSF CAREER award, an ARO Young Investigator award, the 2020 SIGKDD Rising Star Award, research faculty awards from Google, Amazon, Facebook and Adobe, a Precision Health Investigator award, the 2016 ACM SIGKDD Dissertation award, and an honorable mention for the SCS Doctoral Dissertation Award (CMU). She holds one "rate-1" patent on bipartite graph alignment, and has multiple papers in top data mining conferences, including 8 award-winning papers. She is the Secretary of the SIAG on Data Science, an Associate Editor of ACM TKDD, and has served multiple times on the organizing committees of the major data mining conferences (including ACM SIGKDD, SIAM SDM, IEEE ICDM, ACM WSDM, ACM CIKM, ECML/PKDD). She has worked at IBM Hawthorne, Microsoft Research Redmond, and Technicolor Palo Alto/Los Altos. She earned her Ph.D. and M.S. in Computer Science from CMU in 2015 and her diploma in Electrical and Computer Engineering from the National Technical University of Athens in 2010.
Preserving Data Privacy in Federated Learning
Federated learning is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. Nevertheless, preserving data privacy in federated learning is non-trivial, as there are various ways that sensitive information could be leaked or inferred. In this talk, I will review some of the recent progress made to enhance data privacy in federated learning, and highlight some of the challenges that lie ahead.
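One widely used building block for the privacy protections discussed above is differentially private federated averaging: each client's update is clipped to bound its influence, and the server adds calibrated Gaussian noise before averaging. The sketch below is illustrative only, with hypothetical numbers; a real deployment would also need secure aggregation and a proper privacy accountant.

```python
import random

def clip(update, max_norm):
    # Scale a client update down so its L2 norm is at most max_norm.
    # Bounding per-client influence is what makes the added noise meaningful.
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [u * scale for u in update]

def dp_federated_average(client_updates, max_norm=1.0, noise_std=0.1, seed=0):
    # Server-side aggregation: clip each update, sum, add Gaussian noise, average.
    rng = random.Random(seed)
    dim = len(client_updates[0])
    clipped = [clip(u, max_norm) for u in client_updates]
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    noisy = [t + rng.gauss(0.0, noise_std * max_norm) for t in total]
    return [x / len(client_updates) for x in noisy]

# Hypothetical 2-D updates from three clients; the last one exceeds the clip norm.
updates = [[0.2, -0.1], [0.3, 0.0], [2.0, 2.0]]
agg = dp_federated_average(updates)
```

Note that clipping and noising address one leakage channel (the aggregated update); other channels, such as membership inference on the final model, require separate defenses.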
Bio: Xiaokui Xiao is an associate professor at the School of Computing, National University of Singapore. His research focuses on data privacy, with recent interests on privacy preserving machine learning and data sharing. He holds a Ph.D. in Computer Science from the Chinese University of Hong Kong.
Rethink e-Commerce Search
The quality of the search experience on an e-commerce site plays a critical role in customer conversion and the growth of the e-commerce business. In this talk, I will discuss the current status and challenges of product search. In particular, I will highlight the significant amount of effort it takes to create a high-quality product search engine using classical information retrieval methods. Then, I will discuss how recent advances in NLP and deep learning, especially the advent of large pre-trained language models, may change the status quo. While embedding-based retrieval has the potential to improve classical information retrieval methods, creating a machine learning-based, end-to-end system for general-purpose web search is still extremely difficult. Nevertheless, I will argue that product search for e-commerce may prove to be an area where deep learning can create the first disruption to classical information retrieval systems.
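The core of the embedding-based retrieval mentioned above can be illustrated in a few lines: products and queries are mapped to vectors, and retrieval is nearest-neighbour search under cosine similarity. The embeddings here are hard-coded, hypothetical stand-ins for the output of a pre-trained language model; production systems would also use an approximate nearest-neighbour index rather than a full sort.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def retrieve(query_vec, catalog, k=2):
    # Rank catalog items by cosine similarity to the query embedding.
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical 3-D product embeddings.
catalog = {
    "running shoes":  [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker":   [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of "jogging footwear"
top = retrieve(query, catalog)
```

The appeal for product search is visible even in this toy: the query shares no keywords with "running shoes", yet lands near it in embedding space.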
Bio: Haixun Wang is an IEEE Fellow, Editor-in-Chief of the IEEE Data Engineering Bulletin, and a VP of Engineering and Distinguished Scientist at Instacart. Before Instacart, he was a VP of Engineering and Distinguished Scientist at WeWork, a Director of Natural Language Processing at Amazon, and he led the NLP team working on Query and Document Understanding at Facebook. From 2013 to 2015, he was with Google Research working on natural language processing. From 2009 to 2013, he led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. He was a research staff member at the IBM T. J. Watson Research Center from 2000 to 2009. He received his Ph.D. in Computer Science from the University of California, Los Angeles in 2000. He has published more than 150 research papers in refereed international journals and conference proceedings. He has served as PC Chair of conferences such as SIGKDD'21, and he is on the editorial boards of journals such as the IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Journal of Computer Science and Technology (JCST). He won the best paper award at ICDE 2015, the 10-year best paper award at ICDM 2013, and the best paper award of ER 2009.
Data-Driven Approach to Creativity in Design with Human-Engaged AI
Understanding users -- their characteristics, needs, tastes, and contexts -- is a critical foundation for creativity in user-centered design. Conventional methods that collect user information and feedback through surveys and interviews are costly, and the results may not be easily applied to other designs. Designers thus rely on their intuitions and experiences to solve design problems. Advances in AI and big data technologies open the door to data-driven design. Through several case studies from different fields of design, we demonstrate the opportunities and challenges of involving human-engaged AI in the preparation, incubation, illumination, and revision of design, particularly in the aspects of task, data, model, and application. More specifically, we showcase that with designers and/or users in the loop, applying computational strategies to the design process enables systematic inspection of design space, broadened association of design materials, quantitative evaluation of alternatives of creative solutions, and prediction of user perception. We further share our exploration in supporting the scalability, adaptability, transferability, and explainability of our data-driven approach to creating engaging designs.
Bio: Xiaojuan Ma is a tenure-track assistant professor of Human-Computer Interaction (HCI) in the Department of Computer Science and Engineering (CSE), Hong Kong University of Science and Technology (HKUST), where she is part of the Human-Computer Interaction Initiative. She received her Ph.D. in Computer Science from Princeton University. She was a post-doctoral researcher at the Human-Computer Interaction Institute (HCII) of Carnegie Mellon University (CMU), and before that a research fellow in the Information Systems department at the National University of Singapore (NUS). Before joining HKUST, she was a researcher in Human-Computer Interaction at Noah's Ark Lab, Huawei Tech. Investment Co., Ltd. in Hong Kong. Her background is in Human-Computer Interaction. She is particularly interested in data-driven human-engaged AI (HEAI) and Human-Engaged Computing (HEC) in domains including but not limited to education, health, and design.
Challenges and solutions in industry scale data and AI systems
Recent years have witnessed continued adoption of AI technology in industry applications. In addition, AI applications are increasingly unified with conventional big data analytics platforms. In this talk, we will review the needs and challenges arising from Alibaba's big data and AI applications, and introduce our recent developments in building real-time data warehouses, real-time compute, AI infrastructure, and related platforms that support both e-commerce and cloud applications.
Bio: Yangqing Jia is currently Vice President and Senior Fellow of Alibaba Group. Prior to Alibaba, Yangqing served as Director of AI Infrastructure at Facebook, creating and supporting the large-scale AI platform and research. Prior to that he was a research scientist at Google Brain, where he contributed to state-of-the-art AI solutions such as GoogLeNet, TensorFlow, and mobile AI. He has years of experience in open-source AI solutions and standards, with prior work including Caffe, TensorFlow, PyTorch 1.0, and ONNX. He obtained his Ph.D. from the University of California, Berkeley.
The New DBfication of ML/AI and Reigniting the Database - Data Mining Nexus
The recent boom in ML/AI applications has brought into sharp focus the pressing need for tackling the concerns of scalability, usability, and manageability across the entire lifecycle of ML/AI applications. The ML, AI, and data mining worlds have long studied the concerns of accuracy, automation, etc. from theoretical and algorithmic vantage points. But to truly democratize ML/AI, the vantage point of building and deploying practical systems is equally critical. In this talk, I will make the case that it is high time for the worlds of ML, AI, and data mining to (re)ignite their nexus with another world that exemplifies successful democratization of data technology: databases. I will show how new bridges rooted in the principles, techniques, and tools of the database world are helping tackle the above pressing concerns and in turn, posing new research questions to the worlds of ML, AI, and data mining. As case studies of this nexus, I will describe two lines of work from my group: query optimization for ML systems and benchmarking data preparation in AutoML platforms. I will conclude with my thoughts on community mechanisms to foster more such bridges between research worlds and between research and practice.
Bio: Arun Kumar is an Assistant Professor in the Department of Computer Science and Engineering and the Halicioglu Data Science Institute at the University of California, San Diego. He is a member of the Database Lab and Center for Networked Systems and an affiliate member of the AI Group. His primary research interests are in data management and systems for machine learning/artificial intelligence-based data analytics. Systems and ideas based on his research have been released as part of the Apache MADlib open-source library, shipped as part of products from Cloudera, IBM, Oracle, and Pivotal, and used internally by Facebook, Google, LogicBlox, Microsoft, and other companies. He is a recipient of two SIGMOD research paper awards, a SIGMOD Research Highlight Award, three distinguished reviewer awards from SIGMOD/VLDB, the PhD dissertation award from UW-Madison CS, the IEEE TCDE Rising Star Award, an NSF CAREER Award, a Hellman Fellowship, a UCSD oSTEM Faculty of the Year Award, and research award gifts from Amazon, Google, Oracle, and VMware.
Is Your Model Prepared for Changes?
By default, machine learning training is optimized for deployment on data distributions that match the training distribution. As machine learning becomes entrenched in our way of life, our models need to embrace the inevitability of change. Changes come in many forms. Some changes are certain and the changed value is predictable --- for example, the time of deployment. Surprisingly, even for such a predictable change, current training objectives show sub-optimal foresight. For other changes, the direction of change may be known during training, but the changed value is unknown. In such cases, the model needs to be trained for robustness to new values. What training objectives can make a trained model ready for such changes? This is a topic of much active research. We will distill ideas that can be deployed in real systems today to train models better prepared for change during deployment.
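One concrete way to bake anticipated change into a training objective is importance weighting: if deployment is expected to shift toward certain inputs, upweight those examples during training. The toy example below (hypothetical data, weighted least squares fit by gradient descent) shows how the fitted model changes when training is weighted toward the "future" region. It is an illustration of the general idea, not the specific objectives discussed in the talk.

```python
def weighted_linear_fit(xs, ys, weights, lr=0.05, steps=2000):
    # Gradient descent on importance-weighted squared error. The weights model
    # the anticipated deployment distribution over inputs.
    w, b = 0.0, 0.0
    n = sum(weights)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y, wt in zip(xs, ys, weights):
            err = (w * x + b) - y
            gw += wt * err * x
            gb += wt * err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical data whose slope steepens for larger x (later deployment times).
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 1.0, 2.0, 3.5, 5.0, 6.5]
uniform = [1.0] * 6
future  = [0.1, 0.1, 0.2, 0.5, 1.0, 1.0]  # deployment will resemble large x
w_u, _ = weighted_linear_fit(xs, ys, uniform)
w_f, _ = weighted_linear_fit(xs, ys, future)
```

The future-weighted fit recovers a steeper slope than the uniform fit, i.e. it spends its capacity where deployment will actually land.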
Bio: Sunita Sarawagi conducts research in the fields of databases and machine learning. She is Institute Chair Professor at IIT Bombay. She received her Ph.D. in databases from the University of California, Berkeley and a bachelor's degree from IIT Kharagpur. She has also worked at Google Research (2014-2016), CMU (2004), and the IBM Almaden Research Center (1996-1999). She was awarded the Infosys Prize in 2019 for Engineering and Computer Science, and the Distinguished Alumnus Award from IIT Kharagpur. She has numerous publications, including best paper awards at the ACM SIGMOD, VLDB, ICDM, NIPS, and ICML conferences. She has served on the boards of directors of ACM SIGKDD and the VLDB Endowment. She was program chair for the ACM SIGKDD 2008 conference, research track co-chair for the VLDB 2011 conference, and has served as a program committee member for the SIGMOD, VLDB, SIGKDD, ICDE, and ICML conferences, and on the editorial boards of the ACM TODS and ACM TKDD journals.
Can machines help humans generate new ideas and create more Eureka moments?
Human ideation is inherently limited: our biases and “in the box” thinking often make us blind to intrinsic and extrinsic connections to related systems, and even to separate domains of knowledge. It is becoming increasingly infeasible to read and comprehend the vast volume of new scientific research, and to compile it into a coherent model of the underlying problem we are trying to solve. Imagine a machine that generates millions of new ideas every minute, writes the code to test them against available data and knowledge, and then presents you with only the best hypotheses. Or another machine that allows you to find answers to complex questions and connect the dots between related bits of knowledge by mining billions of web pages, including scientific articles, patents, and news. Science fiction? Not really. We have built these machines. They are being used on a daily basis for high-stakes business challenges across multiple industries including pharmaceutical, retail, financial services and energy, and across use cases such as drug discovery, fuel consumption optimization, customer retention, inventory management, green energy, and more. In this talk we will outline the foundations and operating principles of these engines, and share examples of how they are incorporated in production systems to enable applications that were not possible otherwise.
Bio: Ron is a data-science entrepreneur with extensive experience in applications of machine learning and artificial intelligence. As CTO and co-founder of SparkBeyond, Ron is pursuing disruptive directions in knowledge-driven predictive analytics. Prior to SparkBeyond, Ron was the VP for Recommendation Technologies at Outbrain, co-founded the Microsoft Israel Innovation Labs, and led product marketing at LivePerson. He holds several patents in the domains of business intelligence, search and advertising, social analytics, and image processing. Dr. Karidi received his B.Sc. and Ph.D. in mathematics from Tel-Aviv University, and was an Assistant Professor of Mathematics at Stanford University.
What’s Next in Infrastructure for Machine Learning?
Despite incredible advances in machine learning, building production ML applications is expensive and difficult because of their computational cost, data cost, and complex failure modes. I’ll discuss these challenges from two perspectives: the Stanford DAWN research group and my position at Databricks, offering a data and ML platform for enterprises. I’ll cover two emerging ideas to help address these challenges. The first is software development and operations platforms designed specifically for ML, often called “ML platforms”, that standardize the interfaces used in ML applications to make them easier to build and maintain. A lot of production use of ML is already through such platforms, and there are many open questions in designing them. The second idea is model designs that are more “infrastructure-friendly” and “ops-friendly” by design. As a concrete example, I will discuss retrieval-oriented NLP models such as Stanford's ColBERT that query documents from a corpus to perform tasks such as question-answering, which gives multiple advantages, including lower computational cost, easier interpretability, and very low-cost updates to the model’s “knowledge”. I hope this talk gives both researchers and practitioners a good idea of what it takes to turn ML results into widely usable, productionizable systems.
Bio: Matei Zaharia is an Assistant Professor of Computer Science at Stanford and Cofounder and Chief Technologist at Databricks. He started the Apache Spark open source project during his PhD at UC Berkeley in 2009 and the MLflow open source project at Databricks, and has helped design other widely used data and AI systems software including Delta Lake and Apache Mesos. At Stanford, he is a co-PI of the DAWN lab working on infrastructure for machine learning, data management and cloud computing. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).
Data and AI driven approach to Career Mobility and Talent Management
Long-term prediction of workers' career trajectories is a difficult task. Current unforeseen economic challenges and job displacements only add to the frustration. The talk will focus on building sustainable solutions with machine learning algorithms to help workers manage and tailor their career paths through the complexities of the job market. The talk reviews efforts in the research and development community to educate and empower workers with personalized analytics tools to understand the impact of their choices on their long-term career outcomes. Apart from a worker's employment and educational history, other factors such as career interests, training options for upskilling, and market trends are also taken into account when providing personalized recommendations. The talk will also review how to help the labor force explore a desired career goal and guide people through the skills and experience levels required to attain it. The research also explores methods for deriving optimal trajectories by leveraging historical jobs data, market trends, and AI-driven user-specific predictions.
Bio: Dr. Wang has more than 20 years of experience in machine learning research and development, and has held a number of positions at well-known companies. She is currently the VP of Data Science and Machine Learning at Korn Ferry, a global leader in the HR consulting and recruiting business. Dr. Wang leverages Korn Ferry's rich data set and benchmarks to build AI solutions for talent management and talent analytics applications through the Korn Ferry intelligence cloud platform. Before Korn Ferry, she worked for Riviera Partners, where she was the Vice President of Data Science and Engineering. There, she built the firm's data science team and machine learning platforms to bring efficiency to recruiting efforts. Before Riviera, Dr. Wang held leadership positions driving fast growth at Zillow and Microsoft. She is an author of numerous publications and patents. She recently received a National Science Foundation grant to research methods to improve the national talent ecosystem. Dr. Wang holds a doctorate in Electrical Engineering from Tulane University.
Using Machine Learning to Amplify Scientific Work in Conservation and Ocean Health
The machine learning impact team, currently at Vulcan Inc. but soon to be part of the Allen Institute for AI (AI2), strives to leave the world a better place by partnering with scientists to amplify their work in conservation and ocean health. We seek to reduce the effort involved in going from data acquisition to scientific insight and actionable results. In this talk, I will present and discuss the team’s work, including applying and developing machine learning techniques towards automating health metrics for Southern Resident Killer Whales in partnership with SR3, identifying bottlenose dolphin signature whistles in partnership with researchers at the Woods Hole Oceanographic Institution, and classifying abiotic/biotic noise in partnership with researchers at the Applied Physics Laboratory at the University of Washington.
Bio: Dr. Sam McKennoch leads Vulcan’s machine learning impact team, driving a portfolio of conservation and ocean health projects with a number of science partners. Prior to Vulcan, Sam worked in a variety of research and development roles over the last two decades, primarily within the medical devices space. Sam has a PhD in Electrical Engineering with a focus on Computational Neuroscience from the University of Washington.
Big Data Analytics in Healthcare
Increasing demand for and costs of healthcare, exacerbated by ageing populations and a great shortage of doctors, are serious concerns worldwide. This has motivated efforts to provide better healthcare through smarter healthcare systems empowered by big data. Management and processing of healthcare data are challenging due to various factors inherent in the data itself, such as high complexity, irregularity, sparsity, and privacy. In this talk, I shall discuss the challenges in designing algorithms and systems for healthcare data analytics, describe several key steps for processing big healthcare data, and present several detailed technologies, from both system and algorithm perspectives, in our healthcare data management and analytics framework.
Bio: Meihui Zhang is currently a professor at the Beijing Institute of Technology (BIT). Before joining BIT, she was an Assistant Professor at the Singapore University of Technology and Design. She obtained her Ph.D. from the National University of Singapore. Her main research interests include big data management and analytics, modern database systems, blockchain systems, and AI. She is a recipient of the 2020 VLDB Early Career Research Contribution Award and the 2019 CCF-IEEE CS Young Scientist Award. Meihui has served as a PC member of top international conferences such as ACM SIGMOD, VLDB, and KDD. She served as a Research Track Associate Editor for VLDB 2018, VLDB 2019, VLDB 2020 and SIGMOD 2021, PC Vice-Chair of ICDE 2018, and Demo Track Co-chair of DASFAA 2017 and EDBT 2022. She is serving as an Associate Editor for ACM Transactions on Data Science and a Survey Track Editor of Distributed and Parallel Databases. She is a trustee of the VLDB Endowment.
Learning to Solve the Traveling Salesman Problem
The Traveling Salesman Problem (TSP) is among the most popular and most studied combinatorial problems, with work dating back to von Neumann in 1951. It has driven the discovery of several optimization techniques such as cutting planes, branch-and-bound, local search, Lagrangian relaxation, and simulated annealing. The last five years have seen the emergence of promising techniques in which (graph) neural networks have been capable of learning new combinatorial algorithms. The main question is whether deep learning can learn better heuristics from data, i.e., replace human-engineered heuristics. This is appealing because developing algorithms to tackle NP-hard problems may require years of research, and many industry problems are combinatorial by nature. In this project, we propose to adapt the recent successful Transformer architecture, originally developed for natural language processing, to the combinatorial TSP. Training is done by reinforcement learning, hence without TSP training solutions, and decoding uses beam search. We report improved performance over recent learned heuristics.
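To illustrate the decoding step mentioned above, here is a minimal beam search over TSP tours: at each step, only the `beam_width` lowest-cost partial tours survive. In the learned setting, the per-step scores would come from the Transformer's predicted probabilities; in this hypothetical sketch, plain Euclidean tour length stands in for them.

```python
import math

def beam_search_tsp(coords, beam_width=3):
    # Decode a tour over cities given by (x, y) coordinates, keeping the
    # beam_width shortest partial tours at every step.
    n = len(coords)

    def dist(i, j):
        return math.dist(coords[i], coords[j])

    beams = [([0], 0.0)]  # start every tour at city 0
    for _ in range(n - 1):
        candidates = []
        for tour, length in beams:
            for nxt in range(n):
                if nxt not in tour:
                    candidates.append((tour + [nxt], length + dist(tour[-1], nxt)))
        candidates.sort(key=lambda c: c[1])
        beams = candidates[:beam_width]
    # Close each surviving tour back to the start and return the best one.
    best_tour, best_len = min(
        ((t, l + dist(t[-1], t[0])) for t, l in beams), key=lambda c: c[1])
    return best_tour, best_len

# Tiny instance: the four corners of a unit square (optimal tour length is 4).
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
tour, length = beam_search_tsp(coords)
```

With a beam width of 1 this degenerates to the greedy nearest-neighbour heuristic; wider beams trade computation for tour quality, which is the same knob the learned decoders expose.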
Bio: Xavier Bresson is an Associate Professor in the Department of Computer Science at the National University of Singapore (NUS). His research focuses on Graph Deep Learning, a new framework that combines graph theory and neural network techniques to tackle complex data domains. In 2016, he received the US$2.5M NRF Fellowship, the largest individual grant in Singapore, to develop this new framework. He was also awarded several research grants in the U.S. and Hong Kong. He co-authored one of the most cited works in this field, and he recently introduced, with Yoshua Bengio, a benchmark that evaluates graph neural network architectures. He has organized several workshops and tutorials on graph deep learning, such as the recent IPAM'21 workshop on "Deep Learning and Combinatorial Optimization", the MLSys'21 workshop on "Graph Neural Networks and Systems", the IPAM'19 and IPAM'18 workshops on "New Deep Learning Techniques", and the NeurIPS'17, CVPR'17 and SIAM'18 tutorials on "Geometric Deep Learning on Graphs and Manifolds". He has been a regular invited speaker at universities and companies to share his work. He has also been a speaker at the AAAI'21 and ICML'20 workshops on "Graph Representation Learning", and the ICLR'20 workshop on "Deep Neural Models and Differential Equations". He has taught graduate courses on Graph Neural Networks at NTU, and has served as a guest lecturer for Yann LeCun's course at NYU.
Learning transferable human behaviour representations from sensor data
Personalised behaviour models are key enablers of intelligent assistant technologies and proactive recommender systems. On an aggregate level, insights into human behaviours are critical for improving operational efficiency, individual and organisational productivity, and quality of life in cities. The proliferation of sensors and the Internet of Things leads to new opportunities and challenges for modelling human behaviours. However, most representation learning techniques require a large amount of well-labelled training data to achieve high performance. Due to the high expense of labelling human and/or system behaviours, approaches that require minimal to no labelled data are becoming more favourable. This motivated us to explore data-efficient learning techniques that achieve efficient and compact representations. In this talk, I will first present our unsupervised approaches to handling large-scale multivariate sensor data from heterogeneous sources, prior to modelling them further with the rich contextual signals obtained from the environment. I will cover recent work on self-supervised learning for change point detection and anomaly detection, applicable to various tasks. Examples will also be drawn from our applied projects that have leveraged transfer learning and few-shot learning approaches on IoT and sensor data for applications in smart cities, smart buildings, and public health monitoring.
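As a concrete, if deliberately simplified, picture of change point detection on a sensor stream: score each time step by the difference between the means of the windows just before and after it, and report the peak. A self-supervised model would replace the raw sensor values below with learned representations; the data and function names here are hypothetical illustrations.

```python
def change_point_scores(series, window=5):
    # Score each position by the absolute difference between the mean of the
    # window just before it and the window just after it; peaks suggest changes.
    scores = {}
    for t in range(window, len(series) - window):
        left = series[t - window:t]
        right = series[t:t + window]
        scores[t] = abs(sum(right) / window - sum(left) / window)
    return scores

def detect_change_point(series, window=5):
    scores = change_point_scores(series, window)
    return max(scores, key=scores.get)

# Hypothetical sensor stream with a level shift at index 10.
stream = [0.0] * 10 + [5.0] * 10
t = detect_change_point(stream, window=5)
```

The labelling problem the abstract describes is visible even here: nothing in this procedure needed a labelled change point, which is exactly why self-supervised formulations are attractive for sensor data.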
Bio: Flora Salim is a Professor in the School of Computing Technologies, RMIT University, Melbourne, Australia, the co-Deputy Director of the RMIT Centre for Information Discovery and Data Analytics (CIDDA), and an Associate Investigator of the ARC Centre of Excellence in Automated Decision-Making and Society. Flora leads the IoT Analytics node, also known as the Context Recognition and Urban Intelligence (CRUISE) group. Her research interests include human behaviour modelling, time-series and spatio-temporal data mining, machine learning on stream and sensor data, ubiquitous computing, and smart cities. She was a Humboldt-Bayer Fellow, a Humboldt Fellow (experienced researcher), and a Victoria Fellow in 2018. She was the recipient of the RMIT Vice-Chancellor's Award for Research Excellence - Early Career Researcher 2016; the RMIT Award for Research Impact - Technology 2018; and an Australian Research Council (ARC) Postdoctoral Research Industry Fellowship (2012-2015). She serves as an Associate Editor of the PACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), an Area Editor of the Pervasive and Mobile Computing journal, and a Steering Committee member of ACM UbiComp. Prof. Salim has received several ARC Linkage grants, a Discovery grant, and numerous international industry grants, including from Microsoft Research, Northrop Grumman Corporation, and the Qatar National Research Fund, in the domain of intelligent assistants and smart cities.
TransFAT: Translating Fairness, Accountability, and Transparency into Data Science Practice
Data science technology promises to improve people's lives, accelerate scientific discovery and innovation, and bring about positive societal change. Yet, if not used responsibly, this same technology can reinforce inequity, limit accountability, and infringe on the privacy of individuals. In my talk I will give an overview of the "Data, Responsibly" project that aims to operationalize ethics and legal compliance in data science systems. In particular, I will speak about my involvement in efforts to regulate the use of data science and AI in New York City, and about the imperative to establish a broad and inclusive educational agenda around responsible data science.
Bio: Julia Stoyanovich is an Assistant Professor of Computer Science & Engineering, and of Data Science at New York University (NYU). She directs the Center for Responsible AI at NYU, a hub for interdisciplinary research, public education, and advocacy that aims to make responsible AI synonymous with AI. Julia's research focuses on responsible data management and analysis: on operationalizing fairness, diversity, transparency, and data protection in all stages of the data science lifecycle. Julia developed and has been teaching courses on Responsible Data Science at NYU, and is a co-creator of an award-winning comic book series on this topic. In addition to data ethics, Julia works on the management and analysis of preference data, and on querying large evolving graphs. She holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics & Statistics from the University of Massachusetts at Amherst. Julia is a recipient of an NSF CAREER award and of an NSF/CRA CI Fellowship.
Semantic Scholar - Tera-scale NLP to Accelerate Scientific Research
Semantic Scholar (S2) is a 50-person effort at the Allen Institute for Artificial Intelligence that drives a website used by about 100M people each year. Our mission is to accelerate the progress of scientific research with augmented intelligence: advanced tools that make it easier to find relevant research, digest it quickly, and make connections between different problems and approaches. This talk will survey some of the data-mining and NLP advances underlying S2, from the identification of emerging scientific concepts to extreme abstractive summarization, full-document understanding, and fact checking.
Bio: Daniel S. Weld is Thomas J. Cable / WRF Professor in the Paul G. Allen School of Computer Science & Engineering and manages the Semantic Scholar research group at the Allen Institute for Artificial Intelligence. He received bachelor's degrees in both Computer Science and Biochemistry at Yale University in 1982 and a Ph.D. from the MIT Artificial Intelligence Lab in 1988. Weld received a Presidential Young Investigator's award in 1989 and an ONR Young Investigator's award in 1990, and was named an AAAI Fellow in 1999 and an ACM Fellow in 2005. Dan is an active entrepreneur with many patents and technology licenses. He co-founded Netbot Incorporated, creator of Jango Shopping Search (acquired by Excite); AdRelevance, a monitoring service for internet advertising (acquired by Nielsen NetRatings); and the data integration company Nimble Technology (acquired by Actuate). Weld is also a Venture Partner at the Madrona Venture Group.
Google Dataset Search: Building an open ecosystem for dataset discovery
There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others’ work, and providing data journalists easier access to information and its provenance. In this talk, we will discuss Dataset Search by Google, which provides search capabilities over potentially all dataset repositories on the Web. We will talk about the open ecosystem for describing datasets that we hope to encourage and what we have learned by analyzing the corpus of more than 30 million dataset descriptions.
Bio: Natasha Noy is a senior staff scientist at Google Research, where she works on making structured data accessible and useful. She leads the team building Dataset Search, a search engine for all the datasets on the Web. Prior to joining Google, she worked at the Stanford Center for Biomedical Informatics Research, where she made major contributions in the areas of ontology development and alignment, and collaborative ontology engineering. Dr. Noy is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI). She served as President of the Semantic Web Science Association from 2011 to 2017.
Recent Advances in AI to Detect Online Harmful Content
Online social networks provide platforms for people to interact with each other, share information, and consume the latest news. However, these platforms are also used for malicious purposes, such as posting hate speech, distributing misinformation, conducting human trafficking, and selling illegal drugs. In this talk, we present how recent AI advances have helped better detect such harmful content. Topics include, but are not limited to, self-supervision techniques that reduce the need for human labeling, multilingual advances that address low-resource language problems, more holistic approaches to handling multimodal content, and some major challenges that still lie ahead.
Bio: Dr. Hao Ma is currently a director at Facebook AI, where his team is building advanced technologies to make Facebook a better and safer place. Prior to joining Facebook AI, Hao was a principal research manager at Microsoft Research, where he spent most of his time improving the search experience with more semantic information. Hao's research interests lie at the intersection of natural language understanding, recommender systems, machine learning, information retrieval, and social network analysis. Hao is a recipient of two Test-of-Time awards, and he received his Ph.D. degree in Computer Science from The Chinese University of Hong Kong.
Applied Data Science for Predictive Time Series Analytics
Predictive time series analytics forms a key capability in realizing the vision of trustworthy and performant AI systems in a wide range of application domains from autonomous driving to cloud-native computing systems. Time series data helps capture rich temporal phenomena such as anomalous data trends or faulty system behavior, yet is challenging to analyze due to its high volume and complexity. This talk will highlight some of these challenges with real-world examples and introduce a selection of novel data science tools and techniques for tackling them in practice.
Bio: Nesime Tatbul is a senior research scientist at Intel Labs and MIT CSAIL, serving as an industry co-PI for MIT's Data Systems and AI Lab jointly funded by Intel, Google, and Microsoft. Previously, she served on the computer science faculty of ETH Zurich after receiving a Ph.D. degree from Brown University. Her research interests are broadly in large-scale data management systems and modern data-intensive applications, with a recent focus on advanced time series analytics and learned data systems. Her work has been highly cited and recognized by distinctions at ACM SIGMOD, ACM DEBS, VLDB, and NeurIPS conferences. She has been an active member of the database research community for 20+ years, serving in various roles for the VLDB Endowment, ACM SIGMOD, and other organizations.