2015 SIGKDD Dissertation Award Winners
SIGKDD Dissertation Awards (1 winner and 2 runners-up)
ACM SIGKDD dissertation awards recognize outstanding work done by graduate students in the areas of data science, machine learning and data mining.
Selection Procedure: We received 15 nominations this year. After receiving the nominations, we invited leading experts from all over the world to serve on the award selection committee. Each dissertation was reviewed by at least 3 experts, who helped group the dissertations into two competing groups. During the second phase, all members without a conflict of interest (COI) were invited to rank the top 6 nominations.
Review Criteria:
- Relevance of the Dissertation to KDD
- Originality of the Main Ideas in the Dissertation
- Significance of Scientific Contributions
- Technical Depth and Soundness of Dissertation (including experimental methodologies, theoretical results, etc.)
- Overall Presentation and Readability of Dissertation (including organization, writing style and exposition, etc.)
WINNER
Mining Latent Entity Structures From Massive Unstructured and Interconnected Data.
Chi Wang (student) and Jiawei Han (advisor) at University of Illinois at Urbana-Champaign
Abstract: The emergence of the cloud, the internet of things, social media, etc. has enabled the incredibly fast and easy collection and storage of vast amounts of data and information. Although database and data mining technologies have been successful at effective management and mining of structured data, a significant amount of value hidden in unstructured data, such as event logs, customer feedback, and social media content, remains locked and awaits discovery.
The thesis studies how to uncover semantically rich structures, such as topical hierarchies and relationships among entities, from massive data that may contain both unstructured text and interconnected entities. In general, the data are viewed as text-rich heterogeneous information networks, which allow the data to be text-only (unstructured data), network-only (interconnected data), or text plus links. Based on this view, the thesis lays down a mining framework of: (a) hierarchical topics and communities surrounding entities, (b) quality phrases to interpret these topics, and (c) relations among entities. Proposed methodologies are demonstrated in applications to a variety of domains, such as academic service, event log and news article explorer, and product review analytics. The methods produce quality topics, phrases and relations with no or little supervision. They are scalable and the runtime is orders of magnitude faster than alternatives in large datasets.
Runner-up: Modeling Large Social Networks in Context.
Qirong Ho (student) and Eric Xing (advisor) at Carnegie Mellon University
Abstract: Today's social and internet networks contain millions or even billions of nodes, and copious amounts of side information (context) such as text, attribute, temporal, image and video data. A thorough analysis of a social network should consider both the graph and the associated side information, yet we also expect the algorithm to execute in a reasonable amount of time on even the largest networks. Towards the goal of rich analysis on societal-scale networks, this thesis provides (1) modeling and algorithmic techniques for incorporating network context into existing network analysis algorithms based on statistical models, and (2) strategies for network data representation, model design, algorithm design and distributed multi-machine programming that, together, ensure scalability to large networks.
The methods presented herein combine the flexibility of statistical models with key ideas and empirical observations from the data mining and social networks communities, and are supported by distributed systems research for cluster computing. These efforts come together in a novel mixed-membership triangle motif model that scales to large networks with over 100 million nodes on just a few cluster machines, and can be readily extended to accommodate network context using the other techniques presented herein.
Runner-up: Computing Distrust in Social Media.
Jiliang Tang (student) and Huan Liu (advisor) at Arizona State University
Abstract: A myriad of social media services have emerged in recent years that allow people to communicate and express themselves conveniently and easily. The pervasive use of social media generates massive data at an unprecedented rate. As a result, it becomes increasingly difficult for online users to find relevant information; in other words, the information overload problem is exacerbated. Meanwhile, users in social media can be both passive content consumers and active content producers, so the quality of user-generated content can vary dramatically from excellence to abuse or spam, which results in a problem of information credibility. Trust, which provides evidence about whom users can trust to share information with and from whom users can accept information without additional verification, plays a crucial role in helping online users collect relevant and reliable information. It has been proven to be an effective way to mitigate information overload and credibility problems and has attracted increasing attention.
As the conceptual counterpart of trust, distrust could be as important as trust, and its value has been widely recognized by the social sciences in the physical world. However, little attention has been paid to distrust in social media. Social media differs from the physical world: (1) its data is passively observed, large-scale, incomplete, noisy and embedded with rich heterogeneous sources; and (2) distrust is generally unavailable in social media. These unique properties of social media present novel challenges for computing distrust in social media: (1) passively observed social media data does not provide the information social scientists use to understand distrust, so how can I understand distrust in social media? (2) distrust is usually invisible in social media, so how can I make invisible distrust visible by leveraging unique properties of social media data? and (3) little is known about distrust and its role in social media applications, so how can distrust help make a difference in social media applications?
The chief objective of this dissertation is to figure out solutions to these challenges via innovative research and novel methods. In particular, computational tasks are designed to understand distrust; an innovative task, i.e., predicting distrust, is proposed with novel frameworks to make invisible distrust visible; and principled approaches are developed to apply distrust in social media applications. Since distrust is a special type of negative link, I demonstrate the generalization of properties and algorithms of distrust to negative links, i.e., generalizing findings of distrust, which greatly expands the boundaries of research on distrust and largely broadens its applications in social media.
Congratulations to all the outstanding students who were nominated and to the winners of this year.