Microsoft Academic Graph (MAG)
Although we encourage participants to use any publicly available information in this challenge, we provide all teams with the Microsoft Academic Graph (MAG). MAG is a large, heterogeneous graph containing scientific publication records, citation relationships between publications, authors, institutions, journal and conference "venues," and fields of study. The data is available as a set of zipped text files stored in Microsoft Azure blob storage and accessible via HTTP. The latest zipped file is ~28.2GB; it is also split into several smaller zipped files for easier downloading. See the screenshot below.
The data provided for the challenge can be accessed and downloaded from http://aka.ms/academicgraph. Please use the data version "2016-02-05", because it contains two files specific to KDD Cup 2016.
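Because the data ships as zipped text files, it can be read without unpacking the archives to disk. The following is a minimal Python sketch, not an official loader. It assumes the text files are tab-delimited and UTF-8 encoded, which this page does not state; the function name iter_rows and the archive and member names passed to it are hypothetical.

```python
import csv
import io
import zipfile


def iter_rows(zip_path, member_name):
    """Yield each line of a text file inside a zip archive as a list of fields.

    Assumption: the member is UTF-8 text with tab-separated fields.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # Decode on the fly so the whole file never has to fit in memory.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps quote characters in titles and names as-is.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                yield row
```

Streaming one row at a time is a deliberate choice here: with an archive of this size, loading a whole file into memory may not be practical on a typical workstation.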
Note: In order to emphasize an important technical challenge that is common in web-scale data collection and aggregation, the data released here have undergone only rudimentary processing, for example in areas of author and paper conflation/deduplication. This noisy yet realistic dataset can provide additional avenues for research in the big data arena.
We encourage competitors who need computational resources to apply for the Microsoft Azure for Research Award. For details about the submission process, go to http://www.windowsazurepass.com/research. Please include the hashtag #kddcup in your submission title for easier tracking.
In addition to the Microsoft Academic Graph data, we will also provide the Academic Knowledge API for teams who prefer to retrieve information using APIs. The API is updated weekly with the most recent data, whereas the snapshot copy remains unchanged unless a further announcement is made.
Affiliation List to be Ranked
The dataset above includes a list of affiliation names and IDs for KDD Cup 2016. Participating teams only need to rank the affiliations that appear in this list.
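One way to keep a ranking restricted to the released list is sketched below in Python. It assumes a team has already computed a score per affiliation ID by some method of its own. The function name rank_listed_affiliations, the score dictionary, and the choice to give unscored listed affiliations a score of 0.0 are all illustrative assumptions, not part of the challenge rules.

```python
def rank_listed_affiliations(scores, listed_ids):
    """Return the listed affiliation IDs sorted by descending score.

    scores: dict mapping affiliation ID -> numeric score (hypothetical input).
    listed_ids: iterable of affiliation IDs from the released list.
    Affiliations outside the list are dropped; listed affiliations without
    a score are assumed to score 0.0 so that each one still appears.
    """
    listed = set(listed_ids)
    full = {aff_id: scores.get(aff_id, 0.0) for aff_id in listed}
    # Ties are broken by ID so the output order is deterministic.
    return sorted(full, key=lambda aff_id: (-full[aff_id], aff_id))
```

For example, with scores for "A", "B", and an unlisted "Z", and a list containing "A", "B", and "C", the result keeps only the three listed IDs, ordered by score.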
Full Paper List for the Past Five Years
The conference data in MAG contain many types of papers, including full research papers, industry track papers, short papers, poster papers, and workshop papers. Identifying the correct type of each conference paper is nontrivial. Because we will evaluate only the affiliations of accepted full research papers, we also provide the list of full research papers from the past five years (2011 to 2015) for the eight targeted conferences.
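The full research paper list can act as a filter over the broader conference data. The Python sketch below counts full research papers per conference and year. It assumes that the list can be reduced to a set of paper IDs and that each paper record is a (paper ID, conference, year) tuple; both the record layout and the function name count_full_papers are hypothetical, since the file format is not described here.

```python
from collections import Counter


def count_full_papers(papers, full_paper_ids, first_year=2011, last_year=2015):
    """Count full research papers per (conference, year).

    papers: iterable of (paper_id, conference, year) tuples (assumed layout).
    full_paper_ids: set of paper IDs taken from the full research paper list.
    Papers outside the list or outside the year range are ignored.
    """
    counts = Counter()
    for paper_id, conference, year in papers:
        if paper_id in full_paper_ids and first_year <= year <= last_year:
            counts[(conference, year)] += 1
    return counts
```

Checking the year range in addition to list membership is redundant if the list only covers 2011 to 2015, but it guards against mismatched or noisy year values in the paper records.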
External Data Sources
Teams are welcome to use external data in their approaches as long as that data is publicly accessible. The winners are required to disclose all the information they have used to generate the final ranking.