The KDD-CUP-98 data set and the accompanying documentation are now available for general use, subject to the following restrictions:
If you intend to use this data set for training or educational purposes, you must not reveal the name of the sponsor PVA (Paralyzed Veterans of America) to the trainees or students. You are allowed to say "a national veterans organization"...
readme. This file, which lists the files on the FTP server and describes their contents.
instruct.txt. General instructions for the competition.
cup98doc.txt. This file, an overview and pointer to more detailed information about the competition.
cup98dic.txt. Data dictionary to accompany the analysis data set.
cup98que.txt. KDD-CUP questionnaire. PARTICIPANTS ARE REQUIRED TO FILL OUT THE QUESTIONNAIRE and turn it in with their results.
valtargt.readme. Describes the valtargt.txt file.
cup98lrn.zip. Raw LEARNING data set, PKZIP-compressed (36.5M; 117.2M uncompressed).
cup98val.zip. Raw VALIDATION data set, PKZIP-compressed (36.8M; 117.9M uncompressed).
cup98lrn.txt.Z. Raw LEARNING data set, compressed with UNIX compress (36.6M; 117.2M uncompressed).
cup98val.txt.Z. Raw VALIDATION data set, compressed with UNIX compress (36.9M; 117.9M uncompressed).
valtargt.txt. The target fields that were withheld from the validation data set sent to the KDD CUP 98 participants (1.1M).
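Because the validation records and their target fields ship as separate files, scoring a model on the validation set means joining them back together record by record. The following is a minimal Python sketch, not an official loader. It assumes that both files are comma-delimited text with a header row, that they share a record ID column named `CONTROLN`, and that the text is Latin-1 encoded; all three are assumptions to check against cup98dic.txt and valtargt.readme. The helper names `read_zipped_csv` and `attach_targets` are hypothetical.

```python
import csv
import io
import zipfile

# Assumption: the shared record ID column is named "CONTROLN".
# Verify against cup98dic.txt and valtargt.readme before relying on it.
KEY = "CONTROLN"


def read_zipped_csv(zip_path, encoding="latin-1"):
    """Yield each row (as a dict) of the first file inside a PKZIP archive.

    Assumptions: the archive holds one comma-delimited text file with a
    header row, and the text is Latin-1 encoded.
    """
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as raw:
            # Wrap the binary member stream so csv can read it as text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)


def attach_targets(val_rows, target_rows, key=KEY):
    """Merge the withheld target fields back into the validation rows.

    Rows whose ID has no matching target record are skipped.
    """
    targets = {row[key]: row for row in target_rows}
    merged = []
    for row in val_rows:
        extra = targets.get(row[key])
        if extra is None:
            continue
        combined = dict(row)
        # Copy every target field except the duplicate ID column.
        combined.update({k: v for k, v in extra.items() if k != key})
        merged.append(combined)
    return merged
```

Typical use would be `attach_targets(read_zipped_csv("cup98val.zip"), target_rows)`, where `target_rows` comes from `csv.DictReader` over valtargt.txt. The .txt.Z variants use the LZW format of UNIX compress, which Python's standard library does not read; decompress them first with `uncompress` or `gzip -d`.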
Note: the data sets are also available in the UC Irvine KDD archive.