SIGKDD : KDD Cup 2010 : Student performance evaluation

Post-Competition Questions

What do the 1’s and 0’s mean on the leaderboard for development data sets?

This is a bug that was an effect of adding “cup scoring” and “leaderboard scoring”. You can ignore the “cup scoring” for development data sets; “leaderboard scoring” is the only meaningful score for development data sets, and it’s a score based on the entire submission.

Will you be making the Challenge Test Set labels available now that the contest has ended?

We have just re-opened the submissions and leaderboard to allow people to continue working on their algorithms. Sometime later, we hope to release the full master data sets, but have no definite plans at this time.

Why are there now two alternate views of the Leaderboard?

During the KDD Cup Workshop, some participants suggested that we change the way the leaderboard works so that we display the same type of scores that were used to determine the competition winners (by validating most of the predictions instead of a small portion). Recall that during the competition, the evaluation process that powered the Leaderboard only looked at a small portion of participant prediction files; a much larger portion was used to determine the winners. We thought this suggestion was a great idea, but we didn’t want to change the Leaderboard and individual submission pages in such a way that they no longer reflected the scores and ranking given to participants at the end of the competition. As a solution, we’ve added a toggle to the Leaderboard and individual submission pages allowing you to view either the Cup Scoring, where a majority of the prediction file is used to score the entry, or Leaderboard Scoring, where a small portion of the prediction file is used to score the entry. We calculate both, so you can toggle between them. Cup Scoring is more accurate, but Leaderboard scoring is what was used during the competition to power the Leaderboard.

Registration and General Competition

The leaderboard just went blank, and it says there are no submissions. Is it safe to submit?

Yes, it is still safe to submit, and the leaderboard has not been lost. There is a bug in our system that is causing the leaderboard to appear empty under high load. The submission and the leaderboard data are still stored properly.

I didn’t get a verification email. What should I do?

Contact us and tell us that you registered but received no email. We’ll verify your account. Please do not create a new account.

Where can I find more info about the competition?

Professors at WPI want to help undergrads compete. Professors Neil Heffernan, Ryan Baker, Joe Beck, and Carolina Ruiz of Worcester Polytechnic Institute will be giving online webinar lectures at WPI and posting them online for undergrads across the country. They will give suggestions on how to do well on the KDD Cup this year. They’re also going to give away some tools. See their website for more info: http://teacherwiki.assistment.org/wiki/index.php/KDD

When can I register and download data?

You can register April 1, 2010 at 2pm EDT, and download data as soon as you’ve registered.

Can team members come from different organizations?

Team members can definitely come from different organizations. The only restriction on teams is that each person may participate in only one team. See the Rules for more information about participation.

What’s the difference between a student team and a non-student team?

A team can be either a student team (eligible for student-team prizes) or not a student team (eligible for travel awards). In a student team, a professor should be cited appropriately, but in the spirit of the competition, student teams should consist primarily of student work. We will ask for participants to state whether they are a student team prior to the end of the competition. Our sponsors have provided cash prizes for student teams.

Does this mean there won’t be any monetary prizes for industry participants other than travel awards?

Rather than splitting the prizes among student and non-student teams, we decided that we will only award monetary prizes to students this year, since the cash prize would make the biggest difference for them. This way we can offer more consistent cash prizes, while still offering a sizable travel award for all top placed participants. For an industry participant, we figure that being a top performer in the KDD Cup is worth way more in PR than the cash prize anyway. Plus, our experience has been that, in some cases, providing cash prizes to industry participants creates more problems on their side than it’s worth. We hope the lack of prizes for industry won’t keep you from participating in the KDD Cup this year!

Data Format

What are the differences between development and challenge data sets?

Development data sets are provided for familiarizing yourself with the format and developing your learning model. Using them is optional, and your predictions on these data sets will not count toward determining the winner of the competition. Development data sets differ from challenge sets in that the actual student performance values for the prediction column, “Correct First Attempt”, are provided for all steps - see the file ending in “_master.txt”. Challenge data sets will be made available on April 8 at 2pm EDT. Predictions on these data sets will count toward determining the winner of the competition. In each of these two data sets, you’ll be asked to provide predictions in the column “Correct First Attempt” for a subset of the steps. For more information on which steps these will be, see the bottom of our Data page.

What are the different files in each data set ZIP file I downloaded?

[dataset]_train.txt: This file contains all training rows for the data set, but no test rows. Use this file to train your learning model.
[dataset]_test.txt: This file contains only test rows for the data set. You will make predictions for all of these rows in the column Correct First Attempt, but submit only the columns Row and Correct First Attempt - see the submission file format, [dataset]_submission.txt, below.
[dataset]_submission.txt: A submission file composed of only 2 columns, Row and Correct First Attempt. This shows the format of a valid submission file in which you’d provide prediction values (probabilities) in the second column. Note that only test rows are included, not training rows. (For development sets, this file is just [dataset].txt)
[dataset]_master.txt: Only included for development data sets, this file shows all of the actual student performance values for test rows in the prediction column, Correct First Attempt. It also includes values for the various time columns and Incorrects/Hints/Corrects columns, which are empty in the test file. These are documented on the Data page. It’s up to you how you’d like to use the development data sets and the included actual student performance values.
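
As a rough illustration of the two-column submission format described above, here is a minimal Python sketch. It assumes the files are tab-delimited with a header row, which this FAQ does not state explicitly, so check the sample submission file in your download. The function name `format_submission` and the sample values are hypothetical.

```python
import csv
import io


def format_submission(predictions):
    """Return submission text with only the Row and Correct First Attempt columns.

    `predictions` is an iterable of (row_id, probability) pairs for test rows.
    Assumption: tab-delimited text with a header row (not confirmed by the FAQ).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Row", "Correct First Attempt"])
    for row_id, probability in predictions:
        # Predictions are probabilities, so reject anything outside [0, 1].
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"prediction for row {row_id} is not a probability")
        writer.writerow([row_id, f"{probability:.4f}"])
    return buffer.getvalue()


# Hypothetical predictions for two test rows.
submission_text = format_submission([(1, 0.9), (2, 0.25)])
```

Only test rows belong in the output; training rows are never part of a submission.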

Your description of the data format states that there should be one test problem (composed of multiple step rows) per student and unit, but I see more than one problem per student-unit in a development data set. I also see only a single test row per student-unit, not the multiple rows you describe. Why?

There are some errors in the development data sets. We might update the development data sets to fix this issue, but we will make sure this doesn’t happen in the challenge data sets: expect only one problem per student-unit in any challenge data set, composed of one or more step rows. The purpose of releasing development data sets was to find any issues in the data and to allow participants to familiarize themselves with the data structure.

The file algebra_2006_2007_train.txt appears to end on a partial record that has only 11 fields and no line terminator.

This is a bug in that development data set. We plan to update this development data set. - April 5, 2010 2:15pm

Update: An updated version of the “Algebra 2006-2007” data set with a fixed final row is available on the Data page. - April 5, 2010 4:45pm

Do the steps for a given problem have to be completed in a unique order?

No. For a given problem, the order of the steps completed could vary across students or it could be the same. The set of steps completed could also vary or be the same—in some problems, some steps are optional. Also note that for some steps, the correct answer the student enters could vary across students (e.g., the tutor might have accepted “35” and “35.0”, or something more dissimilar).

I see that there are some problems which have different number of steps when attempted by different students. For example, in “Unit ES_05”, problem “EG61” has been attempted by multiple students but each time the number of steps differs. Does the number of steps in a given instantiation of a problem depend on the performance of the student on the earlier steps of the same problem or even previously attempted problems? Or is it “pre-decided” by the tutor program (for a given instantiation) and a student has to just attempt the steps sent forth by it?

The number of steps you see for a problem could vary across students based on performance within a problem (e.g., the tutor might present extra steps based on errors made) or just based on the approach the student took (e.g., the student might skip optional steps). I don’t know of an example where performance on a prior problem affects the number of steps in a later problem, in the sense that the tutor would control access to steps. When looking at solver or grapher data (the unit referenced is an equation solver unit), the number of steps almost always varies because these are somewhat exploratory environments where non-useful steps (steps that aren’t closer to the solution) are allowed. So with respect to steps, some tutored problems are more pre-decided than others.

One of the fields is Corrects which is described as “total correct attempts by the student for the step. (Only increases if the step is encountered more than once.)”. Why would the same step come up more than once in a problem?

This can happen in a few cases. One case is where the tutoring software allows the student to enter something and receive feedback, but does not lock the widget upon receiving the correct response. An example is a problem that includes an interactive graph where the student can set the scale of the graph: they can set the scale of the graph as many times as they’d like so long as they provide valid numbers. Each valid attempt is a “correct” one. Another case is in equation solving problems. In these problems, the student could transform the equation any number of times. The “step” is represented as the current equation, so if a transformation leads to an equation they’ve seen already, the number of “corrects” would increase by 1. Other cases probably exist in the data.

Is each step uniquely identified by problem hierarchy, problem name, and step name? Or is just problem name and step name needed?

A step row is uniquely identified by this hierarchy: Student > Problem Hierarchy > Problem Name > Problem View > Step

Update: A bug in the data prevents this from always being the case. - May 7, 2010 12:00pm

How should I interpret these long knowledge components I often see? For example:

[SkillRule: Eliminate Parens;
{CLT nested; CLT nested, parens;
Distribute Mult right;
Distribute Mult left;
(+/-x ±a)/b=c, mult;
(+/-x ±a)*b=c, div;
[var expr]/[const expr] = [const expr], multiply;
Distribute Division left;
Distribute Division right;
Distribute both mult left;
Distribute both mult right;
Distribute both divide left;
Distribute both divide right;
Distribute subex}]

The skill here is “Eliminate Parens”. The items listed in the curly braces {CLT nested; CLT nested, parens; Distribute Mult right; ... etc.} are just a static list of every operation that could trigger this skill. The order of the items is not meaningful. There is no hierarchical or sequential structure here; it is simply how the list was written by the tutor developer. In fact, any given element in this list may not be relevant to the particular problem the student was working on. In this case, there are a lot of operations which could trigger the “Eliminate Parens” skill.

Can you tell me more about the KC models in the challenge data sets?

The extra columns mean that for the challenge data sets, there are additional KC models. A KC model is a list of mappings between each step and one or more knowledge components; it is also known as a Transfer Model or a Skill Model. Each KC model is represented by two columns (KC and Opportunity). Within a model, there can be multiple KCs associated with a step. When there are, they are separated by "~~". The corresponding opportunity numbers are also separated by "~~", and are given in the same order, so each KC has an opportunity count for that step.

A simple answer about these individual models is that Rules is a more fine-grained categorization of similar steps and KTracedSkills is a more coarse-grained categorization. "Rules" corresponds with the production rules in the tutor, which get grouped into "meta-productions" that correspond 1:1 with the KTracedSkills. I believe there is a strict one-to-many mapping between each of the KTracedSkills and instances of the Rules, but there may be exceptions. The KTracedSkills level is used by the tutor to select future problems and is presumed to be the level at which students are learning and transferring their knowledge from one task experience (step) to another related one. That presumption is not always borne out by the data, and the fine-grained Rules KC model may provide clues to a better clustering of steps to predict transfer of learning.

To summarize in a slightly different way:

KTracedSkills - these are the skills that are knowledge-traced (i.e. the ones that appear on the tutor's skillometer).

SubSkills - these are the skills identified by the system, whether or not they are being traced.

Rules - these are the actual rule names used to determine skills. The distinction between these and "SubSkills" is that this KC model would include the actual model rules, not the meta-productions, while the SubSkills would be based on meta-productions.
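
The "~~" convention for a KC column and its Opportunity column can be sketched in Python as follows. This is only an illustration: the function name `parse_kc_cell` and the sample values are hypothetical, and treating an empty KC cell as "no KC for this step" is an assumption rather than something stated above.

```python
def parse_kc_cell(kc_cell, opportunity_cell, separator="~~"):
    """Pair each KC listed for a step with its opportunity count.

    Both cells list their values in the same order, separated by "~~".
    Assumption: an empty KC cell means the step has no KC in this model.
    """
    if not kc_cell:
        return []
    kcs = kc_cell.split(separator)
    opportunities = [int(value) for value in opportunity_cell.split(separator)]
    if len(kcs) != len(opportunities):
        raise ValueError("KC and Opportunity cells list different numbers of values")
    return list(zip(kcs, opportunities))


# Hypothetical cell values for one step in one KC model.
pairs = parse_kc_cell("Skill A~~Skill B", "3~~7")
```

Because each KC model has its own pair of columns, the same function would be applied once per model.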

What does it mean for the current step's end time to be later than the start time of the following step?

In general, steps can be interleaved: a student can start working on one step, complete another step, and return to the first step. In rows 81 and 82 of algebra_2005_2006_train.txt, this explanation isn't a great fit because row 81 had no incorrect attempts, meaning that somehow the student "started" the FinalAnswer step, and then, 105 seconds later, completed it. But perhaps FinalAnswer is representing a class of "final answers", since there are 4 "Corrects" for that step. That would mean that a step called FinalAnswer was available for the student to solve correctly 4 times (which they did without error). See also our FAQ entry where we mention "equation solving problems".

Can you give us some suggestions about how to understand some step names? (E.g. "XR1" and "R7C1")

R7C1 is "row 7 column 1" and indicates where this student input appeared within the table that students are working within. We're not sure about XR1, but it may be better inferred from the other step names in the same problem.

Is there any manual of the tutor system? It will be very helpful for us to understand the system and data sets as we have never used this tutoring system before.

To learn more about the tutor, see some of the papers about the Algebra tutor and some about relevant units in the Bridge to Algebra tutor. These include the following:

Ritter, Steven; Haverty, Lisa; Koedinger, Kenneth; Hadley, William; Corbett, Albert (2008). Integrating intelligent software tutors with the math classroom. G. Blume and K. Heid (Eds.), Research on Technology and the Teaching and Learning of Mathematics: Vol. 2 Cases and Perspectives. Charlotte, NC: IAP.

Koedinger, K. R. & Aleven, V. (2007). Exploring the assistance dilemma in experiments with Cognitive Tutors. Educational Psychology Review, 19 (3): 239-264.

You can follow the references in these papers, and the papers that cite them, to find other potentially relevant work. Related web sites such as learnlab.org, carnegielearning.com, and pact.cs.cmu.edu/koedinger/koedingerCV.html may also help.

How can I verify the integrity of the files I’ve downloaded?

You can use the Unix program sha1sum and the following SHA-1 hashes to verify the challenge files:

d6907b97e675248c86683098f1075696b5d1a17d *algebra_2008_2009.zip

7134e3b44af538a55d53f15463b4db50abb191c2 *bridge_to_algebra_2008_2009.zip

Another way to verify the files is to count the number of lines in the training and test files, add them together, and compare this to the number of steps reported on the downloads page. You can use a Unix command like wc -l <filename> to count the number of lines in each file. Note that the number of steps we report does not include the header rows, so your number is likely to be greater by 2 (one header row in each file).
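
If sha1sum and wc are not available, both checks can be approximated with Python's standard library. The sketch below is an illustration only: the function names are hypothetical, the expected hashes are the two listed above, and the files are assumed to be in the current directory when you call `matches_published_hash`.

```python
import hashlib
import os

# SHA-1 hashes published above for the challenge ZIP files.
EXPECTED_SHA1 = {
    "algebra_2008_2009.zip": "d6907b97e675248c86683098f1075696b5d1a17d",
    "bridge_to_algebra_2008_2009.zip": "7134e3b44af538a55d53f15463b4db50abb191c2",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Compare a downloaded ZIP against the hash listed for its file name."""
    return sha1_of_file(path) == EXPECTED_SHA1[os.path.basename(path)]


def count_data_rows(path):
    """Count lines in a training or test file, excluding the one header row.

    Unlike `wc -l`, this also counts a final line with no line terminator.
    """
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    return max(line_count - 1, 0)
```

Adding `count_data_rows` for the training file and the test file should give the number of steps reported on the downloads page, since the header rows are already subtracted.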

Why did you change the row numbering scheme for the challenge data sets?

This change was intentional. Our primary concern was that the numbering in the development sets left gaps between the student-units such that it was possible to infer how many more steps the student took to finish the unit by looking at the first row number of the next student-unit. This numbering scheme eliminates that possibility. More specifically, it is the gap between the test set of Student-Unit N and the training set for Student-Unit N+1 that is the give-away clue. An example is shown below.

THE OLD NUMBERING SCHEME (the gap between 120 and 150 is the give-away clue):

Rows	Student	Unit	Data-set
1-100	Student1	Unit1	Training
101-120	Student1	Unit1	Test
150-250	Student1	Unit2	Training
251-265	Student1	Unit2	Test

THE NEW NUMBERING SCHEME:

Rows	Student	Unit	Data-set
1-100	Student1	Unit1	Training
1-20	Student1	Unit1	Test
101-201	Student1	Unit2	Training
21-35	Student1	Unit2	Test

THE DERIVABLE TEMPORAL ORDERING (but with no between-unit gap information):

Rows	Student	Unit	Data-set
1-100	Student1	Unit1	Training
101-120	Student1	Unit1	Test
121-221	Student1	Unit2	Training
222-236	Student1	Unit2	Test
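
The derivable temporal ordering in the example above can be reconstructed by placing each student-unit's test rows directly after its training rows. The Python sketch below is one way to do that under stated assumptions: rows are reduced to hypothetical (row_id, student, unit) tuples in file order, and each student-unit's training rows are assumed to be contiguous in the training file.

```python
from collections import defaultdict
from itertools import groupby


def derive_temporal_order(train_rows, test_rows):
    """Interleave training and test rows per student-unit.

    Each input is a list of (row_id, student, unit) tuples in file order.
    Returns a list of (data_set, row_id) pairs; the list position gives the
    derived ordering, with no between-unit gap information.
    """
    test_by_key = defaultdict(list)
    for row_id, student, unit in test_rows:
        test_by_key[(student, unit)].append(("Test", row_id))

    ordered = []
    # groupby only merges consecutive rows, hence the contiguity assumption.
    for key, group in groupby(train_rows, key=lambda row: (row[1], row[2])):
        ordered.extend(("Training", row_id) for row_id, _, _ in group)
        ordered.extend(test_by_key.pop(key, []))
    return ordered


# The example from the tables above, using the new numbering scheme.
train = [(i, "Student1", "Unit1") for i in range(1, 101)]
train += [(i, "Student1", "Unit2") for i in range(101, 202)]
test = [(i, "Student1", "Unit1") for i in range(1, 21)]
test += [(i, "Student1", "Unit2") for i in range(21, 36)]
ordering = derive_temporal_order(train, test)
```

For this example the result has 236 entries, matching the last row number in the derivable ordering table.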

Known Noise in the Data

In Algebra I 2006-2007, the knowledge components of some steps are “)]”. Are they correct?

No, this is incorrect. In addition, there are other KC names that are cut off in similar ways. We will fix this, but since this issue is only present in the development sets, it is not a priority.

In Algebra I 2006-2007, some KCs have substring “(null~null)”. For example:

[Rule: action after done expr (null~~null)]~~[Rule: [SolverOperation subtract] in expr ([SolverOperation subtract]

These names are not a problem. The two arguments here are the “expected input” for the rule and the actual user input. Checking the user input against the expected input is part of how the rule gets evaluated. In these cases, because the rules are “any action” rules, any input is “expected”, so the expected input is null. In most of these cases the user input seems to be null also, except for the “left” indicating that the user specified the left-hand side of the equation.

You write that “a step row is uniquely identified by this hierarchy: Student > Problem Hierarchy > Problem Name > Problem View > Step”. If that’s true, then why do I see duplicates for this key?

It is true that there are duplicate (not unique) keys in the challenge data sets. (We haven’t analyzed the development data sets, but they probably exist there too.) We’d classify this issue as noise in the data; there should not be duplicate keys, but there are. The most times any single key is duplicated is 6. The following breakdown shows the extent of these duplicate keys in the training files of the two challenge data sets:

Algebra I 2008-2009 training	Bridge to Algebra 2008-2009 training
8,918,055 keys	20,012,499 keys
8,915,724 unique keys	20,012,362 unique keys
2,331 duplicate keys	137 duplicate keys
0.0261% percentage duplicates of total	0.0007% percentage duplicates of total
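
A breakdown like the one above can be reproduced for any file by counting occurrences of the key tuple. The Python sketch below is an illustration, not the script used to build the table: the column names in `KEY_COLUMNS` are assumptions about the file headers, the sample rows are hypothetical, and "duplicate keys" is taken to mean total keys minus unique keys, which matches the arithmetic in the table.

```python
from collections import Counter

# Assumed header names for the key hierarchy; check them against your files.
KEY_COLUMNS = (
    "Anon Student Id",
    "Problem Hierarchy",
    "Problem Name",
    "Problem View",
    "Step Name",
)


def summarize_duplicate_keys(rows, key_columns=KEY_COLUMNS):
    """Summarize how often the step key repeats across rows (dicts)."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    total = sum(counts.values())
    unique = len(counts)
    duplicates = total - unique
    return {
        "keys": total,
        "unique_keys": unique,
        "duplicate_keys": duplicates,
        "percent_duplicates": 100.0 * duplicates / total if total else 0.0,
        "max_occurrences": max(counts.values(), default=0),
    }


# Hypothetical rows: the first key appears twice, the second once.
base = dict(zip(KEY_COLUMNS, ("s1", "Unit A, Section 1", "P1", "1", "Step1")))
other = dict(base, **{"Step Name": "Step2"})
summary = summarize_duplicate_keys([base, dict(base), other])
```

Applied to the figures in the table, 8,918,055 minus 8,915,724 gives 2,331 duplicates, or about 0.0261% of all keys.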

In the test data, we found some rows with Problem View values greater than 1. Since Problem View is “the total number of times the student encountered the problem so far” and each test row is “determined by a program that randomly selects one problem for each student within a unit”, how could Problem View be larger than 1?

When we created the test files, we didn’t take “problem view” into account, so the result is that you may see more than one problem instance for a single unit-student pair in the test file. This was unintended, but you’ll need to account for it in your prediction algorithm. The presence of more than one problem for a student-unit pair in the test file will always take the form of all steps completed within another instance of that same problem. You can use the order of the rows in the test file and the problem view values to determine relatively when the student worked on the same problem again. Note that at the boundary between problem views - at the point where problem view increases - the steps surrounding the boundary are not necessarily contiguous: a student could have worked on the same problem much later in time.

In algebra_2008_2009_train.txt, I find that almost every line has the same start and end time. The step duration is also equal to 0. Is that normal?

Unfortunately, this is the case for all challenge data set training files. It is an issue in the raw data, and as such, we will not be able to fix it. It’s not known whether the timestamp that is duplicated across all four time columns is the step start time, step end time, or something else. As we will not be able to correct this issue in the data, you will need to take it into account.

We’ve seen duplicate knowledge components assigned to one step. For example, in algebra_2008_2009_train.txt, row 2569, the KC is defined as LABEL-X-HELP~~LABEL-X-HELP and the opportunity numbers for those KCs as 62~~1. Shouldn’t the opportunity count just increase over time?

This is a type of noise in the data. We don’t know why it occurs. You should ignore the second KC that is listed, as well as its opportunity number. You are right that for a student, the KC opportunity count should increase as time goes on and the student encounters steps with that KC.
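
One way to follow the advice above is to keep only the first occurrence of each KC in a step, along with its opportunity number. The Python sketch below illustrates this; the function name `drop_repeated_kcs` is hypothetical, and the example values are the ones quoted in the question.

```python
def drop_repeated_kcs(kc_cell, opportunity_cell, separator="~~"):
    """Keep the first occurrence of each KC and its opportunity number.

    Later repeats of the same KC within the step are treated as noise.
    Returns the cleaned (kc_cell, opportunity_cell) pair as strings.
    """
    kcs = kc_cell.split(separator)
    opportunities = opportunity_cell.split(separator)
    seen = set()
    kept = []
    for kc, opportunity in zip(kcs, opportunities):
        if kc in seen:
            continue
        seen.add(kc)
        kept.append((kc, opportunity))
    cleaned_kcs = separator.join(kc for kc, _ in kept)
    cleaned_opportunities = separator.join(opp for _, opp in kept)
    return cleaned_kcs, cleaned_opportunities


# The noisy example from the question above.
cleaned = drop_repeated_kcs("LABEL-X-HELP~~LABEL-X-HELP", "62~~1")
```

Steps that list several distinct KCs pass through unchanged, since only repeated names are dropped.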

Can we say, for the same student, the records along the row ID are listed along a time index, i.e., for the same student, a record with a larger row ID happens after a record with a smaller row ID?

No, that’s not a safe assumption for a couple of reasons. First, a clarification about the format: a student-step record is a summary of one or more student attempts (“transactions”) on a step, each with their own time stamp. This means that one can’t refer to a row as happening at one time. Secondly, attempts at different steps can be interleaved. Therefore, there is no guarantee that a student finishes one step before starting another one. We can clarify the order of student-step rows in either a training or test file. The order of these student-step rows is determined by the following rule: for a given instance of student working on a problem, order the rows by a field called “step time”, which is the time of the first correct transaction (“Correct Transaction Time”, where given) or, if there is no correct transaction, the time of the last transaction on the step (“Step End Time”). We don’t display this “step time” value but we use it to order rows. If the data is noisy, however, such as in Algebra 2008-2009 where the same time is given across all time fields for the step AND all steps in the problem, then the sorting within a problem is indeterminate: we can’t say what order it is in since we’ve tried to sort rows based on identical criteria. Similarly, ordering problems for a student could also be random if the same step time is used for more than one problem.
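
The ordering rule described above can be sketched in Python as follows. This is an illustration of the rule as stated, not the code used to build the files: rows are represented as hypothetical dicts whose time fields have already been parsed into `datetime` objects (or `None` when empty), and the sample steps are made up.

```python
from datetime import datetime


def step_time(row):
    """Time of the first correct transaction, else the step end time."""
    correct_time = row.get("Correct Transaction Time")
    return correct_time if correct_time is not None else row["Step End Time"]


def order_steps_within_problem(rows):
    """Order the step rows of one student's instance of a problem.

    sorted() is stable, so rows with identical step times keep their input
    order; with noisy timestamps that order carries no real meaning.
    """
    return sorted(rows, key=step_time)


# Hypothetical steps: one never answered correctly, one answered correctly.
rows = [
    {"Step Name": "StepB", "Correct Transaction Time": None,
     "Step End Time": datetime(2010, 1, 1, 10, 5)},
    {"Step Name": "StepA", "Correct Transaction Time": datetime(2010, 1, 1, 10, 1),
     "Step End Time": datetime(2010, 1, 1, 10, 9)},
]
ordered_names = [row["Step Name"] for row in order_steps_within_problem(rows)]
```

When every step in a problem shares one timestamp, as in Algebra 2008-2009, the sort key is identical for all rows and the result simply mirrors the input order.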
