The KR(20), or Kuder-Richardson Formula 20, measures overall test reliability.

Essentially, it tells you whether the exam as a whole discriminated between students who mastered the subject matter and those who did not.

The KR(20) generally ranges between 0.0 and +1.0, but it can fall below 0.0 with smaller sample sizes.

The closer the KR(20) is to +1.0, the more reliable the exam is considered, because its questions consistently discriminate between higher- and lower-performing students.

A KR(20) of 0.0 means the exam questions didn't discriminate at all. Imagine a test in which all 20 students answered all 40 questions correctly. The test didn't discriminate among any of them, so its KR(20) of 0.0 makes sense.
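The statistic described above can be computed directly from the item-response matrix. The standard formula is KR(20) = (k/(k-1)) * (1 - sum(p_i * q_i) / var), where k is the number of items, p_i is the proportion of students answering item i correctly, q_i = 1 - p_i, and var is the variance of the students' total scores. Below is a minimal Python sketch; the function name is ours, and the choice to return 0.0 when the total-score variance is zero (the formula is otherwise undefined there) follows the all-correct example above:

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    `responses` is a list of student rows, each a list of 0/1 item scores.
    Returns 0.0 when total-score variance is zero (no discrimination at all),
    matching the all-correct example in the text.
    """
    n_students = len(responses)
    k = len(responses[0])  # number of items on the exam

    # sum of p*q across items: p = proportion correct, q = 1 - p
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_students
        sum_pq += p * (1 - p)

    # population variance of each student's total score
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_students
    variance = sum((t - mean) ** 2 for t in totals) / n_students

    if variance == 0:
        return 0.0  # every student scored identically; no discrimination
    return (k / (k - 1)) * (1 - sum_pq / variance)


# A 3-item exam where totals spread out yields a positive KR(20);
# a class that answers everything correctly yields 0.0.
spread = kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])  # 0.75
perfect = kr20([[1, 1, 1], [1, 1, 1]])                       # 0.0
```

In practice, real item-analysis software applies the same formula; only the zero-variance convention varies between tools.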

EAC suggestion: The interpretation of the KR(20) depends on the purpose of the test. Most high-stakes exams are intended to distinguish students who have mastered the material from those who have not; for these, aim for a KR(20) of +0.50 or higher. A KR(20) below +0.30 is considered poor regardless of sample size. If the purpose of the exam is to ensure that ALL students have mastered essential skills or concepts, or if the test is a "confidence builder" with intentionally easy questions, expect a KR(20) close to 0.00.
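The rule-of-thumb thresholds above can be encoded as a small lookup. This helper (the function name and return labels are our own, not part of the EAC guidance) applies the +0.50 and +0.30 cutoffs for discriminating exams and the near-0.00 expectation for mastery or confidence-builder exams:

```python
def interpret_kr20(value, mastery_test=False):
    """Apply the rule-of-thumb KR(20) cutoffs described above.

    `mastery_test=True` marks exams meant to confirm that ALL students
    mastered the material, where a KR(20) near 0.00 is the goal.
    """
    if mastery_test:
        # For mastery/confidence-builder exams, near-zero is expected.
        return "as expected" if abs(value) < 0.30 else "unexpectedly discriminating"
    if value >= 0.50:
        return "good"       # meets the +0.50-or-higher target
    if value >= 0.30:
        return "marginal"   # below target but not in the poor range
    return "poor"           # below +0.30, poor at any sample size
```

For example, `interpret_kr20(0.62)` returns `"good"`, while `interpret_kr20(0.05, mastery_test=True)` returns `"as expected"`.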
