Dataset of multiple judges or examiners giving scores

Question

I am looking for a publicly available data set of (ideally) the following kind.

Multiple (must be 3 or more) examiners each grade the academic work (e.g., essay, dissertation, term assignment) of multiple (ideally 20 or more) students.
Every examiner grades every student.
In addition to the grade given by each examiner to each student, there is a single "comprehensive" grade for each student, given by faculty and based, in some transparent manner, on the examiners' grades.

An example might be where all Honors-level dissertations are marked by the four members of a faculty examining committee. That describes an ideal example, but I'd be very pleased to learn of other similar open-data examples where the grade involves some degree of "judgement" on the part of the examiner. I'm not looking, for example, for a dataset where multiple people ("examiners") record their reading from a measuring instrument (i.e., give a grade) under multiple conditions (equivalent to "multiple students").

It's a nice idea. I previously had a look at the Olympic diving scores, which are somewhat similar, and I might have to settle for some sort of Olympic data ... but I'd really like to find an example related to academic grades. There is a dataset I saw written up (I think in the British Medical Journal) relating to hundreds of different examiners grading thousands of prospective medical practitioners, but I couldn't extract a large enough (sub)set meeting the kinds of conditions I described. – CrimsonDark, Oct 10 '14 at 11:20
@abc Unfortunately I did not, but I'd be interested to know what use you had in mind for such a data set. – CrimsonDark, Mar 19 '19 at 00:17

score 1 · Answer 1 · edited Jun 18 '20 at 08:24

It may be hard to find a dataset that matches your exact criteria, but there are some promising open datasets with essay scoring.

Kaggle Automated Essay Scoring (link to data, requires registration)

The data will contain ASCII formatted text for each essay followed by one or more human scores, and (where necessary) a final resolved human score.

Where it is relevant, you are provided with more than one human score, so that you may evaluate the reliability of the human scorers.

You can find code for benchmarks here.
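
As a rough sketch of what "evaluating the reliability of the human scorers" could look like once you have several scores per essay, the snippet below computes pairwise exact agreement and Pearson correlation between raters. The score table is made up for illustration (it is not taken from either dataset), and the function names `pearson` and `pairwise_agreement` are my own, not part of any benchmark code.

```python
from itertools import combinations
from math import sqrt

# Hypothetical scores: rows are essays, columns are three human raters.
# These numbers are invented for illustration only.
scores = [
    [4, 4, 3],
    [2, 3, 2],
    [5, 5, 5],
    [3, 2, 3],
    [1, 2, 1],
    [4, 5, 4],
]


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(rows):
    """For each pair of raters, return (exact agreement rate, Pearson r)."""
    columns = list(zip(*rows))  # one tuple of scores per rater
    result = {}
    for i, j in combinations(range(len(columns)), 2):
        a, b = columns[i], columns[j]
        exact = sum(p == q for p, q in zip(a, b)) / len(a)
        result[(i, j)] = (exact, pearson(a, b))
    return result


agreement = pairwise_agreement(scores)
for (i, j), (exact, r) in agreement.items():
    print(f"raters {i} and {j}: exact agreement {exact:.2f}, Pearson r {r:.2f}")
```

For ordinal grades, a chance-corrected measure such as a weighted kappa or an intraclass correlation is probably a better fit than raw agreement; the simple statistics here are only meant to show the shape of the calculation.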

International Corpus of Learner English (link to data)

Data available on this page include annotated organization scores for 1,003 essays from the International Corpus of Learner English (ICLE).

answered Jan 13 '15 at 08:48

philshem

The two data sets are interesting and I appreciate knowing about them. – CrimsonDark Jan 16 '15 at 11:47
archive of the second dataset: https://web.archive.org/web/20190228093110/http://www.hlt.utdallas.edu/~persingq/ICLE/OrganizationScores.txt – philshem Jan 24 '20 at 07:41
