Traditional law school instruction is notorious for its lack of formative assessment (the Socratic method aside), relying almost exclusively on summative assessment. This session will document the development and testing of a method for algorithmically-assisted pre-grading that seeks to promote the use of formative assessment by making the grading of free response questions more efficient, and therefore more practical, for individual instructors.
Building upon work in the area of natural language processing, this session will present a novel method for unsupervised machine scoring of short answer and essay question responses that relies solely on a sufficiently large set of responses to a common prompt, absent the need for pre-labeled sample answers—provided the prompt is of a particular character. That is, for questions where "good" answers look similar, "wrong" answers are likely to be "wrong" in different ways. Consequently, when the text embeddings for responses to a common prompt are placed in an appropriate feature space, the centroid of their placements can stand in for a model answer, providing a lodestar against which to measure individual responses.
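The centroid-scoring idea can be sketched as follows. This is a minimal illustration, not the session's actual implementation: real use would embed each response with a text-embedding model, whereas the small hand-made vectors below are hypothetical stand-ins chosen only to show the geometry.

```python
import numpy as np

def rank_by_centroid(embeddings):
    """Rank responses by cosine similarity to the centroid of all embeddings.

    embeddings: (n_responses, dim) array of response embeddings.
    Returns indices ordered from most to least centroid-like — a machine
    ordering from "best" to "worst" under the assumption that good
    answers cluster together while wrong answers scatter.
    """
    X = np.asarray(embeddings, dtype=float)
    centroid = X.mean(axis=0)  # the centroid stands in for a model answer
    sims = X @ centroid / (
        np.linalg.norm(X, axis=1) * np.linalg.norm(centroid)
    )
    return np.argsort(-sims)  # descending similarity to the centroid

# Hypothetical embeddings: three similar "good" answers and two answers
# that are each "wrong" in a different way.
emb = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],  # wrong in one way
    [0.0, 0.0, 1.0],  # wrong in a different way
]
order = rank_by_centroid(emb)
# The three mutually similar answers rank ahead of the two outliers.
```

Because the method needs no labeled key, an instructor could apply it to any prompt where correct answers are expected to converge, then use the machine ordering as a first pass before human review.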
To evaluate the efficacy of the above method, more than eight hundred student answers to a set of thirteen free response questions, drawn from six Suffolk University Law School final exams taught by five instructors, were run through the algorithm. The resulting ordered lists of answers more closely resembled the human graders' rankings than did randomly ordered lists, and the difference between the randomly ordered and machine-ordered lists was found to be statistically significant.