Data Analytics and Psychometrics by Hong Jiao, Robert W. Lissitz, and Anna Van Wie

Author: Hong Jiao; Robert W. Lissitz; Anna Van Wie
Language: English
Format: EPUB
Publisher: Information Age Publishing
Published: 2018-11-20


Chapter 6

Measuring Rater Effectiveness

New Uses of Value-Added Modeling in Competency-Based Education

B. Brian Kuhlman

Western Governors University

Abstract

Raters are effective to the extent that they are unbiased yet helpful. This chapter demonstrates a novel way for institutions to monitor the levels of bias and helpfulness among raters without requiring multiple ratings of task submissions. The ability to work with single ratings is crucial in settings where the volume and velocity of task submissions are high. While bias and helpfulness require separate models and different outcome variables, they can be measured using the same basic approach: a variation of value-added modeling in which the “teacher effect” is translated into the “rater effect.” Understood within this framework, a rater’s bias is the extent to which her pass rate tends to be higher or lower than expected (given other factors), and a rater’s helpfulness is the extent to which his students’ growth rates tend to be faster or slower than expected (given other factors). The result of each model is a measure of effectiveness that competency-based institutions can use to evaluate their raters. My goal in this chapter is to provide an overview of the methods and outputs of this approach at a conceptual rather than technical level. To this end, the chapter uses examples from fictional raters and students at Western Governors University (WGU). After the conceptual models are established, analyses of over 800 actual raters at WGU (whose actual Evaluation Department rates thousands of task submissions every day) across 2 months demonstrate that: (a) the rater bias model, trained on 1 month of data, correctly predicts about three-fourths (AUC = 0.749) of the following month’s testing outcomes; (b) the rater helpfulness model, also trained on 1 month of data, explains about two-thirds of the variance (R² = 0.64) in the following month’s score increases; and (c) the bias and helpfulness measures are uncorrelated among the raters, which suggests that they indicate distinct abilities.
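To make the two models concrete, the following sketch fits each one as a regression with rater fixed effects, which is one simple way to realize the “rater effect” idea described above; the chapter itself gives no code, and every name here (the synthetic data and the columns passed, score_gain, prior_score, attempt_number, rater_id, month) is a hypothetical stand-in for whatever submission-level records an institution actually keeps.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.metrics import roc_auc_score, r2_score

# Synthetic stand-in data: one row per rated task submission.
rng = np.random.default_rng(0)
n = 5000
ratings = pd.DataFrame({
    "month": rng.integers(1, 3, n),           # 1 = training month, 2 = test month
    "rater_id": rng.integers(0, 50, n),       # 50 hypothetical raters
    "prior_score": rng.normal(70, 10, n),     # student's prior performance
    "attempt_number": rng.integers(1, 4, n),  # which submission attempt this is
})
ratings["passed"] = (rng.random(n) < 0.7).astype(int)  # rater's pass/fail decision
ratings["score_gain"] = rng.normal(5.0, 2.0, n)        # score increase since last attempt

train = ratings[ratings["month"] == 1]
test = ratings[ratings["month"] == 2]

# Bias model: regress the pass/fail decision on submission-level controls
# plus rater fixed effects. Each rater's coefficient estimates how much
# higher or lower that rater's pass rate is than expected given the other
# factors -- the "rater effect" analogue of a teacher effect.
bias_model = smf.logit(
    "passed ~ prior_score + attempt_number + C(rater_id)", data=train
).fit(disp=False)
# Out-of-sample check on the following month (assumes every rater in the
# test month also appears in the training month).
auc = roc_auc_score(test["passed"], bias_model.predict(test))

# Helpfulness model: regress the score gain between attempts on the same
# controls plus rater fixed effects. Each rater's coefficient estimates how
# much faster or slower that rater's students grow than expected.
help_model = smf.ols(
    "score_gain ~ prior_score + attempt_number + C(rater_id)", data=train
).fit()
r2 = r2_score(test["score_gain"], help_model.predict(test))

print(f"bias model AUC: {auc:.3f}; helpfulness model R^2: {r2:.3f}")

In practice the models would control for richer covariates, and the rater effects could just as well be fit as random effects in a mixed model; the fixed-effects form above is simply the shortest way to show the train-on-one-month, test-on-the-next structure the abstract describes.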

Leaders in competency-based education have a duty to measure rater effectiveness, and to do so fairly using the best information. This is true anywhere high-stakes assessments are administered, but truer still at institutions that depend heavily on constructed-response tasks to assess professional competencies. For example, at WGU (whose Evaluation Department manages nearly 900 raters and 2,500 task submissions every day), 68% of the assessments are in constructed-response format. While these numbers will certainly shift over time as WGU grows, it is safe to assume that rating task submissions will remain a very large part of its operations. That said, rater effectiveness is important to measure not only because of the operational scale but also because ratings regularly affect the progress of each student.

At competency-based institutions like WGU, students submit their constructed-response tasks to a team of raters who (a) decide whether those students are required to revise and resubmit before progressing and (b) send written feedback to students regarding the decision. These ratings are effective to the extent that (a) the decisions are unbiased and (b) the feedback is helpful. Students deserve to be rated fairly and need our help to improve.


