Sociocognitive Foundations of Educational Measurement by Robert J. Mislevy

Author:Robert J. Mislevy
Language: eng
Publisher: Taylor & Francis Ltd


8.4.4 Equating

No discussion of comparability is complete without mentioning test equating. Equivalent test forms developed in the manner described earlier are not identical; they sometimes differ in difficulty or in internal-consistency reliability, for example. Equating procedures use data to map scores from one form to another, or from all forms to a common scale, in order to make the scores even more comparable (Holland & Dorans, 2006; Kolen & Brennan, 2013). For example, when two data-comparable forms are administered to random samples from the same examinee population, the linear function that matches the means and standard deviations of the two score distributions is a linear equating function. There are many equating functions and many data-gathering designs for equating. In testing programs with well-constructed test forms, equating functions don’t differ much from identity functions. Equatings based on different subpopulations, such as boys and girls or examinees with different first languages, don’t usually produce differences that would lead to different inferences in comparing examinees when reliability is taken into account (Dorans & Feigenbaum, 1994).
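To make the linear case concrete, the sketch below builds a function that maps Form X scores onto the Form Y scale by matching means and standard deviations, that is, y = mean_Y + (sd_Y / sd_X)(x - mean_X). It is a minimal illustration rather than an operational procedure: the function name `linear_equating`, the toy score lists, and the use of population standard deviations are all assumptions made for this example, and the samples are assumed to be random draws from the same examinee population.

```python
from statistics import mean, pstdev


def linear_equating(scores_x, scores_y):
    """Return a function that maps Form X scores onto the Form Y scale.

    Hypothetical helper for illustration: it matches the mean and standard
    deviation of the two observed score distributions, assuming both samples
    were drawn at random from the same examinee population.
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("Form X scores have zero variance; slope is undefined.")

    slope = sd_y / sd_x
    intercept = mu_y - slope * mu_x

    def equate(x):
        # Linear map: same z-score on Form X and on Form Y.
        return intercept + slope * x

    return equate


# Toy data (invented for this sketch, not taken from any testing program).
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

to_y_scale = linear_equating(form_x, form_y)
equated = [to_y_scale(x) for x in form_x]

# After equating, the Form X scores have (up to rounding) the same mean and
# standard deviation as the Form Y scores.
print(round(mean(equated), 6), round(pstdev(equated), 6))
print(round(mean(form_y), 6), round(pstdev(form_y), 6))
```

When two forms are built to the same specifications, the slope of such a function should be close to 1 and the intercept close to 0, which is the sense in which equating functions for well-constructed forms stay near the identity. The toy data here are deliberately far from that, so the adjustment is easy to see.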

That equating is successfully employed as a step in providing comparable data in familiar testing programs has led to the misconception that the comparability is produced by the statistical equating procedures themselves. Should we not be able to apply these procedures to any two tests that are purported to measure the same construct (the ACT and the SAT, for example, or a state-level fourth-grade mathematics achievement test and the fourth-grade National Assessment of Educational Progress)? The answer is generally no (Feuer, Holland, Green, Bertenthal, & Hemphill, 1999). The foregoing discussion explained how the comparability of the data that come out of equating, say, two forms of the ASVAB derives mainly from the design and construction of the forms to begin with. Linn (1993) and Mislevy (1992) discuss the kinds of inferences that might be made with data from assessments not constructed to be comparable. Their analyses of design, population, and intended inferences show that the question is then no longer one of comparable data, but one of comparable evidence. We explore this issue further in the upcoming chapters on fairness.


