Some Thoughts on Analytical Choices in the Scaling Model for Test Scores in International Large-Scale Assessment Studies
Authors
Abstract
International large-scale assessments (LSAs), such as the Programme for International Student Assessment (PISA), provide essential information about the distribution of student proficiencies across a wide range of countries. The repeated assessments of the distributions of these cognitive domains offer policymakers important information for evaluating educational reforms and receive considerable attention from the media. Furthermore, the analytical strategies employed in LSAs often define methodological standards for applied researchers in the field. Hence, it is vital to critically reflect on the conceptual foundations of analytical choices in LSA studies. This article discusses the methodological challenges in selecting and specifying the scaling model used to obtain proficiency estimates from the individual student responses in LSA studies. We distinguish design-based inference from model-based inference. It is argued that for the official reporting of LSA results, design-based inference should be preferred because it allows for a clear definition of the target of inference (e.g., country mean achievement) and is less sensitive to specific modeling assumptions. More specifically, we discuss five analytical choices in the specification of the scaling model: (1) the specification of the functional form of item response functions, (2) the treatment of local dependencies and multidimensionality, (3) the consideration of test-taking behavior for estimating student ability, and the role of country differential item functioning (DIF) for (4) cross-country comparisons and (5) trend estimation. This article’s primary goal is to stimulate discussion about recently implemented changes and suggested refinements of the scaling models in LSA studies.
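As a point of reference for the first analytical choice mentioned above, a minimal sketch of a commonly used functional form is the two-parameter logistic (2PL) item response function; this illustration and its notation (ability θ_p, discrimination a_i, difficulty b_i) are not drawn from the article itself but reflect standard IRT usage:

P(X_{pi} = 1 \mid \theta_p) = \frac{\exp\{a_i(\theta_p - b_i)\}}{1 + \exp\{a_i(\theta_p - b_i)\}}

Setting all discriminations a_i to a common value yields the one-parameter (Rasch) model, so the choice of functional form amounts to deciding how much item-level flexibility the scaling model should allow.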