Please use this identifier to cite or link to this item: doi:10.22028/D291-26944
Title: Reliability and validity of PIRLS and TIMSS: does the response format matter?
Author(s): Schult, Johannes
Sparfeldt, Jörn R.
Language: English
Journal title: European Journal of Psychological Assessment
Publisher/Platform: Hogrefe
Year of Publication: 2016
Free key words: Response format
Multiple-choice
Constructed-response
Item response theory
Validity
DDC notations: 150 Psychology
370 Education
Publication type: Journal Article
Abstract: Academic achievements are often assessed in written exams and tests using selection-type (e.g., multiple-choice; MC) and supply-type (e.g., constructed-response; CR) item response formats. The present article examines how MC items and CR items differ with regard to reliability and criterion validity in two educational large-scale assessments with fourth-graders. The reading items of PIRLS 2006 were compiled into MC scales, CR scales, and mixed scales. Scale reliabilities were estimated according to item response theory (international PIRLS sample; n = 119,413). MC showed smaller standard errors than CR around the reading proficiency mean, whereas CR was more reliable for low and high proficiency levels. In the German sample (n = 7,581), there was no format-specific differential validity (criterion: German grades; r ≈ .5; Δr = 0.01). The mathematics items of TIMSS 2007 (n = 160,922) showed similar reliability patterns. MC validity was slightly larger than CR validity (criterion: mathematics grades; n = 5,111; r ≈ .5, Δr = –0.02). Effects of format-specific test extensions were very small in both studies. It seems that in PIRLS and TIMSS, reliability and validity do not depend substantially on response formats. Consequently, other response format characteristics (like the cost of development, administration, and scoring) should be considered when choosing between MC and CR.
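
Note: The reliability pattern summarized above follows from the item response theory test information function, I(θ) = Σ a_i² P_i(θ)(1 − P_i(θ)) under a 2PL model, with conditional standard error SEM(θ) = 1/√I(θ). The following is a minimal sketch (not the authors' analysis; all item parameters are hypothetical, chosen only to reproduce the qualitative pattern of MC items clustered near the proficiency mean versus CR items spread across the scale):

import numpy as np

def information_2pl(theta, a, b):
    # Test information under a 2PL IRT model:
    # I(theta) = sum_i a_i^2 * P_i(theta) * (1 - P_i(theta))
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
    return (a[:, None] ** 2 * p * (1.0 - p)).sum(axis=0)

theta = np.linspace(-3.0, 3.0, 121)
rng = np.random.default_rng(42)

# Hypothetical MC scale: item difficulties clustered near the mean.
a_mc, b_mc = np.full(20, 1.2), rng.normal(0.0, 0.5, 20)
# Hypothetical CR scale: item difficulties spread across the scale.
a_cr, b_cr = np.full(20, 1.0), rng.normal(0.0, 1.5, 20)

# Conditional standard error of measurement: SEM(theta) = 1 / sqrt(I(theta)).
sem_mc = 1.0 / np.sqrt(information_2pl(theta, a_mc, b_mc))
sem_cr = 1.0 / np.sqrt(information_2pl(theta, a_cr, b_cr))

# Near theta = 0 the MC scale has the smaller standard error;
# toward the extremes the CR scale measures more precisely.
for t in (-2.5, 0.0, 2.5):
    i = int(np.argmin(np.abs(theta - t)))
    print(f"theta = {t:+.1f}: SEM(MC) = {sem_mc[i]:.3f}, SEM(CR) = {sem_cr[i]:.3f}")
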
DOI of the first publication: 10.1027/1015-5759/a000338
Link to this record: urn:nbn:de:bsz:291-scidok-ds-269443
hdl:20.500.11880/26920
http://dx.doi.org/10.22028/D291-26944
Date of registration: 22-Dec-2017
Third-party funds sponsorship: This research was prepared with the support of the German funding program "Bund-Länder-Programm für bessere Studienbedingungen und mehr Qualität in der Lehre ('Qualitätspakt Lehre')" [the joint program of the Federal and State Governments for better study conditions and higher teaching quality in higher education ("Teaching Quality Pact")] at Saarland University (funding code: 01PL11012). The authors developed the topic and the content of this manuscript independently of this funding. We thank the Institute for School Development Research (IFS) at Technical University Dortmund, the Max Planck Institute for Human Development (MPIB) Berlin, the Standing Conference of the Ministers of Education and Cultural Affairs (KMK), and the Research Data Centre (FDZ) at the Institute for Educational Quality Improvement (IQB) for providing the raw data.
Sponsorship ID: 01PL11012
Faculty: HW - Fakultät für Empirische Humanwissenschaften und Wirtschaftswissenschaft
Department: HW - Bildungswissenschaften
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File: Schult-Sparfeldt-PIRLS-TIMSS-2016.pdf
Description: Schult & Sparfeldt (2016) Reliability and validity of PIRLS and TIMSS (post-print manuscript)
Size: 494.38 kB
Format: Adobe PDF


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.