Inv Ed Med 2017; 6 (21)
Assessment of multiple-choice questions in medicine. Validity evidence of an instrument
Rivera JJ, Flores HF, Alpuche HA, Martínez GA
Language: Spanish
References: 25
Pages: 8-15
PDF size: 193.94 Kb.
ABSTRACT
Introduction: The appropriate preparation of the test items of an examination constitutes validity evidence in itself. Despite a general consensus on item-writing guidelines, several studies report a high incidence of violations of these standards. An instrument for assessing the quality of multiple-choice item writing is proposed, and the process of gathering its validity evidence is described.
Methods: Validity evidence was gathered for an instrument designed to assess the features of multiple-choice items, following the sources of evidence proposed by the Standards for Educational and Psychological Testing, in particular those related to content, response process, and internal structure. The kappa index (Fleiss' model) and the point-biserial correlation coefficient were used to measure agreement among judges on the criteria assessed by the instrument. An exploratory factor analysis was performed to identify the instrument's dimensions, and Cronbach's alpha was calculated as an internal consistency statistic.
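For the reader's convenience, the standard textbook definitions of the statistics named above are given here (these formulas are not part of the abstract itself):

\[
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e},
\qquad
r_{pb} = \frac{M_1 - M_0}{s}\sqrt{p\,q},
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_i}{\sigma^2_T}\right)
\]

where \(\bar{P}\) is the mean observed agreement across items and \(\bar{P}_e\) the agreement expected by chance (Fleiss' kappa); \(M_1\) and \(M_0\) are the means of the continuous variable in the two groups defined by the dichotomous variable, \(p\) and \(q\) the proportions of each group, and \(s\) the overall standard deviation (point-biserial correlation); and \(k\) is the number of items, \(\sigma^2_i\) the variance of item \(i\), and \(\sigma^2_T\) the variance of the total score (Cronbach's alpha).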
Results: Agreement among the judges was greater than 0.8 (almost perfect agreement) for 12 of the 21 criteria, but only 0.19 for the Bloom's taxonomy level. Factor analysis identified 4 dimensions, with a Kaiser-Meyer-Olkin (KMO) measure of 0.666 (p < .01), an explained variance of 49.979%, and a Cronbach's alpha of 0.627.
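To illustrate how agreement and internal-consistency indices of this kind can be computed, the following is a minimal Python sketch. The rating and score matrices are hypothetical toy data, not the study's data; aggregate_raters and fleiss_kappa are functions from the statsmodels package.

# Minimal sketch: Fleiss' kappa and Cronbach's alpha on toy data.
# The matrices below are hypothetical examples, NOT the study's data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: 6 items (rows) rated by 4 judges (columns)
# on a 3-level criterion coded 0/1/2.
ratings = np.array([
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [2, 2, 1, 2],
    [0, 0, 0, 0],
    [1, 2, 1, 1],
    [2, 2, 2, 2],
])
counts, _ = aggregate_raters(ratings)  # items x categories count table
print("Fleiss' kappa:", round(fleiss_kappa(counts), 3))

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical scores: 5 evaluated items (rows) x 4 binary criteria (columns).
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))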
Conclusion: This instrument can be used to assess multiple-choice items, since it is supported by validity evidence related to content, response process, and internal structure, and by psychometric values appropriate for its intended use.
REFERENCES
Krathwohl DR. A revision of Bloom's taxonomy: an overview. Theory Pract. 2002;41:212-8.
Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65:S63-7.
Wass V, van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet. 2001;357:945-9.
Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ. 2002;15:309-34.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education. The standards for educational and psychological testing. Washington, D.C.: American Educational Research Association; 2014.
Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ. 2003 Sep;37:830-7.
Tarrant M, Knierim A, Hayes SK, Ware J. The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ Pract. 2006 Dec;6:354-63.
Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med. 2002;77:156-61.
Masters JC, Hulsmeyer BS, Pike ME, Leichty K, Miller MT, Verst AL. Assessment of multiple-choice questions in selected test banks accompanying textbooks used in nursing education. J Nurs Educ. 2001 Jan;40:25-32.
Pate A, Caldwell DJ. Effects of multiple-choice item-writing guideline utilization on item and student performance. Curr Pharm Teach Learn. 2014 Jan;6:130-4.
Jurado-Núñez AG, Flores-Fernandez F, Delgado-Maldonado L, Sommer-Cervantes H, Martínez-González A, Sánchez-Mendiola M. Distractores en preguntas de opción múltiple para estudiantes de Medicina: ¿Cuál es su comportamiento en un examen de altas consecuencias? Inv Ed Med. 2013;2:202-10.
Downing SM. The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ. 2005;10:133-43.
Naeem N, van der Vleuten C, Alfaris EA. Faculty development on item writing substantially improves item quality. Adv Health Sci Educ Theory Pract. 2012 Aug;17:369-76.
Tarrant M, Ware J. A framework for improving the quality of multiple-choice assessments. Nurse Educ. 2012;37:98-104.
Moreno R, Martínez RJ. Directrices para la construcción de ítems de elección múltiple. Psicothema. 2004;16:490-7.
Downing SM, Haladyna TM. Manual para el desarrollo de pruebas a gran escala. México, D.F.: Centro Nacional de Evaluación para la Educación Superior; 2012.
Buckwalter JA, Schumacher R, Albright JP, Cooper RR. Use of an educational taxonomy for evaluation of cognitive performance. J Med Educ. 1981;56:115-21.
Case SM, Swanson DB. Cómo construir preguntas de selección múltiple para ciencias básicas y ciencias clínicas. Philadelphia: National Board of Medical Examiners; 2014.
Dirección General de Evaluación Educativa, UNAM. Lineamientos generales para la elaboración de reactivos [Internet]. [cited 2015 Apr 4]. Available from: http://www.inb.unam.mx/ensenanza/lineamto_gral_elabora_reactivo.pdf
Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378-82.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-74.
Cunnington JPW, Norman GR, Blake JM, Dauphinee WD, Blackmore DE. Applying learning taxonomies to test items: is a fact an artifact? Acad Med. 1996;71:31-3.
Kibble JD, Johnson T. Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations? Adv Physiol Educ. 2011;35:396-401.
Thompson E, Luxton-Reilly A, Whalley JL, Hu M, Robbins P. Bloom's taxonomy for CS assessment. Conf Res Pract Inf Technol Ser. 2008;78:155-61.
Moreno R, Martínez RJ, Muñiz J. New guidelines for developing multiple-choice items. Methodology. 2006;2:65-72.