UPSI Digital Repository (UDRep)

Type : Article
Subject : L Education
Main Author : Zulkifley Mohamed
Additional Authors : Faiz Zulkifli; Rozaimah Zainal Abidin
Title : Evaluating the quality of exam questions: a multidimensional item response analysis
Place of Production : Tanjong Malim
Publisher : Fakulti Sains dan Matematik
Year of Publication : 2019
Corporate Name : Universiti Pendidikan Sultan Idris

Abstract :
The purpose of this research is to propose a new approach for evaluating the quality of exam questions. Exam results were obtained from students taking the statistics and probability course at Universiti Teknologi MARA (UiTM). The exam consisted of 10 questions comprising 30 items of varying difficulty. A total of 214 students' results were extracted from the iCGPA system. Multidimensional Item Response Analysis (MIRA) was applied to the 1PL (Rasch), 2PL and 3PL models to evaluate the quality of the exam questions. The models were estimated using the MH-RM algorithm in the mirt package for R. Model fit was compared using the log-likelihood, standard error (SE), AIC and BIC statistics. The S-X² statistic and the Zh statistic were calculated to identify item misfit and person misfit, respectively. All three models yielded acceptable and almost identical fit statistics. Five items were flagged as misfitting by the 1PL model, and five items were likewise categorized as misfitting by the 2PL and 3PL models. The reduction in the number of misfit items can be attributed to the additional information incorporated into the IRA model. On the other hand, the person-fit analysis produced different misfit percentages across the IRA models, probably because most students could answer all the questions well. In conclusion, the quality of the exam questions for the statistics and probability course needs to be improved by increasing the difficulty of the questions so that they incorporate higher-order thinking skills.
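As an illustration of the workflow described above: the mirt package (Chalmers, 2012) implements the MH-RM algorithm (Cai, 2010) together with S-X² item-fit and Zh person-fit statistics. The sketch below is a minimal outline under stated assumptions, not the authors' actual script; in particular, responses is a hypothetical 214 x 30 matrix of dichotomously scored answers, and a single latent dimension stands in for the paper's multidimensional specification.

library(mirt)

# 'responses': hypothetical 214 x 30 matrix of 0/1 item scores (not supplied here).
# model = 1 fits one latent dimension as a placeholder; a multidimensional
# structure would instead be specified with mirt.model().
mod_1pl <- mirt(responses, model = 1, itemtype = "Rasch", method = "MHRM", SE = TRUE)
mod_2pl <- mirt(responses, model = 1, itemtype = "2PL",   method = "MHRM", SE = TRUE)
mod_3pl <- mirt(responses, model = 1, itemtype = "3PL",   method = "MHRM", SE = TRUE)

# Model comparison: prints log-likelihood, AIC and BIC for each model.
anova(mod_1pl, mod_2pl)
anova(mod_2pl, mod_3pl)

# Item misfit via the S-X2 statistic (Orlando & Thissen, 2000).
itemfit(mod_2pl, fit_stats = "S_X2")

# Person misfit via the Zh statistic (Drasgow, Levine & Williams, 1985).
personfit(mod_2pl)

Items with small S-X² p-values, and examinees with large negative Zh values, correspond to the item and person misfits counted in the abstract.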

References

1. N. Behizadeh and G. Engelhard Jr., “Historical view of the influences of measurement and writing theories on the practice of writing assessment in the United States,” Assessing Writing, 16(3), 2011, pp. 189-211.

2. F. Zulkifli, R. Z. Abidin, N. F. M. Razi, N. H. Mohammad, R. Ahmad, and A. Z. Azmi, “Evaluating quality and reliability of final exam questions for probability and statistics course using Rasch model,” International Journal of Engineering and Technology (UAE), 7(4), 2018, pp. 32-36.

3. J. Jerrim, M. B. Oliver, and S. Sims, “The relationship between inquiry-based teaching and students' achievement: New evidence from a longitudinal PISA study in England,” Learning and Instruction, 61, 2019, pp. 35-44.

4. N. Lohgheswary, Z. M. Nopiah, and E. Zakaria, “Evaluating the reliability of pre-test differential equations questions using Rasch measurement model,” Journal of Engineering Science and Technology, 11, 2016, pp. 31-39.

5. H. Othman, N. A. Ismail, I. Asshaari, F. M. Hamzah, and Z. M. Nopiah, “Application of Rasch measurement model for reliability measurement instrument in vector calculus course,” Journal of Engineering Science and Technology, 10, 2015, pp. 77-83.

6. M. Şahin and Y. Yildirim, “The examination of item difficulty distribution, test length and sample size in different ability distribution,” Journal of Measurement and Evaluation in Education and Psychology, 9, 2018, pp. 277-294.

7. A. Birnbaum, “Some latent trait models and their use in inferring an examinee's ability,” Statistical Theories of Mental Test Scores, 1968, pp. 397-479.

8. F. Samejima, “Estimation of ability using a pattern of graded responses,” Psychometrika Monograph, 17, 1969, pp. 1-100.

9. R. P. Chalmers, “mirt: A multidimensional item response theory package for the R environment,” Journal of Statistical Software, 48(6), 2012, pp. 1-29.

10. L. Cai, “High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm,” Psychometrika, 75(1), 2010, pp. 33-57.

11. L. Cai, “Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis,” Journal of Educational and Behavioral Statistics, 35(3), 2010, pp. 307-335.

12. M. Orlando and D. Thissen, “Likelihood-based item-fit indices for dichotomous item response theory models,” Applied Psychological Measurement, 24, 2000, pp. 50-64.

13. J. M. Felt, R. Castaneda, J. Tiemensma, and S. Depaoli, “Using person fit statistics to detect outliers in survey research,” Frontiers in Psychology, 8, 2017, p. 863.

14. B. D. Wright, “Misunderstanding the Rasch model,” Journal of Educational Measurement, 14(3), 1977, pp. 219-225.

15. K. K. Tatsuoka and M. M. Tatsuoka, “Detection of aberrant response patterns and their effect on dimensionality,” Journal of Educational Statistics, 7(3), 1982, pp. 215-231.

16. F. Drasgow, M. V. Levine, and E. A. Williams, “Appropriateness measurement with polychotomous item response models and standardized indices,” British Journal of Mathematical and Statistical Psychology, 38, 1985, pp. 67-86.


This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.
