Examining the Validity and Reliability of ChatGPT 3.5-Generated Reading Comprehension Questions for Academic Texts

Meida Rabia Sihite; Meisuri Meisuri; Berlin Sibarani

doi:10.47175/rielsj.v4i4.835

Meida Rabia Sihite, Universitas Alwashliyah
Meisuri Meisuri, Postgraduate Program, Universitas Negeri Medan, Medan, Indonesia
Berlin Sibarani, Postgraduate Program, Universitas Negeri Medan, Medan, Indonesia

Keywords: ChatGPT 3.5, validity, reliability, reading comprehension questions

Abstract

This research examines the capacity of ChatGPT 3.5 to generate reading comprehension questions for academic texts, focusing on how well the questions align with the higher-order cognitive skills of Bloom's Taxonomy. A paper-based test of 30 multiple-choice questions was constructed with ChatGPT 3.5 from three selected TOEFL ITP reading comprehension passages. The study employed a mixed-methods approach: qualitative content analysis assessed the cognitive level of each question, and quantitative methods analyzed student responses. Data were collected by administering the AI-generated questions to students and scoring their responses. Validity was assessed with Pearson correlation coefficients, and internal consistency (reliability) was measured with Cronbach's Alpha. The findings revealed that ChatGPT 3.5 can produce questions spanning a range of cognitive levels, from analysis to creation; however, only 10 of the 30 questions met the validity criteria, indicating that the AI's question generation process needs improvement. The reliability of these questions was moderate, suggesting a reasonable level of internal consistency. The study concludes that while AI-generated questions show promise in educational assessments, ongoing improvement of AI models is necessary to enhance their effectiveness. The implications of this research are significant for the future integration of AI in educational settings, indicating a potential role for AI in developing meaningful assessment tools. The study recommends that future research explore various question types and incorporate student feedback to optimize the effectiveness of AI in education.
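The two statistics named in the abstract can be illustrated with a minimal sketch. It assumes item validity is computed as the Pearson correlation between each item's 0/1 scores and the students' total scores, which is a common convention but is not confirmed by the abstract. The function names and the tiny score matrix are hypothetical and are not the study's data or procedure.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant column has no defined correlation; report 0.0 here
        # (an assumption made for this sketch).
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlate each item column with the students' total scores.

    `scores` is a list of rows, one per student, each a list of 0/1 item scores.
    """
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson_r([row[j] for row in scores], totals) for j in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(totals))."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Made-up responses: 5 students x 4 items (1 = correct, 0 = incorrect).
    demo = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]
    print("item-total r:", [round(r, 3) for r in item_total_correlations(demo)])
    print("alpha:", round(cronbach_alpha(demo), 3))
```

In practice, each item's correlation would be compared against a critical value chosen for the sample size and significance level. The abstract does not state the threshold the authors used, so none is hard-coded here.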
