Examining the Validity and Reliability of ChatGPT 3.5-Generated Reading Comprehension Questions for Academic Texts

  • Meida Rabia Sihite Universitas Alwashliyah
  • Meisuri Meisuri Postgraduate Program, Universitas Negeri Medan, Medan, Indonesia
  • Berlin Sibarani Postgraduate Program, Universitas Negeri Medan, Medan, Indonesia
Keywords: ChatGPT 3.5, validity, reliability, reading comprehension questions


This research examines the capacity of ChatGPT 3.5 to generate reading comprehension questions for academic texts, with a focus on their alignment with higher-order cognitive skills in Bloom's Taxonomy. A paper-based test comprising 30 multiple-choice questions was constructed with ChatGPT 3.5, based on three selected TOEFL ITP reading comprehension passages. The study employed a mixed-methods approach, integrating qualitative content analysis to assess the cognitive level of each question and quantitative methods to analyze student responses. Data collection involved administering the AI-generated questions to students and scoring their responses. Analysis techniques included Pearson correlation coefficients to determine item validity and Cronbach's Alpha to measure internal consistency (reliability). The findings revealed that ChatGPT 3.5 is capable of producing questions spanning a range of cognitive levels, from analysis to creation; however, only 10 of the 30 questions met the validity criteria, indicating a need for improvement in the AI's question-generation process. The reliability of the questions was moderate, suggesting a reasonable level of internal consistency. The study concludes that while AI-generated questions show promise for educational assessment, ongoing improvement of AI models is necessary to enhance their effectiveness. The implications of this research are significant for the future integration of AI in educational settings, indicating a potential role for AI in developing meaningful assessment tools. The study recommends that future research explore various question types and incorporate student feedback to optimize the effectiveness of AI in education.
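The two quantitative measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual analysis scripts: it assumes the scored responses are available as a students × items matrix of 0/1 values, and computes each item's Pearson correlation with the rest-of-test score (a common operationalization of item validity) alongside Cronbach's Alpha for internal consistency.

```python
import numpy as np

def item_validity(scores):
    """Pearson correlation of each item with the total of the remaining items.

    Excluding the item itself from the total avoids inflating the correlation.
    `scores` is a (n_students, n_items) array of 0/1 item scores.
    """
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    correlations = []
    for j in range(n_items):
        rest_total = scores.sum(axis=1) - scores[:, j]
        correlations.append(np.corrcoef(scores[:, j], rest_total)[0, 1])
    return np.array(correlations)

def cronbach_alpha(scores):
    """Cronbach's Alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```

In a study like this one, each of the 30 items whose correlation exceeds the critical r-value for the sample size would be counted as valid, and a single Alpha would be reported for the whole instrument.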




Cukurova, M., Kent, C., & Luckin, R. (2019). Artificial intelligence and multimodal data in the service of human decision‐making: A case study in debate tutoring. British Journal of Educational Technology, 50(6), 3032–3046.

Eysenbach, G. (2023). The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers. JMIR Medical Education, 9(1), e46885.

Hung, Y.-C., Chaker, S. C., Sigel, M., Saad, M., & Slater, E. D. (2023). Comparison of Patient Education Materials Generated by Chat Generative Pre-Trained Transformer Versus Experts: An Innovative Way to Increase Readability of Patient Education Materials. Annals of Plastic Surgery, 91(4), 409–412.

Kim, Y.-S. G., Quinn, J. M., & Petscher, Y. (2021). What is text reading fluency and is it a predictor or an outcome of reading comprehension? A longitudinal investigation. Developmental Psychology, 57(5), 718.

Rahman, M. M., & Watanobe, Y. (2023). ChatGPT for education and research: Opportunities, threats, and strategies. Applied Sciences, 13(9), 5783.

Schiff, D. (2021). Out of the laboratory and into the classroom: the future of artificial intelligence in education. AI & Society, 36(1), 331–348.

Su, J., & Yang, W. (2023). Unlocking the power of ChatGPT: A framework for applying generative AI in education. ECNU Review of Education, 6(3), 355–366.

Tyson, J. (2023). Shortcomings of ChatGPT. Journal of Chemical Education, 100(8), 3098–3101.

Vandiver, V. L. (2008). Integrating health promotion and mental health: An introduction to policies, principles, and practices. Oxford University Press.

How to Cite
Sihite, M. R., Meisuri, M., & Sibarani, B. (2023). Examining the Validity and Reliability of ChatGPT 3.5-Generated Reading Comprehension Questions for Academic Texts. Randwick International of Education and Linguistics Science Journal, 4(4), 937–944. https://doi.org/10.47175/rielsj.v4i4.835