Abstract

Open-ended items, which have been used for centuries as a method of evaluating student achievement, have many advantages, such as measuring higher-order skills, providing rich diagnostic information about the student, and being free of success by chance. Today, however, open-ended items cannot be used in exams with large numbers of students because of the potential for errors in the scoring process and the disadvantages they carry in terms of labour, time, and cost. At this point, Artificial Intelligence (AI) has considerable potential for scoring open-ended items. The aim of this study is to examine the performance of AI in scoring students' handwritten responses to open-ended items. An achievement test consisting of 3 open-ended and 10 multiple-choice items was developed within the scope of the Measurement and Assessment in Education course at a state university. The open-ended items were scored with a structured rubric (0-1-2), while the multiple-choice items were scored dichotomously (0-1). A total of 84 participants took part in the study, and the open-ended items were scored by an expert group and an AI tool (ChatGPT-4o). Images of the students' handwritten responses were scored by the AI tool under two scenarios: in the first, the AI was asked to score without being given any scoring criteria, whereas in the second it was asked to score according to the standard scoring criteria. The findings showed low agreement and correlation coefficients between the AI scores obtained without criteria and the expert scores, but high agreement and correlation coefficients between the AI scores obtained with the standard scoring criteria and the expert scores. Similarly, item discrimination indices were quite low for the AI scoring without criteria and high for the AI scoring with the standard criteria. The reasons for the discrepancies between the expert scores and the AI scores obtained with the standard criteria were also investigated and reported. The results show that, when standardized scoring criteria are provided, AI can score handwritten open-ended items at a good level. As AI continues to develop and transform, it is thought that it may reach scoring accuracy comparable to that of expert raters in terms of consistency.

Keywords: Open-ended item, Artificial intelligence, AI, ChatGPT, Automated scoring, Handwritten responses, Constructed response item
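As a rough illustration of the kind of comparison summarized in the abstract, the sketch below computes agreement and correlation between expert and AI scores for a single open-ended item, together with a corrected item-total correlation as a discrimination index. It is a minimal sketch only: all variable names and data values are hypothetical, and quadratic-weighted Cohen's kappa and the Pearson correlation are common choices for this type of analysis rather than necessarily the exact coefficients used in the study.

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical data: expert and AI scores (0-2) for one open-ended item,
    # plus each examinee's total test score for the discrimination index.
    expert = np.array([2, 1, 0, 2, 1, 2, 0, 1, 2, 1])
    ai_with_rubric = np.array([2, 1, 0, 2, 2, 2, 0, 1, 2, 1])
    total_score = np.array([14, 9, 4, 15, 10, 13, 5, 8, 16, 9])

    # Agreement between the expert and AI ratings: quadratic-weighted kappa.
    kappa = cohen_kappa_score(expert, ai_with_rubric, weights="quadratic")

    # Linear association between expert and AI scores.
    r, p = pearsonr(expert, ai_with_rubric)

    # Item discrimination: correlation of the AI-scored item with the total
    # score after removing the item itself (corrected item-total correlation).
    corrected_total = total_score - ai_with_rubric
    discrimination, _ = pearsonr(ai_with_rubric, corrected_total)

    print(f"weighted kappa = {kappa:.2f}, r = {r:.2f}, "
          f"discrimination = {discrimination:.2f}")

The same comparison would be run once for the no-criteria AI scores and once for the AI scores obtained with the standard rubric, allowing the two scenarios to be contrasted on agreement, correlation, and discrimination.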


How to cite

Yiğiter, M., & Boduroğlu, E. (2025). Examining the Performance of Artificial Intelligence in Scoring Students’ Handwritten Responses to Open-Ended Items. Education and Science, 50, 1-18. https://doi.org/10.15390/EB.2025.14119