Examining the Performance of Artificial Intelligence in Scoring Students' Handwritten Responses to Open-Ended Items

Mahmut Yiğiter; Erdem Boduroğlu

doi:10.15390/EB.2025.14119

Abstract

Open-ended items, which have been used for centuries as a measurement method in assessing student achievement, offer many advantages: they can measure higher-order skills, provide rich diagnostic information about the student, and leave no room for success by chance. Today, however, open-ended items cannot be used in examinations taken by large numbers of students, because error can enter the scoring process and because scoring is costly in labor, time, and money. At this point, Artificial Intelligence (AI) holds significant potential for scoring open-ended items. The aim of this study is to examine the scoring performance of AI in scoring students' handwritten responses to open-ended items. In the study, an achievement test consisting of 3 open-ended and 10 multiple-choice items was developed for the Measurement and Evaluation in Education course at a state university. The open-ended items were scored as constructed-response items (0-1-2), while the multiple-choice items were scored as correct-incorrect (0-1). The study included 84 participants, and the open-ended items were scored by a group of experts and by an AI tool (ChatGPT-4o). The AI tool scored images of the students' handwritten responses under two scenarios. In the first scenario, the AI tool was asked to score without being given any scoring criteria; in the second, it was asked to score according to standard scoring criteria. The findings show low agreement and correlation coefficients between the AI's criterion-free scores and the expert scores, and high agreement and correlation coefficients between the AI's scores under the standard criteria and the expert scores. Consistent with these findings, item discrimination was quite low for AI scoring without criteria and high for AI scoring with the standard criteria.
The study also investigated and reported the reasons for the discrepancies between the expert scores and the AI's scores under the standard criteria. The results indicate that AI can score handwritten responses to open-ended items at a good level when given standard scoring criteria. As AI continues to develop and transform, it is expected to be able to reach scoring accuracy comparable to that of expert raters in terms of consistency.
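As an illustration of what "agreement and correlation coefficients" between two sets of 0-1-2 scores can look like in practice, the sketch below computes Cohen's kappa (optionally with quadratic weights) and Pearson's r. This is a minimal sketch under stated assumptions: the abstract does not name the specific coefficients used, so kappa and Pearson's r are assumed here as common choices, and the three score lists are made-up values, not the study's data.

```python
from math import sqrt


def cohen_kappa(rater_a, rater_b, categories=(0, 1, 2), weighted=False):
    """Cohen's kappa for two raters; quadratic weights if weighted=True.

    Undefined (division by zero) when expected disagreement is zero,
    e.g. when both raters give the same single score to every response.
    """
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportions of (rater_a score, rater_b score) pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: squared distance (quadratic) or 0/1 (unweighted).
        if weighted:
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    pairs = [(i, j) for i in range(k) for j in range(k)]
    d_obs = sum(weight(i, j) * observed[i][j] for i, j in pairs)
    d_exp = sum(weight(i, j) * row[i] * col[j] for i, j in pairs)
    return 1.0 - d_obs / d_exp


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for ten responses to one item (NOT the study's data).
expert = [0, 1, 2, 2, 1, 0, 2, 1, 0, 2]
ai_with_criteria = [0, 1, 2, 2, 1, 0, 2, 2, 0, 2]
ai_without_criteria = [2, 2, 2, 1, 2, 1, 2, 2, 2, 1]

for label, ai in [("with criteria", ai_with_criteria),
                  ("without criteria", ai_without_criteria)]:
    print(label,
          "kappa:", round(cohen_kappa(expert, ai), 3),
          "weighted kappa:", round(cohen_kappa(expert, ai, weighted=True), 3),
          "r:", round(pearson_r(expert, ai), 3))
```

Quadratic weights penalize a 0-versus-2 disagreement more than a 0-versus-1 disagreement, which may suit ordered score categories; the unweighted form treats all disagreements alike.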

Keywords: Open-ended item, Artificial intelligence, AI, ChatGPT, Automated scoring, Handwritten responses, Constructed-response item

Copyright and license

Copyright © 2025 The Author(s). This open-access article is distributed under the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium or format, provided the original work is properly cited.

How to cite

Yiğiter, M., & Boduroğlu, E. (2025). Öğrencilerin El Yazısıyla Yanıtladığı Açık Uçlu Maddelerin Puanlanmasında Yapay Zekâ Performansının İncelenmesi. Eğitim Ve Bilim, 50, 1-18. https://doi.org/10.15390/EB.2025.14119
