Abstract

This study examined whether responses generated by chatbots (ChatGPT-3.5, ChatGPT-4, and Bard) about heat and temperature match misconceptions identified in the literature and how these responses compare to those of learners. The study also addressed the effect of Conceptual Change Texts (CCTs) on chatbot-generated responses about heat and temperature, focusing on their relevance to prompt engineering. The Heat and Temperature Four-tier Misconception Test (HTMCT) and the CCTs were taken from a previous study that investigated the effectiveness of CCTs in remedying misconceptions about heat and temperature held by pre-service physics teachers. The HTMCT, consisting of 20 items, was designed to diagnose the misconceptions about heat and temperature identified in the literature among pre-service physics teachers, with each misconception assessed by multiple items. In this study, the HTMCT was used to diagnose the chatbots’ responses on heat and temperature concepts before and after the implementation of the CCTs. In addition, in-depth interviews with the chatbots were conducted to elaborate on their responses. Pre-service physics teachers in the prior study exhibited misconceptions about heat and temperature, which were effectively remediated by CCTs, leading to significant overall improvements. Similarly, this study found that chatbot-generated responses, except those from Bard, were prone to misconceptions. ChatGPT-4 consistently generated responses that aligned with the scientific paradigm, unlike the other two chatbots. However, pre- and post-test data revealed that ChatGPT-4-generated responses consistently reflected one misconception: that equal amounts of heat supplied to different substances will result in the same final temperature. Both ChatGPT-3.5 and Bard showed improved performance between the pre- and post-test, despite providing inconsistent responses. While the chatbots could generate responses that accurately expressed concept definitions, they struggled with drawing conclusions based on multiple scientific concepts, applying concepts to real-world scenarios, and engaging in complex reasoning. Although the algorithms underlying the chatbots remain undisclosed, the post-test responses of all chatbots showed a notable decrease in incorrect responses and improved alignment with scientific knowledge, suggesting a positive influence of CCTs, akin to the findings of the prior study.

Keywords: Science education, Artificial intelligence chatbots, Prompt engineering, Heat and temperature, Misconception, Conceptual change approach, Four-tier test

How to cite

Kırbulut Güneş, Z. D., & Güneş, B. (2025). Examining chatbot-generated responses on heat and temperature: misconceptions, consistency, and conceptual change. Education and Science, 51(225), 227-264. https://doi.org/10.15390/ES.2026.2515