Abstract
This study examined whether responses generated by chatbots (ChatGPT-3.5, ChatGPT-4, and Bard) about heat and temperature match misconceptions identified in the literature and how these responses compare to those of learners. It also examined the effect of Conceptual Change Texts (CCTs) on chatbot-generated responses about heat and temperature, focusing on their relevance to prompt engineering. The Heat and Temperature Four-tier Misconception Test (HTMCT) and the CCTs were taken from a previous study that investigated the effectiveness of CCTs in remedying misconceptions about heat and temperature held by pre-service physics teachers. The HTMCT consists of 20 items and was designed to diagnose the misconceptions identified in the literature, with each misconception assessed by multiple items. In this study, the HTMCT was used to diagnose misconceptions in the chatbots’ responses about heat and temperature before and after the implementation of the CCTs. In addition, in-depth interviews with the chatbots were conducted to elaborate on their responses. Pre-service physics teachers in the prior study exhibited misconceptions about heat and temperature, which were effectively remediated by CCTs, leading to significant overall improvements. Similarly, this study found that chatbot-generated responses were prone to misconceptions. Unlike the other two chatbots, ChatGPT-4 consistently generated responses that aligned with the scientific paradigm. However, pre- and post-test data revealed that ChatGPT-4-generated responses consistently reflected one misconception, namely that equal amounts of heat supplied to different substances will result in the same final temperature. Both ChatGPT-3.5 and Bard showed improved performance between the pre- and post-test data, despite providing inconsistent responses.
While the chatbots could generate responses that accurately expressed concept definitions, they struggled with drawing conclusions based on multiple scientific concepts, applying concepts to real-world scenarios, and engaging in complex reasoning. Although the algorithms underlying the chatbots remain undisclosed, the post-test responses of all chatbots showed a notable decrease in incorrect responses and improved alignment with scientific knowledge, suggesting a positive influence of CCTs, akin to the findings of the prior study.
Keywords: Science education, Artificial intelligence chatbots, Prompt engineering, Heat and temperature, Misconception, Conceptual change approach, Four-tier test
Copyright and license
Copyright © 2026 The Author(s). This is an open access article distributed under the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium or format, provided the original work is properly cited.
