Abstract

Based on PISA 2022 mathematics literacy test data for Türkiye, this study employed a mixture item response model to identify ability and non-ability latent classes of students. In line with the mixture item response theory modelling approach proposed by Jeon and De Boeck (2019), the relations among response times, item difficulty, and success probabilities were examined by comparing four models hierarchically. The first was a single-class two-parameter item response theory (2PL IRT) model (Model I); the second (Model II) was a two-class model comprising an ability class and a guessing class whose success probability was fixed at 0.25. In a further two-class model (Model III), the success probability of the guessing class was freely estimated. The final model (Model IV) was a two-class model comprising an ability class and a non-ability class in which response time information was included as a covariate, following the approach proposed by Jeon and De Boeck (2019). The analysis showed that Model IV (the two-class model with response time as a covariate) was the best fitting model. Average item response times and success probabilities tended to be low in the non-ability class and higher in the ability class. The ability class, which used its time more effectively and had a higher probability of success, responded rapidly and successfully to easy items while spending more time on difficult ones. In contrast, the overall low performance of the non-ability class was noteworthy: its fast responses to easy items ended in failure, whereas devoting more time to difficult items brought only partial success. The latter group appears to have adopted a more superficial item response strategy, responding faster than the ability class on all items while still showing some care by spending more time on difficult items.
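
To make the model hierarchy concrete, the following is a minimal sketch, in LaTeX notation, of one plausible parameterization of the four models. It is offered for illustration only; the exact specification estimated in the study (in particular, how response time enters Model IV) follows Jeon and De Boeck (2019) and may differ in detail from this sketch. Here p indexes persons, i indexes items, \pi_A is the ability-class proportion, and t_{pi} denotes the response time of person p on item i.

  % Model I: single-class 2PL
  P(X_{pi}=1 \mid \theta_p) = \frac{\exp\{a_i(\theta_p - b_i)\}}{1 + \exp\{a_i(\theta_p - b_i)\}}

  % Models II-IV: two-class mixture of an ability class (A) and a guessing/non-ability class (N)
  P(X_{pi}=1) = \pi_A \, P_{\mathrm{2PL}}(X_{pi}=1 \mid \theta_p) + (1 - \pi_A) \, P_N(X_{pi}=1)

  % Model II: P_N(X_{pi}=1) = 0.25 (fixed);  Model III: P_N(X_{pi}=1) freely estimated
  % Model IV (illustrative form): response time enters the non-ability class as a covariate
  \operatorname{logit} P_N(X_{pi}=1) = \beta_{0i} + \beta_1 \log t_{pi}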

Keywords: Item response time, Knowledge retrieval strategy, Rapid guessing, Mixture item response theory, PISA

References

  1. Abramson, L. Y., Metalsky, G. I., & Alloy, L. B. (1989). Hopelessness depression: A theory-based subtype of depression. Psychological Review, 96(2), 358-372. doi:10.1037/0033-295X.96.2.358
  2. AERA, APA, & NCME. (2014). Standards for educational and psychological testing. American Educational Research Association.
  3. Anghel, E., Khorramdel, L., & von Davier, M. (2024). The use of process data in large-scale assessments: A literature review. Large-scale Assessments in Education, 12(1), 13. doi:10.1186/s40536-024-00202-1
  4. Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: the effects of incentives on motivation and performance. European Journal of Psychology of Education, 16(3), 441-462.
  5. Bloom, B., Engelhart, M., Furst, E., Hill, W., & Krathwohl, D. (1956). Taxonomy of educational objectives: The classification of educational goals handbook I: Cognitive domain. New York: David McKay Company.
  6. Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331-348. doi:10.1111/j.1745-3984.2002.tb01146.x
  7. Brophy, J., & Ames, C. (2005). NAEP testing for twelfth graders: Motivational issues. Washington, DC: National Assessment Governing Board.
  8. Brückner, S., & Pellegrino, J. W. (2017). Contributions of response processes analysis to the validation of an assessment of higher education students’ competence in business and economics. In B. D. Zumbo & A. M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 31-35). New York: Springer International Publishing.
  9. Cao, Y., & Stokes, L. (2008). Modeling response times in test-taking: Applications and developments. Journal of Educational Measurement, 45(2), 135-153.
  10. Chang, Y. C., Tsai, C. C., & Hsu, H. C. (2014). The impact of guessing strategies on item response theory model parameters. Educational and Psychological Measurement, 74(1), 69-85.
  11. De Boeck, P., & Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102. doi:10.3389/fpsyg.2019.00102
  12. De Jong, T., & Ferguson-Hessler, M. (1996). Types and qualities of knowledge. Educational Psychologist, 31(2), 105-113.
  13. Eklöf, H. (2010). Skill and will: Test-taking motivation and assessment quality. Assessment in Education: Principles, Policy & Practice, 17(4), 345-356. doi:10.1080/0969594X.2010.516569
  14. Entwistle, N., & Peterson, E. (2004). Conceptions of learning and knowledge in higher education: Relationships with study behavior and inferences of learning environments. International Journal of Educational Research, 41, 407-428.
  15. Erwin, T. D., & Wise, S. L. (2002). A scholar-practitioner model for assessment. In Building a scholarship of assessment (pp. 67-81). San Francisco: Jossey-Bass.
  16. Finn, B. (2015). Measuring motivation in low-stakes assessments (Research report RR-15-19). Princeton, NJ: Educational Testing Service.
  17. Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27. doi:10.1111/j.1745-3992.2004.tb00149.x
  18. ITC. (2013). International guidelines for test use. Retrieved from http://www.intestcom.org/Guidelines/Test+Use.php
  19. Jeon, M., & De Boeck, P. (2019). An analysis of an item-response strategy based on knowledge retrieval. Behavior Research Methods, 51, 697-719. doi:10.3758/s13428-018-1064-1
  20. Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778. doi:10.1093/biomet/88.3.767
  21. Maddox, B. (2023). The uses of process data in large-scale educational assessments. OECD Education Working Papers, 286, Paris: OECD Publishing. doi:10.1787/5d9009ff-en
  22. Meyer, J. (2010). A mixture Rasch model with item response time components. Applied Psychological Measurement, 34(7), 521-538.
  23. Ministry of National Education. (2022). PISA 2022 international student assessment program. Retrieved from https://pisa.meb.gov.tr/meb_iys_dosyalar/2022_01/26105818_PISA_2022_TanYtYm_KitapcYYY.pdf
  24. Mislevy, R. J., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(2), 195-215.
  25. Oranje, A., Gorin, J., Jia, Y., Kerr, D., Ercikan, K., & Pellegrino, J. W. (2017). Collecting, analysing, and interpreting response time, eye tracking and log data. In K. Ercikan & J. W. Pellegrino (Eds.), Validation of score meaning for the next generation of assessments (pp. 39-51). Mount Royal, NJ: National Council on Measurement in Education.
  26. Pohl, S., Ulitzsch, E., & von Davier, M. (2021). Reframing rankings in educational assessments. Science, 372(6540), 338-340.
  27. Pokropek, A. (2016). Grade of membership response time model for detecting guessing behaviors. Journal of Educational and Behavioral Statistics, 41(3), 300-325. doi:10.3102/1076998616636618
  28. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
  29. Seligman, M. E. (1972). Learned helplessness. Annual Review of Medicine, 23, 407-412. doi:10.1146/annurev.me.23.020172.002203
  30. Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84(2), 127-190.
  31. Sideridis, G., & Alahmadi, M. T. S. (2022). The role of response times on the measurement of mental ability. Frontiers in Psychology, 13. doi:10.3389/fpsyg.2022.892317
  32. Sideridis, G., Tsaousis, I., & Al-Harbi, K. (2022). Identifying ability and nonability groups: incorporating response times using mixture modeling. Educational and Psychological Measurement, 82(6), 1087-1106.
  33. Wang, C. G., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456-477.
  34. Wise, S. L. (2019). An information-based approach to identifying rapid-guessing thresholds. Applied Measurement in Education, 32(4), 325-336. doi:10.1080/08957347.2019.1660350
  35. Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort-moderated IRT model. Journal of Educational Measurement, 43(1), 19-38.
  36. Wise, S. L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18, 163-183. doi:10.1207/s15324818ame1802_2
  37. Wise, S. L., Pastor, D. A., & Kong, X. J. (2009). Correlates of rapid-guessing behavior in low-stakes testing: Implications for test development and measurement practice. Applied Measurement in Education, 22(2), 185-205.
  38. Yamamoto, K. H. (1997). Modeling the effects of test length and test time on parameter estimation using the HYBRID model. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 89-98). New York: Waxmann Verlag GmbH.

How to cite

Yıldırım Hoş, H., & Uysal Saraç, M. (2025). How Does Incorporating the Response Times into Mixture Modelling Influence the Identification of Latent Classes for Mathematics Literacy Framework in PISA 2022? Education and Science, 50, 129-146. https://doi.org/10.15390/EB.2025.14125