Research Article

Enhancing university level English proficiency with generative AI: Empirical insights into automated feedback and learning outcomes

Sumie Tsz Sum Chan 1, Noble Po Kan Lo 2,3,*, Alan Man Him Wong 1
1 English Language Teaching Unit, The Chinese University of Hong Kong, Hong Kong, CHINA
2 Department of Educational Research, Lancaster University, Lancaster, UNITED KINGDOM
3 Division of Languages and Communication, College of Professional and Continuing Education, The Hong Kong Polytechnic University, Hong Kong, CHINA
* Corresponding Author
Contemporary Educational Technology, 16(4), October 2024, ep541, https://doi.org/10.30935/cedtech/15607
Published: 13 November 2024
OPEN ACCESS

ABSTRACT

This paper investigates the effects of large language model (LLM)-based feedback on the essay writing proficiency of university students in Hong Kong. It explores the improvements that generative artificial intelligence (AI) can bring to student essay revisions, its effect on student engagement with writing tasks, and the emotions students experience while revising written work. Using a randomized controlled trial, it compares the experiences and performance of 918 language students at a Hong Kong university, some of whom received AI-generated feedback (from the GPT-3.5-turbo LLM) and some of whom did not. The impact of AI-generated feedback is assessed through quantitative measures, namely statistical analysis of its effect on essay grades, and through subjective measures: student surveys that captured motivational levels and emotional states, and thematic analysis of interviews with participating students. Incorporating AI-generated feedback into the revision process produced significant improvements in the quality of students’ essays. The quantitative data show statistically significant effects of notable size, while qualitative feedback from students highlights increased engagement and motivation, as well as a mixed emotional experience during revision, among those who received AI feedback.

CITATION (APA)

Chan, S. T. S., Lo, N. P. K., & Wong, A. M. H. (2024). Enhancing university level English proficiency with generative AI: Empirical insights into automated feedback and learning outcomes. Contemporary Educational Technology, 16(4), ep541. https://doi.org/10.30935/cedtech/15607
