Research Article
Enhancing university level English proficiency with generative AI: Empirical insights into automated feedback and learning outcomes
1 English Language Teaching Unit, The Chinese University of Hong Kong, Hong Kong, CHINA
2 Department of Educational Research, Lancaster University, Lancaster, UNITED KINGDOM
3 Division of Languages and Communication, College of Professional and Continuing Education, The Hong Kong Polytechnic University, Hong Kong, CHINA
* Corresponding Author
Contemporary Educational Technology, 16(4), October 2024, ep541, https://doi.org/10.30935/cedtech/15607
Published: 13 November 2024
OPEN ACCESS
ABSTRACT
This paper investigates the effects of large language model (LLM)-based feedback on the essay-writing proficiency of university students in Hong Kong. It examines the improvements that generative artificial intelligence (AI) can bring to student essay revisions, its effect on student engagement with writing tasks, and the emotions students experience while revising written work. Using a randomized controlled trial, it compares the experiences and performance of 918 language students at a Hong Kong university, some of whom received AI-generated feedback (from the GPT-3.5-turbo LLM) and some of whom did not. The impact of AI-generated feedback is assessed through quantitative measures, namely statistical analysis of its effect on essay grades, and through subjective measures: student surveys that captured motivational levels and emotional states, and thematic analysis of interviews with participating students. Incorporating AI-generated feedback into the revision process significantly improved the quality of students’ essays. The quantitative data show statistically significant effects of notable size, while qualitative feedback from students highlights increased engagement and motivation, as well as a mixed emotional experience during revision, among those who received AI feedback.
CITATION (APA)
Chan, S. T. S., Lo, N. P. K., & Wong, A. M. H. (2024). Enhancing university level English proficiency with generative AI: Empirical insights into automated feedback and learning outcomes. Contemporary Educational Technology, 16(4), ep541. https://doi.org/10.30935/cedtech/15607