Research Article

Exploring effective methods for automated essay scoring of non-native speakers

Kornwipa Poonpon 1, Paiboon Manorom 1, Wirapong Chansanam 1*
1 Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, THAILAND
* Corresponding Author
Contemporary Educational Technology, 15(4), October 2023, ep475, https://doi.org/10.30935/cedtech/13740
Published Online: 30 September 2023, Published: 01 October 2023

ABSTRACT

Automated essay scoring (AES) has become a valuable tool in educational settings, providing efficient and objective evaluations of student essays. However, most AES systems have focused primarily on native English speakers, leaving a critical gap in the evaluation of non-native speakers' writing skills. This research addresses this gap by exploring the effectiveness of automated essay-scoring methods designed specifically for non-native speakers. The study acknowledges the unique challenges posed by variations in language proficiency, cultural differences, and linguistic complexity when assessing non-native speakers' writing abilities. The work draws on the Automated Student Assessment Prize dataset and the Khon Kaen University academic English language test dataset, and presents an approach that leverages variants of the long short-term memory (LSTM) network to learn essay features, comparing results using the kappa coefficient. The findings demonstrate that the proposed framework, which jointly learns different essay representations, yields significant benefits and achieves results comparable to state-of-the-art deep learning models. These results suggest that the text representation proposed in this paper holds promise as an effective new choice for assessing the writing tasks of non-native speakers. The results of this study can be applied to advance educational assessment practices and promote equitable opportunities for language learners worldwide by improving the evaluation process for non-native speakers.
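The paper's modeling pipeline is not reproduced on this page, but the approach the abstract describes (an LSTM network that learns essay representations and is evaluated against human scores with a kappa coefficient) can be illustrated with a minimal sketch. The code below is not the authors' implementation: the vocabulary size, sequence length, network dimensions, and toy data are all illustrative assumptions, and the quadratic weighting of kappa follows the convention of the ASAP benchmark rather than anything stated in the abstract.

# Minimal sketch of an LSTM-based essay scorer, assuming Keras and
# scikit-learn. Essays are tokenized to integer sequences, embedded,
# passed through an LSTM, and regressed onto a normalized score.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 500         # assumed maximum essay length in tokens
EMBED_DIM = 50        # illustrative embedding dimension

def build_lstm_scorer() -> models.Model:
    """Embedding -> LSTM -> dense regression head; scores scaled to [0, 1]."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),
        layers.LSTM(128),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # normalized essay score
    ])
    model.compile(optimizer="rmsprop", loss="mse")
    return model

def quadratic_weighted_kappa(y_true, y_pred) -> float:
    """Agreement between integer rater scores and rounded model scores."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")

if __name__ == "__main__":
    # Toy stand-ins for tokenized essays (X) and gold scores on a 0-10 scale.
    rng = np.random.default_rng(0)
    X = rng.integers(1, VOCAB_SIZE, size=(64, MAX_LEN))
    y = rng.integers(0, 11, size=64)

    model = build_lstm_scorer()
    model.fit(X, y / 10.0, epochs=1, batch_size=16, verbose=0)

    preds = np.rint(model.predict(X, verbose=0).ravel() * 10).astype(int)
    print("QWK:", quadratic_weighted_kappa(y, preds))

In a full system of the kind the abstract describes, pre-trained word embeddings and bidirectional or attention-augmented LSTM variants would replace the toy components above, and the model would be trained on the actual ASAP and Khon Kaen University test essays rather than random data.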

CITATION (APA)

Poonpon, K., Manorom, P., & Chansanam, W. (2023). Exploring effective methods for automated essay scoring of non-native speakers. Contemporary Educational Technology, 15(4), ep475. https://doi.org/10.30935/cedtech/13740
