Research Article

Exploring effective methods for automated essay scoring of non-native speakers

Kornwipa Poonpon 1, Paiboon Manorom 1, Wirapong Chansanam 1*
1 Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, THAILAND
* Corresponding Author
Contemporary Educational Technology, 15(4), October 2023, ep475, https://doi.org/10.30935/cedtech/13740
Published Online: 30 September 2023, Published: 01 October 2023

ABSTRACT

Automated essay scoring (AES) has become a valuable tool in educational settings, providing efficient and objective evaluations of student essays. However, most AES systems have focused primarily on native English speakers, leaving a critical gap in the evaluation of non-native speakers' writing skills. This research addresses this gap by exploring the effectiveness of automated essay-scoring methods designed specifically for non-native speakers. The study acknowledges the unique challenges posed by variations in language proficiency, cultural differences, and linguistic complexity when assessing non-native speakers' writing abilities. The work draws on the Automated Student Assessment Prize dataset and the Khon Kaen University academic English language test dataset, and presents an approach that leverages variants of the long short-term memory (LSTM) network to learn essay features, comparing results using the kappa coefficient. The findings demonstrate that the proposed framework, which jointly learns different essay representations, yields significant benefits and achieves results comparable to state-of-the-art deep learning models. These results suggest that the text representation proposed in this paper holds promise as an effective new choice for assessing the writing tasks of non-native speakers. The results of this study can be applied to advance educational assessment practices and promote equitable opportunities for language learners worldwide by improving the evaluation process for non-native speakers.
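The paper's modeling pipeline is not reproduced on this page, but the approach the abstract describes (an LSTM network that learns essay representations and is evaluated against human scores with a kappa coefficient) can be illustrated with a minimal sketch. The code below is not the authors' implementation: the vocabulary size, sequence length, network dimensions, and toy data are all illustrative assumptions, and the quadratic weighting of kappa follows the convention of the ASAP benchmark rather than anything stated in the abstract.

# Minimal sketch of an LSTM-based essay scorer, assuming Keras and
# scikit-learn. Essays are tokenized to integer sequences, embedded,
# passed through an LSTM, and regressed onto a normalized score.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 500         # assumed maximum essay length in tokens
EMBED_DIM = 50        # illustrative embedding dimension

def build_lstm_scorer() -> models.Model:
    """Embedding -> LSTM -> dense regression head; scores scaled to [0, 1]."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),
        layers.LSTM(128),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # normalized essay score
    ])
    model.compile(optimizer="rmsprop", loss="mse")
    return model

def quadratic_weighted_kappa(y_true, y_pred) -> float:
    """Agreement between integer rater scores and rounded model scores."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")

if __name__ == "__main__":
    # Toy stand-ins for tokenized essays (X) and gold scores on a 0-10 scale.
    rng = np.random.default_rng(0)
    X = rng.integers(1, VOCAB_SIZE, size=(64, MAX_LEN))
    y = rng.integers(0, 11, size=64)

    model = build_lstm_scorer()
    model.fit(X, y / 10.0, epochs=1, batch_size=16, verbose=0)

    preds = np.rint(model.predict(X, verbose=0).ravel() * 10).astype(int)
    print("QWK:", quadratic_weighted_kappa(y, preds))

In a full system of the kind the abstract describes, pre-trained word embeddings and bidirectional or attention-augmented LSTM variants would replace the toy components above, and the model would be trained on the actual ASAP and Khon Kaen University test essays rather than random data.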

CITATION (APA)

Poonpon, K., Manorom, P., & Chansanam, W. (2023). Exploring effective methods for automated essay scoring of non-native speakers. Contemporary Educational Technology, 15(4), ep475. https://doi.org/10.30935/cedtech/13740
