Educational Evaluation with Large Language Models (LLMs): ChatGPT-4 in Recalling and Evaluating Students’ Written Responses

Jauhiainen, Jussi S.; Garagorry Guerra, Agustín

dc.contributor.author	Jauhiainen, Jussi S.
dc.contributor.author	Garagorry Guerra, Agustín
dc.contributor.organization	fi=maantiede\|en=Geography \|
dc.contributor.organization-code	1.2.246.10.2458963.20.17647764921
dc.converis.publication-id	491502817
dc.converis.url	https://research.utu.fi/converis/portal/Publication/491502817
dc.date.accessioned	2025-08-28T01:38:05Z
dc.date.available	2025-08-28T01:38:05Z
dc.description.abstract	<p><b>Aim/Purpose</b><br></p><p>This article investigates the process of identifying and correcting hallucinations in ChatGPT-4’s recall of student-written responses, as well as its evaluation of these responses and provision of feedback. Effective prompting is examined to enhance the pre-evaluation, evaluation, and post-evaluation stages.</p><p><b>Background</b><br></p><p>Advanced Large Language Models (LLMs), such as ChatGPT-4, have gained significant traction in educational contexts. However, as of early 2025, systematic empirical studies on their application for evaluating students’ essays and open-ended written exam responses remain limited. It is important to consider the pre-evaluation, evaluation, and post-evaluation stages when using LLMs.</p><p><b>Methodology</b><br></p><p>In this study, ChatGPT-4 recalled 54 open-ended responses submitted by university students ten times each, together amounting to almost 50,000 words, and assessed and offered feedback on each response.</p><p><b>Contribution</b><br></p><p>The findings emphasize the critical importance of the pre-evaluation, evaluation, and post-evaluation stages, and in particular of prompting and recalling, when utilizing LLMs for educational assessments.</p><p><b>Findings</b><br></p><p>Using systematic prompting techniques, such as Chain of Thought (CoT), ChatGPT-4 can be effectively prepared to accurately recall, evaluate, and provide meaningful, individualized feedback on students’ written responses, following specific instructional guidelines.</p><p><b>Recommendations for Practitioners</b><br></p><p>Proper implementation of the pre-evaluation, evaluation, and post-evaluation stages and testing of recall accuracy are important when using ChatGPT-4 for evaluating students’ open-ended responses and providing feedback.</p><p><b>Recommendation for Researchers</b><br></p><p>Recall accuracy needs to be tested, and the prompting process carefully reported, when using and researching LLMs like ChatGPT-4 for educational evaluations.</p><p><b>Impact on Society</b><br></p><p>As LLMs continue to evolve, they are expected to become valuable tools for assessing student essays and open-ended responses, offering potential time and resource savings for educators and educational institutions.</p><p><b>Future Research</b><br></p><p>Future research should explore the use of various LLMs across different academic fields and topics to better understand their potential and limitations in educational evaluation.</p>
dc.identifier.eissn	2165-316X
dc.identifier.jour-issn	2165-3151
dc.identifier.olddbid	207822
dc.identifier.oldhandle	10024/190849
dc.identifier.uri	https://www.utupub.fi/handle/11111/57285
dc.identifier.url	https://doi.org/10.28945/5433
dc.identifier.urn	URN:NBN:fi-fe2025082791775
dc.language.iso	en
dc.okm.affiliatedauthor	Jauhiainen, Jussi
dc.okm.affiliatedauthor	Garagorry Guerra, Agustín
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	Informing Science Institute
dc.publisher.country	United States	en_GB
dc.publisher.country	Yhdysvallat (USA)	fi_FI
dc.publisher.country-code	US
dc.relation.articlenumber	2
dc.relation.doi	10.28945/5433
dc.relation.ispartofjournal	Journal of Information Technology Education: Innovations in Practice
dc.relation.volume	24
dc.source.identifier	https://www.utupub.fi/handle/10024/190849
dc.title	Educational Evaluation with Large Language Models (LLMs): ChatGPT-4 in Recalling and Evaluating Students’ Written Responses
dc.year.issued	2025

Files

Name: JITE-IIPv24Art002Jauhiainen11203.pdf
Size: 868.17 KB
Format: Adobe Portable Document Format

Collections

Self-archived publications