Educational Evaluation with Large Language Models (LLMs): ChatGPT-4 in Recalling and Evaluating Students’ Written Responses

dc.contributor.author: Jauhiainen, Jussi S.
dc.contributor.author: Garagorry Guerra, Agustín
dc.contributor.organization: Geography
dc.contributor.organization-code: 1.2.246.10.2458963.20.17647764921
dc.converis.publication-id: 491502817
dc.converis.url: https://research.utu.fi/converis/portal/Publication/491502817
dc.date.accessioned: 2025-08-28T01:38:05Z
dc.date.available: 2025-08-28T01:38:05Z
dc.description.abstract:

Aim/Purpose
This article investigates how hallucinations in ChatGPT-4’s recall of student-written responses can be identified and corrected, and how the model evaluates these responses and provides feedback on them. It also examines how effective prompting can improve the pre-evaluation, evaluation, and post-evaluation stages.

Background
Advanced Large Language Models (LLMs), such as ChatGPT-4, have gained significant traction in educational contexts. However, as of early 2025, systematic empirical studies on their application for evaluating students’ essays and open-ended written exam responses remain limited. When using LLMs, it is important to consider the pre-evaluation, evaluation, and post-evaluation stages.

Methodology
In this study, ChatGPT-4 recalled 54 open-ended responses submitted by university students 10 times, together amounting to almost 50,000 words, and assessed and offered feedback on each response.

Contribution
The findings emphasize the critical importance of the pre-evaluation, evaluation, and post-evaluation stages, and in particular of prompting and recall, when utilizing LLMs for educational assessment.

Findings
With systematic prompting techniques such as Chain of Thought (CoT), ChatGPT-4 can be effectively prepared to accurately recall and evaluate students’ written responses, and to provide meaningful, individualized feedback on them, following specific instructional guidelines.

Recommendations for Practitioners
When using ChatGPT-4 to evaluate students’ open-ended responses and provide feedback, it is important to implement the pre-evaluation, evaluation, and post-evaluation stages properly and to test recall accuracy.

Recommendation for Researchers
When using and researching LLMs such as ChatGPT-4 for educational evaluation, recall accuracy needs to be tested and the prompting process reported transparently.

Impact on Society
As LLMs continue to evolve, they are expected to become valuable tools for assessing student essays and open-ended responses, offering potential time and resource savings for educators and educational institutions.

Future Research
Future research should explore the use of various LLMs across different academic fields and topics to better understand their potential and limitations in educational evaluation.
dc.identifier.eissn: 2165-316X
dc.identifier.jour-issn: 2165-3151
dc.identifier.olddbid: 207822
dc.identifier.oldhandle: 10024/190849
dc.identifier.uri: https://www.utupub.fi/handle/11111/57285
dc.identifier.url: https://doi.org/10.28945/5433
dc.identifier.urn: URN:NBN:fi-fe2025082791775
dc.language.iso: en
dc.okm.affiliatedauthor: Jauhiainen, Jussi
dc.okm.affiliatedauthor: Garagorry Guerra, Agustín
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 1171 Geosciences [en_GB]
dc.okm.discipline: 516 Educational sciences [en_GB]
dc.okm.discipline: 519 Social and economic geography [en_GB]
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Informing Science Institute
dc.publisher.country: United States [en_GB]
dc.publisher.country-code: US
dc.relation.articlenumber: 2
dc.relation.doi: 10.28945/5433
dc.relation.ispartofjournal: Journal of Information Technology Education: Innovations in Practice
dc.relation.volume: 24
dc.source.identifier: https://www.utupub.fi/handle/10024/190849
dc.title: Educational Evaluation with Large Language Models (LLMs): ChatGPT-4 in Recalling and Evaluating Students’ Written Responses
dc.year.issued: 2025

Files

Name: JITE-IIPv24Art002Jauhiainen11203.pdf
Size: 868.17 KB
Format: Adobe Portable Document Format