Evaluating Students’ Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large

dc.contributor.author: Jauhiainen, Jussi
dc.contributor.author: Garagorry Guerra, Agustín
dc.contributor.organization: fi=maantiede|en=Geography
dc.contributor.organization-code: 1.2.246.10.2458963.20.17647764921
dc.converis.publication-id: 477835432
dc.converis.url: https://research.utu.fi/converis/portal/Publication/477835432
dc.date.accessioned: 2025-08-27T22:19:14Z
dc.date.available: 2025-08-27T22:19:14Z
dc.description.abstract: Evaluating open-ended written examination responses from students is an essential yet time-intensive task for educators, requiring a high degree of effort, consistency, and precision. Recent developments in Large Language Models (LLMs) present a promising opportunity to balance the need for thorough evaluation with efficient use of educators' time. We explore four LLMs (GPT-3.5, GPT-4, Claude-3, and Mistral-Large) in assessing university students' open-ended responses to questions about reference material they have studied. Each model was instructed to evaluate 54 responses repeatedly under two conditions: 10 times (10-shot) with a temperature setting of 0.0 and 10 times with a temperature of 0.5, for an expected total of 1,080 evaluations per model and 4,320 evaluations across all models. The RAG (Retrieval-Augmented Generation) framework was used to have the LLMs carry out the evaluation. The studied LLMs varied notably in their consistency and grading outcomes. The strengths and weaknesses of using LLMs for educational assessment need to be better understood.
dc.format.pagerange: 3097
dc.format.pagerange: 3113
dc.identifier.eissn: 2582-9793
dc.identifier.olddbid: 201972
dc.identifier.oldhandle: 10024/184999
dc.identifier.uri: https://www.utupub.fi/handle/11111/39199
dc.identifier.url: https://doi.org/10.54364/aaiml.2024.44177
dc.identifier.urn: URN:NBN:fi-fe2025082789630
dc.language.iso: en
dc.okm.affiliatedauthor: Jauhiainen, Jussi
dc.okm.affiliatedauthor: Garagorry Guerra, Agustín
dc.okm.discipline: 519 Social and economic geography [en_GB]
dc.okm.discipline: 519 Yhteiskuntamaantiede, talousmaantiede [fi_FI]
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Shimur Publications
dc.publisher.country: India [en_GB]
dc.publisher.country: Intia [fi_FI]
dc.publisher.country-code: IN
dc.relation.doi: 10.54364/AAIML.2024.44177
dc.relation.ispartofjournal: Advances in Artificial Intelligence and Machine Learning
dc.relation.issue: 4
dc.relation.volume: 4
dc.source.identifier: https://www.utupub.fi/handle/10024/184999
dc.title: Evaluating Students’ Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large
dc.year.issued: 2024

Files

Name: 245944177.pdf
Size: 513.36 KB
Format: Adobe Portable Document Format