Evaluating Students’ Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large

Jauhiainen, Jussi; Garagorry Guerra, Agustín

dc.contributor.author	Jauhiainen, Jussi
dc.contributor.author	Garagorry Guerra, Agustín
dc.contributor.organization	Geography
dc.contributor.organization-code	1.2.246.10.2458963.20.17647764921
dc.converis.publication-id	477835432
dc.converis.url	https://research.utu.fi/converis/portal/Publication/477835432
dc.date.accessioned	2025-08-27T22:19:14Z
dc.date.available	2025-08-27T22:19:14Z
dc.description.abstract	Evaluating open-ended written examination responses from students is an essential yet time-intensive task for educators, requiring a high degree of effort, consistency, and precision. Recent developments in Large Language Models (LLMs) present a promising opportunity to balance the need for thorough evaluation with efficient use of educators' time. We explore four LLMs (GPT-3.5, GPT-4, Claude-3, and Mistral-Large) in assessing university students' open-ended responses to questions about reference material they have studied. Each model was instructed to evaluate 54 responses repeatedly under two conditions: 10 times (10-shot) with a temperature setting of 0.0 and 10 times with a temperature of 0.5, for an expected total of 1,080 evaluations per model and 4,320 evaluations across all models. The Retrieval-Augmented Generation (RAG) framework was used to have the LLMs carry out the evaluation. The studied LLMs varied notably in their consistency and in their grading outcomes. There is a need to understand the strengths and weaknesses of using LLMs for educational assessments.
dc.format.pagerange	3113
dc.identifier.eissn	2582-9793
dc.identifier.olddbid	201972
dc.identifier.oldhandle	10024/184999
dc.identifier.uri	https://www.utupub.fi/handle/11111/39199
dc.identifier.url	https://doi.org/10.54364/aaiml.2024.44177
dc.identifier.urn	URN:NBN:fi-fe2025082789630
dc.language.iso	en
dc.okm.affiliatedauthor	Jauhiainen, Jussi
dc.okm.affiliatedauthor	Garagorry Guerra, Agustín
dc.okm.discipline	519 Social and economic geography	en_GB
dc.okm.discipline	519 Yhteiskuntamaantiede, talousmaantiede	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	Shimur Publications
dc.publisher.country	India	en_GB
dc.publisher.country	Intia	fi_FI
dc.publisher.country-code	IN
dc.relation.doi	10.54364/AAIML.2024.44177
dc.relation.ispartofjournal	Advances in Artificial Intelligence and Machine Learning
dc.relation.issue	4
dc.relation.volume	4
dc.source.identifier	https://www.utupub.fi/handle/10024/184999
dc.title	Evaluating Students’ Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large
dc.year.issued	2024
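
The evaluation counts stated in the abstract follow from the study design: 54 responses, two temperature settings, 10 repetitions per setting, and four models. The following is a minimal sketch that enumerates that grid and checks the totals. The function name evaluation_grid and the constant names are illustrative assumptions and do not come from the article; the sketch makes no LLM calls and does not reproduce the authors' RAG pipeline.

```python
from itertools import product

# Design parameters as stated in the abstract.
MODELS = ["GPT-3.5", "GPT-4", "Claude-3", "Mistral-Large"]
TEMPERATURES = [0.0, 0.5]
REPEATS = 10        # evaluations per response at each temperature
N_RESPONSES = 54    # student responses evaluated by every model


def evaluation_grid(models=MODELS, temperatures=TEMPERATURES,
                    repeats=REPEATS, n_responses=N_RESPONSES):
    """Hypothetical helper: one (model, temperature, repeat, response) tuple per planned evaluation."""
    return product(models, temperatures, range(repeats), range(n_responses))


grid = list(evaluation_grid())
per_model = sum(1 for model, *_ in grid if model == "GPT-4")

print(per_model, len(grid))  # 1080 4320
```

Each tuple in the grid corresponds to one planned grading call, so the same enumeration could drive a batch runner or be used to check that no evaluation is missing from the collected results.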

Files

Name: 245944177.pdf
Size: 513.36 KB
Format: Adobe Portable Document Format

Collections

Self-archived publications