Educational Evaluation with Large Language Models (LLMs): ChatGPT-4 in Recalling and Evaluating Students’ Written Responses
| dc.contributor.author | Jauhiainen, Jussi S. | |
| dc.contributor.author | Garagorry Guerra, Agustín | |
| dc.contributor.organization | Geography | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.17647764921 | |
| dc.converis.publication-id | 491502817 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/491502817 | |
| dc.date.accessioned | 2025-08-28T01:38:05Z | |
| dc.date.available | 2025-08-28T01:38:05Z | |
| dc.description.abstract | Aim/Purpose: This article investigates the process of identifying and correcting hallucinations in ChatGPT-4’s recall of student-written responses as well as its evaluation of these responses, and provision of feedback. Effective prompting is examined to enhance the pre-evaluation, evaluation, and post-evaluation stages. Background: Advanced Large Language Models (LLMs), such as ChatGPT-4, have gained significant traction in educational contexts. However, as of early 2025, systematic empirical studies on their application for evaluating students’ essays and open-ended written exam responses remain limited. It is important to consider pre-evaluation, evaluation and post-evaluation stages when using LLMs. Methodology: In this study, ChatGPT-4 recalled 10 times 54 open-ended responses submitted by university students, making together almost 50,000 words, and assessing and offering feedback on each response. Contribution: The findings emphasize the critical importance of pre-evaluation, evaluation, and post-evaluation stages, and in particular prompting and recalling when utilizing LLMs for educational assessments. Findings: Using systematic prompting techniques, such as Chain of Thought (CoT), ChatGPT-4 can be effectively prepared to accurately recall, evaluate, and provide meaningful, individualized feedback on students’ written responses, following specific instructional guidelines. Recommendations for Practitioners: Proper implementation of pre-evaluation, evaluation and post-evaluation stages and testing of recall accuracy are important when using ChatGPT-4 for evaluating students’ open-ended responses and providing feedback. Recommendation for Researchers: Recall accuracy needs to be tested, and the prompting process carefully revealed when using and researching LLMs like ChatGPT-4 for educational evaluations. Impact on Society: As LLMs continue to evolve, they are expected to become valuable tools for assessing student essays and open-ended responses, offering potential time and resource savings for educators and educational institutions. Future Research: Future research should explore the use of various LLMs across different academic fields and topics to better understand their potential and limitations in educational evaluation. | |
| dc.identifier.eissn | 2165-316X | |
| dc.identifier.jour-issn | 2165-3151 | |
| dc.identifier.olddbid | 207822 | |
| dc.identifier.oldhandle | 10024/190849 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/57285 | |
| dc.identifier.url | https://doi.org/10.28945/5433 | |
| dc.identifier.urn | URN:NBN:fi-fe2025082791775 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Jauhiainen, Jussi | |
| dc.okm.affiliatedauthor | Garagorry Guerra, Agustín | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 1171 Geosciences | en_GB |
| dc.okm.discipline | 516 Educational sciences | en_GB |
| dc.okm.discipline | 519 Social and economic geography | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.discipline | 1171 Geotieteet | fi_FI |
| dc.okm.discipline | 516 Kasvatustieteet | fi_FI |
| dc.okm.discipline | 519 Yhteiskuntamaantiede, talousmaantiede | fi_FI |
| dc.okm.internationalcopublication | international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A1 ScientificArticle | |
| dc.publisher | Informing Science Institute | |
| dc.publisher.country | United States | en_GB |
| dc.publisher.country | Yhdysvallat (USA) | fi_FI |
| dc.publisher.country-code | US | |
| dc.relation.articlenumber | 2 | |
| dc.relation.doi | 10.28945/5433 | |
| dc.relation.ispartofjournal | Journal of Information Technology Education: Innovations in Practice | |
| dc.relation.volume | 24 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/190849 | |
| dc.title | Educational Evaluation with Large Language Models (LLMs): ChatGPT-4 in Recalling and Evaluating Students’ Written Responses | |
| dc.year.issued | 2025 |
Files
- Name: JITE-IIPv24Art002Jauhiainen11203.pdf
- Size: 868.17 KB
- Format: Adobe Portable Document Format