
Generative AI in assessing written responses of geography exams: challenges and potential

Jauhiainen, Jussi S.; Garagorry Guerra, Agustín; Nylén, Tua; Mäki, Sanna

View/Open: Generative AI in assessing written responses of geography exams challenges and potential.pdf (1023 kB)

Informa UK Limited
doi:10.1080/03098265.2025.2593484
URI
https://doi.org/10.1080/03098265.2025.2593484
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe202601215614
Abstract

This article examines the application of Large Language Models (LLMs) – GPT-4, Claude, Cohere, and Llama – to assessing students’ open-ended responses in Geography exams. The models’ scores were compared to those of the original multi-stage human assessment as well as to the scores given by two additional human experts. The case study considers the high-stakes national matriculation exam in Finland, whose results play a crucial role in determining individuals’ eligibility for higher education, including the right to study Geography at university. We selected 18 essays that had originally been awarded 5 (basic), 10 (good), or 15 (excellent) points on a scale from 0 to 15. The findings show variability between LLMs and notable differences between LLM and human evaluations. The language of the responses and the grading instructions influenced LLM performance. These results highlight both the potential and the complexities of integrating generative AI into learning assessment today to score open-ended responses. Precise control of prompts and LLM settings proved crucial for aligning LLM scores more closely with the original assessment scores.
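
This record does not include the study's actual prompts or model settings. As a rough, hypothetical illustration of the prompt-controlled scoring workflow the abstract describes, the Python sketch below asks a model (via the OpenAI chat completions client) to grade one essay on the exam's 0–15 point scale with the temperature fixed at 0. The model name, system prompt, and rubric handling are illustrative assumptions, not the authors' materials.

# A minimal sketch, assuming the OpenAI Python client (openai>=1.0).
# Prompt wording, rubric format, and settings are illustrative
# assumptions, not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_essay(question: str, rubric: str, essay: str) -> str:
    """Ask the model for a 0-15 point score and a brief justification."""
    response = client.chat.completions.create(
        model="gpt-4",   # one of the four LLM families compared in the study
        temperature=0,   # deterministic settings; the abstract notes that
                         # precise control of prompts and settings was crucial
        messages=[
            {"role": "system",
             "content": ("You are an examiner for the Finnish geography "
                         "matriculation exam. Score the student's answer "
                         "on a 0-15 point scale using the rubric. Reply "
                         "with the integer score on the first line, then "
                         "a brief justification.")},
            {"role": "user",
             "content": (f"Question:\n{question}\n\n"
                         f"Rubric:\n{rubric}\n\n"
                         f"Student answer:\n{essay}")},
        ],
    )
    return response.choices[0].message.content

Under this setup, repeating the call over the 18 essays and comparing the returned integers to the human-assigned 5/10/15 point scores would reproduce the kind of LLM-versus-human comparison the abstract reports.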

Collections
  • Rinnakkaistallenteet (self-archived publications) [29337]
