
GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models

Luo, Hengyu; Li, Zihao; Attieh, Joseph; Devkota, Sawal; de Gibert, Ona; Huang, Xu; Ji, Shaoxiong; Lin, Peiqin; Mantina, Bhavani Sai Praneeth Varma; Sreenidhi, Ananda; Vázquez, Raúl; Wang, Mengjie; Yusofi, Samea; Yuan, Fei; Tiedemann, Jörg

View/Open
2025.emnlp-demos.43.pdf (658.0Kb)

doi:10.18653/v1/2025.emnlp-demos.43
URI
https://aclanthology.org/2025.emnlp-demos.43/
Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe202601216057
Abstract

Large language models (LLMs) are advancing at an unprecedented pace globally, and regions around the world are increasingly adopting them for applications in their primary languages. Evaluating these models across diverse linguistic environments, especially in low-resource languages, has become a major challenge for academia and industry. Existing evaluation frameworks are inconsistent across benchmarks and focus disproportionately on English and a handful of high-resource languages, thereby overlooking how LLMs actually perform in multilingual and lower-resource scenarios. To address this fragmented and inconsistent state of multilingual evaluation, we introduce GlotEval, a unified and lightweight framework that systematically integrates 27 benchmarks under a standardized ISO 639-3 language identifier system and allows new benchmarks to be incorporated seamlessly. Supporting nine key tasks (machine translation, text classification, summarization, open-ended generation, reading comprehension, sequence labeling, intrinsic evaluation, instruction following, and reasoning) that span dozens to hundreds of languages, GlotEval uniquely enables language-specific, cross-benchmark analysis and non-English-centric evaluation at a scale that was previously less practical for many researchers. This enables precise diagnosis of model strengths and weaknesses in diverse linguistic contexts. A multilingual translation case study demonstrates GlotEval’s applicability to multilingual and language-specific evaluation.
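The abstract's central mechanism is keying every benchmark to ISO 639-3 identifiers so that scores for the same language can be lined up across benchmarks. The sketch below illustrates that idea under our own assumptions: the function names `to_iso639_3` and `scores_by_language`, the tiny lookup table, and the three input code formats are hypothetical and are not taken from GlotEval's code or configuration.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny lookup table for illustration only.
# A real system would load the full ISO 639-3 code tables instead.
ISO639_1_TO_3 = {"en": "eng", "fi": "fin", "de": "deu", "sw": "swa", "zh": "zho"}


def to_iso639_3(code: str) -> str:
    """Normalize a benchmark-specific language code to an ISO 639-3 identifier.

    Assumed input formats (illustrative, not GlotEval's actual parser):
    - ISO 639-1 two-letter codes, e.g. "fi"
    - FLORES-200-style codes, e.g. "fin_Latn" (ISO 639-3 code plus script)
    - BCP-47-style tags, e.g. "fi-FI" or "zh-Hans"
    Three-letter primary codes are assumed to already be ISO 639-3; this
    sketch does not convert ISO 639-2/B codes such as "ger".
    """
    # Take the primary language subtag; any script/region suffix is dropped.
    base = code.strip().replace("-", "_").split("_")[0].lower()
    if len(base) == 3:
        return base
    if base in ISO639_1_TO_3:
        return ISO639_1_TO_3[base]
    raise KeyError(f"no ISO 639-3 mapping for {code!r}")


def scores_by_language(results):
    """Group (benchmark, language_code, score) triples by ISO 639-3 code."""
    grouped = defaultdict(dict)
    for benchmark, lang_code, score in results:
        grouped[to_iso639_3(lang_code)][benchmark] = score
    return dict(grouped)


if __name__ == "__main__":
    # Dummy placeholder scores for illustration; these are NOT results from the paper.
    results = [
        ("benchmark_a", "fi", 0.5),
        ("benchmark_b", "fin_Latn", 0.7),
        ("benchmark_c", "fi-FI", 0.6),
        ("benchmark_a", "sw", 0.4),
    ]
    print(scores_by_language(results))
```

Once three differently formatted Finnish codes collapse to `fin`, a per-language, cross-benchmark view falls out of a simple grouping step. Note that this sketch discards script information, which a fuller design would likely keep for languages written in more than one script.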

Collections
  • Self-archived publications (Rinnakkaistallenteet) [29335]

University of Turku Library | University of Turku
julkaisut@utu.fi | Privacy notice | Accessibility statement