GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models

dc.contributor.author: Luo, Hengyu
dc.contributor.author: Li, Zihao
dc.contributor.author: Attieh, Joseph
dc.contributor.author: Devkota, Sawal
dc.contributor.author: de Gibert, Ona
dc.contributor.author: Huang, Xu
dc.contributor.author: Ji, Shaoxiong
dc.contributor.author: Lin, Peiqin
dc.contributor.author: Mantina, Bhavani Sai Praneeth Varma
dc.contributor.author: Sreenidhi, Ananda
dc.contributor.author: Vázquez, Raúl
dc.contributor.author: Wang, Mengjie
dc.contributor.author: Yusofi, Samea
dc.contributor.author: Yuan, Fei
dc.contributor.author: Tiedemann, Jörg
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.converis.publication-id: 506505289
dc.converis.url: https://research.utu.fi/converis/portal/Publication/506505289
dc.date.accessioned: 2026-01-21T14:53:11Z
dc.date.available: 2026-01-21T14:53:11Z
dc.description.abstract: <p>Large language models (LLMs) are advancing at an unprecedented pace globally, and regions are increasingly adopting these models for applications in their primary languages. Evaluating these models in diverse linguistic environments, especially in low-resource languages, has become a major challenge for academia and industry. Existing evaluation frameworks are inconsistent across benchmarks and disproportionately focused on English and a handful of high-resource languages, thereby overlooking the realistic performance of LLMs in multilingual and lower-resource scenarios. To address this critical challenge of fragmented and inconsistent multilingual evaluation, we introduce GlotEval, a unified and lightweight framework that systematically integrates 27 benchmarks under a standardized ISO 639-3 language identifier system, allowing seamless incorporation of new benchmarks. Supporting nine key tasks (machine translation, text classification, summarization, open-ended generation, reading comprehension, sequence labeling, intrinsic evaluation, instruction following, and reasoning) spanning dozens to hundreds of languages, GlotEval uniquely enables language-specific, cross-benchmark analysis and non-English-centric evaluation at a scale previously impractical for many researchers. This enables a precise diagnosis of model strengths and weaknesses in diverse linguistic contexts. A multilingual translation case study demonstrates GlotEval’s applicability for multilingual and language-specific evaluations.</p>
dc.format.pagerange: 602-614
dc.identifier.isbn: 979-8-89176-334-0
dc.identifier.olddbid: 213835
dc.identifier.oldhandle: 10024/196853
dc.identifier.uri: https://www.utupub.fi/handle/11111/56002
dc.identifier.url: https://aclanthology.org/2025.emnlp-demos.43/
dc.identifier.urn: URN:NBN:fi-fe202601216057
dc.language.iso: en
dc.okm.affiliatedauthor: Ji, Shaoxiong
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.relation.conference: Empirical Methods in Natural Language Processing
dc.relation.doi: 10.18653/v1/2025.emnlp-demos.43
dc.source.identifier: https://www.utupub.fi/handle/10024/196853
dc.title: GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models
dc.title.book: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing : System Demonstrations
dc.year.issued: 2025
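The abstract describes unifying 27 benchmarks under a standardized ISO 639-3 language identifier system. As an illustrative sketch only (the mapping and function below are hypothetical and not GlotEval's actual code), normalizing heterogeneous benchmark language tags to ISO 639-3 might look like this:

```python
# Hypothetical subset of an ISO 639-2/BCP 47 -> ISO 639-3 mapping,
# for demonstration only; a real system would use the full registry.
ISO_639_3 = {
    "en": "eng", "fi": "fin", "zh": "zho", "sw": "swa",
    "eng": "eng", "fin": "fin", "zho": "zho", "swa": "swa",
}

def normalize_lang(tag: str) -> str:
    """Map a two- or three-letter language tag to an ISO 639-3 code,
    stripping region subtags such as 'en-US' or 'zh_CN'."""
    base = tag.lower().split("-")[0].split("_")[0]
    try:
        return ISO_639_3[base]
    except KeyError:
        raise ValueError(f"unknown language tag: {tag!r}")

print(normalize_lang("en-US"))  # -> eng
print(normalize_lang("fin"))    # -> fin
```

Normalizing every benchmark's tags to one scheme in this way is what allows language-specific results to be compared across benchmarks that originally used different labeling conventions.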

Files

Name: 2025.emnlp-demos.43.pdf
Size: 658.1 KB
Format: Adobe Portable Document Format