TULUN: Transparent and Adaptable Low-resource Machine Translation

Merx, Raphael; Suominen, Hanna; Hong, Lois Yinghui; Thieberger, Nick; Cohn, Trevor; Vylomova, Ekaterina

TULUN: Transparent and Adaptable Low-resource Machine Translation

dc.contributor.author	Merx, Raphael
dc.contributor.author	Suominen, Hanna
dc.contributor.author	Hong, Lois Yinghui
dc.contributor.author	Thieberger, Nick
dc.contributor.author	Cohn, Trevor
dc.contributor.author	Vylomova, Ekaterina
dc.contributor.organization	fi=tietotekniikan laitos\|en=Department of Computing\|
dc.contributor.organization-code	1.2.246.10.2458963.20.85312822902
dc.converis.publication-id	506057677
dc.converis.url	https://research.utu.fi/converis/portal/Publication/506057677
dc.date.accessioned	2026-01-21T13:35:18Z
dc.date.available	2026-01-21T13:35:18Z
dc.description.abstract	Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose TULUN,(1) a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90-22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, TULUN outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. TULUN is publicly accessible at bislama-trans.rapha.dev.
dc.format.pagerange	139
dc.identifier.isbn	979-8-89176-253-4
dc.identifier.issn	0736-587X
dc.identifier.jour-issn	0736-587X
dc.identifier.olddbid	213129
dc.identifier.oldhandle	10024/196147
dc.identifier.uri	https://www.utupub.fi/handle/11111/54807
dc.identifier.url	https://aclanthology.org/2025.acl-demo.13/
dc.identifier.urn	URN:NBN:fi-fe202601217172
dc.language.iso	en
dc.okm.affiliatedauthor	Suominen, Hanna
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.discipline	113 Tietojenkäsittely ja informaatiotieteet	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A4 Conference Article
dc.publisher.country	United States	en_GB
dc.publisher.country	Yhdysvallat (USA)	fi_FI
dc.publisher.country-code	US
dc.relation.conference	Annual Meeting of the Association for Computational Linguistics
dc.relation.doi	10.18653/v1/2025.acl-demo.13
dc.relation.ispartofjournal	Annual Meeting of the Association for Computational Linguistics
dc.relation.volume	63
dc.source.identifier	https://www.utupub.fi/handle/10024/196147
dc.title	TULUN: Transparent and Adaptable Low-resource Machine Translation
dc.title.book	Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics : (Volume 3: System Demonstrations)
dc.year.issued	2025

Tiedostot

Näytetään 1 - 1 / 1

Name:: Merx_etal_2025.pdf
Size:: 771.98 KB
Format:: Adobe Portable Document Format

Lataa

Kokoelmat

Rinnakkaistallenteet