TULUN: Transparent and Adaptable Low-resource Machine Translation
| dc.contributor.author | Merx, Raphael | |
| dc.contributor.author | Suominen, Hanna | |
| dc.contributor.author | Hong, Lois Yinghui | |
| dc.contributor.author | Thieberger, Nick | |
| dc.contributor.author | Cohn, Trevor | |
| dc.contributor.author | Vylomova, Ekaterina | |
| dc.contributor.organization | Department of Computing | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.85312822902 | |
| dc.converis.publication-id | 506057677 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/506057677 | |
| dc.date.accessioned | 2026-01-21T13:35:18Z | |
| dc.date.available | 2026-01-21T13:35:18Z | |
| dc.description.abstract | Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose TULUN, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90-22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, TULUN outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. TULUN is publicly accessible at bislama-trans.rapha.dev. | |
| dc.format.pagerange | 129-139 | |
| dc.identifier.isbn | 979-8-89176-253-4 | |
| dc.identifier.issn | 0736-587X | |
| dc.identifier.jour-issn | 0736-587X | |
| dc.identifier.olddbid | 213129 | |
| dc.identifier.oldhandle | 10024/196147 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/54807 | |
| dc.identifier.url | https://aclanthology.org/2025.acl-demo.13/ | |
| dc.identifier.urn | URN:NBN:fi-fe202601217172 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Suominen, Hanna | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.internationalcopublication | international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A4 Conference Article | |
| dc.publisher.country | United States | en_GB |
| dc.publisher.country | Yhdysvallat (USA) | fi_FI |
| dc.publisher.country-code | US | |
| dc.relation.conference | Annual Meeting of the Association for Computational Linguistics | |
| dc.relation.ispartofjournal | Annual Meeting of the Association for Computational Linguistics | |
| dc.relation.volume | 63 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/196147 | |
| dc.title | TULUN: Transparent and Adaptable Low-resource Machine Translation | |
| dc.title.book | Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics : (Volume 3: System Demonstrations) | |
| dc.year.issued | 2025 |