Explaining Classes through Stable Word Attributions

Rönnqvist Samuel; Myntti Amanda; Kyröläinen Aki-Juhani; Ginter Filip; Laippala Veronika

dc.contributor.author	Rönnqvist Samuel
dc.contributor.author	Myntti Amanda
dc.contributor.author	Kyröläinen Aki-Juhani
dc.contributor.author	Ginter Filip
dc.contributor.author	Laippala Veronika
dc.contributor.organization	fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization	fi=kieli- ja käännöstieteiden laitos|en=School of Languages and Translation Studies|
dc.contributor.organization-code	2602100
dc.converis.publication-id	176874206
dc.converis.url	https://research.utu.fi/converis/portal/Publication/176874206
dc.date.accessioned	2025-08-28T02:43:05Z
dc.date.available	2025-08-28T02:43:05Z
dc.description.abstract	Input saliency methods have recently become a popular tool for explaining predictions of deep learning models in NLP. Nevertheless, there has been little work investigating methods for aggregating prediction-level explanations to the class level, nor has a framework for evaluating such class explanations been established. We explore explanations based on XLM-R and the Integrated Gradients input attribution method, and propose 1) the Stable Attribution Class Explanation method (SACX) to extract keyword lists of classes in text classification tasks, and 2) a framework for the systematic evaluation of the keyword lists. We find that explanations of individual predictions are prone to noise, but that stable explanations can be effectively identified through repeated training and explanation. We evaluate on web register data and show that the class explanations are linguistically meaningful and distinguishing of the classes.
dc.format.pagerange	1074
dc.identifier.isbn	978-1-955917-25-4
dc.identifier.jour-issn	0736-587X
dc.identifier.olddbid	209575
dc.identifier.oldhandle	10024/192602
dc.identifier.uri	https://www.utupub.fi/handle/11111/47850
dc.identifier.url	https://aclanthology.org/2022.findings-acl.85
dc.identifier.urn	URN:NBN:fi-fe2022112968032
dc.language.iso	en
dc.okm.affiliatedauthor	Rönnqvist, Samuel
dc.okm.affiliatedauthor	Myntti, Amanda
dc.okm.affiliatedauthor	Kyröläinen, Aki
dc.okm.affiliatedauthor	Ginter, Filip
dc.okm.affiliatedauthor	Laippala, Veronika
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.internationalcopublication	not an international co-publication
dc.okm.internationality	International publication
dc.okm.type	A4 Conference Article
dc.publisher.country	United States	en_GB
dc.publisher.country	Yhdysvallat (USA)	fi_FI
dc.publisher.country-code	US
dc.relation.conference	Annual Meeting of the Association for Computational Linguistics
dc.relation.doi	10.18653/v1/2022.findings-acl.85
dc.relation.ispartofjournal	Annual Meeting of the Association for Computational Linguistics
dc.relation.ispartofseries	Annual Meeting of the Association for Computational Linguistics
dc.relation.volume	60
dc.source.identifier	https://www.utupub.fi/handle/10024/192602
dc.title	Explaining Classes through Stable Word Attributions
dc.title.book	The 60th Annual Meeting of the Association for Computational Linguistics: Findings of ACL 2022
dc.year.issued	2022

Files

Name: Explaining classes through Stable WOrld Attributions.pdf
Size: 5.07 MB
Format: Adobe Portable Document Format

Collections

Self-archived publications