Explaining Classes through Stable Word Attributions

dc.contributor.author: Rönnqvist Samuel
dc.contributor.author: Myntti Amanda
dc.contributor.author: Kyröläinen Aki-Juhani
dc.contributor.author: Ginter Filip
dc.contributor.author: Laippala Veronika
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=kieli- ja käännöstieteiden laitos|en=School of Languages and Translation Studies|
dc.contributor.organization-code: 1.2.246.10.2458963.20.56461112866
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.contributor.organization-code: 2602100
dc.converis.publication-id: 176874206
dc.converis.url: https://research.utu.fi/converis/portal/Publication/176874206
dc.date.accessioned: 2025-08-28T02:43:05Z
dc.date.available: 2025-08-28T02:43:05Z
dc.description.abstract: Input saliency methods have recently become a popular tool for explaining predictions of deep learning models in NLP. Nevertheless, there has been little work investigating methods for aggregating prediction-level explanations to the class level, nor has a framework for evaluating such class explanations been established. We explore explanations based on XLM-R and the Integrated Gradients input attribution method, and propose 1) the Stable Attribution Class Explanation method (SACX) to extract keyword lists of classes in text classification tasks, and 2) a framework for the systematic evaluation of the keyword lists. We find that explanations of individual predictions are prone to noise, but that stable explanations can be effectively identified through repeated training and explanation. We evaluate on web register data and show that the class explanations are linguistically meaningful and distinguishing of the classes.
dc.format.pagerange: 1063
dc.format.pagerange: 1074
dc.identifier.isbn: 978-1-955917-25-4
dc.identifier.jour-issn: 0736-587X
dc.identifier.olddbid: 209575
dc.identifier.oldhandle: 10024/192602
dc.identifier.uri: https://www.utupub.fi/handle/11111/47850
dc.identifier.url: https://aclanthology.org/2022.findings-acl.85
dc.identifier.urn: URN:NBN:fi-fe2022112968032
dc.language.iso: en
dc.okm.affiliatedauthor: Rönnqvist, Samuel
dc.okm.affiliatedauthor: Myntti, Amanda
dc.okm.affiliatedauthor: Kyröläinen, Aki
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.affiliatedauthor: Laippala, Veronika
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 6121 Languages [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.discipline: 6121 Kielitieteet [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: United States [en_GB]
dc.publisher.country: Yhdysvallat (USA) [fi_FI]
dc.publisher.country-code: US
dc.relation.conference: Annual Meeting of the Association for Computational Linguistics
dc.relation.doi: 10.18653/v1/2022.findings-acl.85
dc.relation.ispartofjournal: Annual Meeting of the Association for Computational Linguistics
dc.relation.ispartofseries: Annual Meeting of the Association for Computational Linguistics
dc.relation.volume: 60
dc.source.identifier: https://www.utupub.fi/handle/10024/192602
dc.title: Explaining Classes through Stable Word Attributions
dc.title.book: The 60th Annual Meeting of the Association for Computational Linguistics: Findings of ACL 2022
dc.year.issued: 2022
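
The abstract in this record states that single-prediction explanations are noisy and that stable explanations emerge from repeated training and explanation. The sketch below illustrates that general idea only; it is not the SACX algorithm from the paper. It assumes, hypothetically, that each training run has already been reduced to a word-to-score mapping for one class (for example, averaged Integrated Gradients attributions). The function name `stable_keywords`, the parameters `top_k` and `min_fraction`, and the toy data are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def stable_keywords(runs, top_k=3, min_fraction=0.6):
    """Return words that rank in the top_k of at least min_fraction of runs.

    runs: list of dicts mapping word -> attribution score for one class,
    one dict per independently trained model (hypothetical input format).
    The result is ordered by mean score over the runs where the word
    made the top_k.
    """
    hits = defaultdict(list)
    for scores in runs:
        # Words with the highest attribution scores in this run.
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        for word in top:
            hits[word].append(scores[word])
    needed = min_fraction * len(runs)
    # Keep only words that recur across enough runs.
    stable = {w: mean(s) for w, s in hits.items() if len(s) >= needed}
    return sorted(stable, key=stable.get, reverse=True)


# Toy scores for a single class across three runs (made-up values).
runs = [
    {"recipe": 0.9, "oven": 0.7, "click": 0.6, "the": 0.1},
    {"recipe": 0.8, "oven": 0.6, "share": 0.5, "the": 0.2},
    {"recipe": 0.7, "minutes": 0.6, "oven": 0.5, "click": 0.1},
]
keywords = stable_keywords(runs)
print(keywords)
```

In this toy input, words that reach the top three in only one run ("click", "share", "minutes") are filtered out, while words that recur in every run are kept. The actual selection and scoring criteria used by the authors are described in the paper itself.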

Files

Name: Explaining classes through Stable WOrld Attributions.pdf
Size: 5.07 MB
Format: Adobe Portable Document Format