Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems

dc.contributor.authorPompedda, Francesco
dc.contributor.authorSanttila, Pekka
dc.contributor.authorDi Maso, Eleonora
dc.contributor.authorNyman, Thomas J.
dc.contributor.authorSun, Yongjie
dc.contributor.authorZappala, Angelo
dc.contributor.organizationfi=yhteiskuntatieteellinen tiedekunta|en=Faculty of Social Sciences|
dc.contributor.organization-code1.2.246.10.2458963.20.81527106298
dc.converis.publication-id492228205
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/492228205
dc.date.accessioned2025-08-28T02:06:55Z
dc.date.available2025-08-28T02:06:55Z
dc.description.abstractThis study evaluated the decision-making of Large Language Models (LLMs) in interpreting firearm examiner testimony by comparing a standard LLM to one enhanced with forensic science knowledge. The present study is a replication study. We assessed whether LLMs mirrored human decision patterns and whether specialised knowledge led to more critical evaluations of forensic claims. We employed a 2 × 2 × 7 between-subjects design with three independent variables: LLM configuration (standard vs. knowledge-enhanced), cross-examination presence (yes vs. no), and conclusion language (seven variations). Each model condition performed 200 repetitions per scenario, yielding a total of 5,600 measures of binary verdicts, guilt probability ratings, and credibility assessments. LLMs showed low conviction rates (9.4%) across conditions, with systematic variation depending on how the firearm expert's conclusion was formulated. Cross-examination produced lower guilt assessments and scientific credibility ratings. Importantly, knowledge-enhanced LLMs demonstrated significantly more conservative evaluations of firearm evidence across all match conditions compared to standard LLMs. LLMs, particularly when enhanced with domain-specific knowledge, showed advantages in evaluating complex scientific evidence compared to human jurors in Garrett et al. (2020), suggesting potential applications for AI systems in supporting legal decision-making.
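The design arithmetic in the abstract (2 × 2 × 7 conditions, 200 repetitions each, 5,600 total measures) can be verified with a minimal sketch; the factor-level labels below are illustrative paraphrases of the abstract's wording, not the study's actual condition names.

```python
from itertools import product

# Factors of the reported 2 x 2 x 7 between-subjects design
# (level labels are illustrative, taken loosely from the abstract).
llm_configs = ["standard", "knowledge-enhanced"]               # 2 levels
cross_examination = ["present", "absent"]                      # 2 levels
conclusion_language = [f"variant_{i}" for i in range(1, 8)]    # 7 variations

# Fully crossed design: every combination of the three factors
conditions = list(product(llm_configs, cross_examination, conclusion_language))
repetitions_per_condition = 200

total_measures = len(conditions) * repetitions_per_condition
print(len(conditions), total_measures)  # 28 conditions, 5600 measures
```

This confirms that 200 repetitions over the 28 crossed conditions account for the 5,600 measures reported.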
dc.identifier.eissn2997-4100
dc.identifier.olddbid208602
dc.identifier.oldhandle10024/191629
dc.identifier.urihttps://www.utupub.fi/handle/11111/58089
dc.identifier.urlhttps://doi.org/10.1080/29974100.2025.2503343
dc.identifier.urnURN:NBN:fi-fe2025082788035
dc.language.isoen
dc.okm.affiliatedauthorPompedda, Francesco
dc.okm.discipline515 Psychology
dc.okm.internationalcopublicationinternational co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA1 ScientificArticle
dc.publisherTaylor & Francis Online
dc.relation.articlenumber2503343
dc.relation.doi10.1080/29974100.2025.2503343
dc.relation.ispartofjournalJournal of Psychology and AI
dc.relation.issue1
dc.relation.volume1
dc.source.identifierhttps://www.utupub.fi/handle/10024/191629
dc.titleEvaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems
dc.year.issued2025

Files

Name:
Evaluating firearm examiner testimony using large language models a comparison of standard and knowledge-enhanced AI systems.pdf
Size:
1.3 MB
Format:
Adobe Portable Document Format