Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems

dc.contributor.authorPompedda, Francesco
dc.contributor.authorSanttila, Pekka
dc.contributor.authorDi Maso, Eleonora
dc.contributor.authorNyman, Thomas J.
dc.contributor.authorSun, Yongjie
dc.contributor.authorZappala, Angelo
dc.contributor.organizationfi=yhteiskuntatieteellinen tiedekunta|en=Faculty of Social Sciences|
dc.contributor.organization-code1.2.246.10.2458963.20.81527106298
dc.converis.publication-id492228205
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/492228205
dc.date.accessioned2025-08-28T02:06:55Z
dc.date.available2025-08-28T02:06:55Z
dc.description.abstractThis study evaluated the decision-making of Large Language Models (LLMs) in interpreting firearm examiner testimony by comparing a standard LLM to one enhanced with forensic science knowledge. The present study is a replication study. We assessed whether LLMs mirrored human decision patterns and whether specialised knowledge led to more critical evaluations of forensic claims. We employed a 2 × 2 × 7 between-subjects design with three independent variables: LLM configuration (standard vs. knowledge-enhanced), cross-examination presence (yes vs. no), and conclusion language (seven variations). Each model condition performed 200 repetitions per scenario, yielding a total of 5,600 measures of binary verdicts, guilt probability ratings, and credibility assessments. LLMs showed low conviction rates (9.4%) across conditions, with systematic variation depending on how the firearm expert's conclusion was formulated. Cross-examination produced lower guilt assessments and scientific credibility ratings. Importantly, knowledge-enhanced LLMs demonstrated significantly more conservative evaluations of firearm evidence across all match conditions compared to standard LLMs. LLMs, particularly when enhanced with domain-specific knowledge, showed advantages in evaluating complex scientific evidence compared to human jurors in Garrett et al. (2020), suggesting potential applications for AI systems in supporting legal decision-making.
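The design arithmetic in the abstract (2 × 2 × 7 conditions, 200 repetitions each, 5,600 total measures) can be verified with a minimal sketch; the factor-level labels below are illustrative paraphrases of the abstract's wording, not the study's actual condition names.

```python
from itertools import product

# Factors of the reported 2 x 2 x 7 between-subjects design
# (level labels are illustrative, taken loosely from the abstract).
llm_configs = ["standard", "knowledge-enhanced"]               # 2 levels
cross_examination = ["present", "absent"]                      # 2 levels
conclusion_language = [f"variant_{i}" for i in range(1, 8)]    # 7 variations

# Fully crossed design: every combination of the three factors
conditions = list(product(llm_configs, cross_examination, conclusion_language))
repetitions_per_condition = 200

total_measures = len(conditions) * repetitions_per_condition
print(len(conditions), total_measures)  # 28 conditions, 5600 measures
```

This confirms that 200 repetitions over the 28 crossed conditions account for the 5,600 measures reported.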
dc.identifier.eissn2997-4100
dc.identifier.olddbid208602
dc.identifier.oldhandle10024/191629
dc.identifier.urihttps://www.utupub.fi/handle/11111/58089
dc.identifier.urlhttps://doi.org/10.1080/29974100.2025.2503343
dc.identifier.urnURN:NBN:fi-fe2025082788035
dc.language.isoen
dc.okm.affiliatedauthorPompedda, Francesco
dc.okm.discipline515 Psychology
dc.okm.internationalcopublicationinternational co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA1 ScientificArticle
dc.publisherTaylor & Francis Online
dc.relation.articlenumber2503343
dc.relation.doi10.1080/29974100.2025.2503343
dc.relation.ispartofjournalJournal of Psychology and AI
dc.relation.issue1
dc.relation.volume1
dc.source.identifierhttps://www.utupub.fi/handle/10024/191629
dc.titleEvaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems
dc.year.issued2025

Files

Name:
Evaluating firearm examiner testimony using large language models a comparison of standard and knowledge-enhanced AI systems.pdf
Size:
1.3 MB
Format:
Adobe Portable Document Format