Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems
| dc.contributor.author | Pompedda, Francesco | |
| dc.contributor.author | Santtila, Pekka | |
| dc.contributor.author | Di Maso, Eleonora | |
| dc.contributor.author | Nyman, Thomas J. | |
| dc.contributor.author | Yongjie, Sun | |
| dc.contributor.author | Zappala, Angelo | |
| dc.contributor.organization | fi=yhteiskuntatieteellinen tiedekunta|en=Faculty of Social Sciences| | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.81527106298 | |
| dc.converis.publication-id | 492228205 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/492228205 | |
| dc.date.accessioned | 2025-08-28T02:06:55Z | |
| dc.date.available | 2025-08-28T02:06:55Z | |
| dc.description.abstract | <p>This study evaluated the decision-making of Large Language Models (LLMs) in interpreting firearm examiner testimony by comparing a standard LLM with one enhanced with forensic science knowledge. The present study is a replication. We assessed whether LLMs mirrored human decision patterns and whether specialised knowledge led to more critical evaluations of forensic claims. We employed a 2 × 2 × 7 between-subjects design with three independent variables: LLM configuration (standard vs. knowledge-enhanced), cross-examination presence (yes vs. no), and conclusion language (seven variations). Each condition was run 200 times per scenario, yielding a total of 5,600 measures of binary verdicts, guilt probability ratings, and credibility assessments. LLMs showed a low overall conviction rate (9.4%), which varied logically with how the firearm expert’s conclusion was formulated. Cross-examination produced lower guilt assessments and scientific credibility ratings. Importantly, knowledge-enhanced LLMs evaluated firearm evidence significantly more conservatively than standard LLMs across all match conditions. LLMs, particularly when enhanced with domain-specific knowledge, showed advantages over the human jurors in Garrett et al. (2020) in evaluating complex scientific evidence, suggesting potential applications for AI systems in supporting legal decision-making.<br></p> | |
| dc.identifier.eissn | 2997-4100 | |
| dc.identifier.olddbid | 208602 | |
| dc.identifier.oldhandle | 10024/191629 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/58089 | |
| dc.identifier.url | https://doi.org/10.1080/29974100.2025.2503343 | |
| dc.identifier.urn | URN:NBN:fi-fe2025082788035 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Pompedda, Francesco | |
| dc.okm.discipline | 515 Psychology | en_GB |
| dc.okm.discipline | 515 Psykologia | fi_FI |
| dc.okm.internationalcopublication | international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A1 ScientificArticle | |
| dc.publisher | Taylor & Francis Online | |
| dc.relation.articlenumber | 2503343 | |
| dc.relation.doi | 10.1080/29974100.2025.2503343 | |
| dc.relation.ispartofjournal | Journal of Psychology and AI | |
| dc.relation.issue | 1 | |
| dc.relation.volume | 1 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/191629 | |
| dc.title | Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems | |
| dc.year.issued | 2025 |
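The factorial design described in the abstract can be sketched in a few lines of Python. The factor labels below are illustrative assumptions, not the authors' exact wording; the sketch only verifies that 2 × 2 × 7 between-subjects cells at 200 repetitions each yield the 5,600 measures the abstract reports.

```python
from itertools import product

# Illustrative factor levels (assumed labels) for the 2 x 2 x 7 design:
llm_configs = ["standard", "knowledge-enhanced"]          # LLM configuration
cross_exam = ["present", "absent"]                        # cross-examination
conclusion_language = [f"variant_{i}" for i in range(1, 8)]  # seven variations
repetitions = 200                                         # runs per condition

# Enumerate every between-subjects cell and count total measures.
conditions = list(product(llm_configs, cross_exam, conclusion_language))
total_measures = len(conditions) * repetitions

print(len(conditions))   # 28 cells
print(total_measures)    # 5600, matching the abstract
```

Enumerating the cells with `itertools.product` makes the arithmetic explicit: 28 conditions, each repeated 200 times, gives the 5,600 observations analysed in the study.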
Files
- Name: Evaluating firearm examiner testimony using large language models a comparison of standard and knowledge-enhanced AI systems.pdf
- Size: 1.3 MB
- Format: Adobe Portable Document Format